
Add QNN EP HTP shared memory allocator #23136

Open · wants to merge 52 commits into main

Conversation

@edgchen1 (Contributor) commented Dec 18, 2024

Description

Adds QNN EP HTP shared memory allocator.

The HTP shared memory allocator (HtpSharedMemoryAllocator) calls into the rpcmem shared library (libcdsprpc.so / libcdsprpc.dll) to allocate and free memory that can be shared between the HTP and the CPU.
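For orientation, here is a hedged sketch of the rpcmem calls involved, based on the Hexagon SDK's rpcmem API. How HtpSharedMemoryAllocator actually loads and wraps these functions may differ, and the constant values should be verified against the SDK's rpcmem.h.

#include <dlfcn.h>
#include <cstdint>

// Function pointer types matching the Hexagon SDK's rpcmem API.
using rpcmem_alloc_fn = void* (*)(int heapid, uint32_t flags, int size);
using rpcmem_free_fn = void (*)(void* po);
using rpcmem_to_fd_fn = int (*)(void* po);

int main() {
  // Library name from the PR description (libcdsprpc.dll on Windows).
  void* lib = dlopen("libcdsprpc.so", RTLD_NOW | RTLD_LOCAL);
  auto rpcmem_alloc = reinterpret_cast<rpcmem_alloc_fn>(dlsym(lib, "rpcmem_alloc"));
  auto rpcmem_to_fd = reinterpret_cast<rpcmem_to_fd_fn>(dlsym(lib, "rpcmem_to_fd"));
  auto rpcmem_free = reinterpret_cast<rpcmem_free_fn>(dlsym(lib, "rpcmem_free"));

  // Common rpcmem.h values; verify against your SDK version.
  constexpr int RPCMEM_HEAP_ID_SYSTEM = 25;
  constexpr uint32_t RPCMEM_DEFAULT_FLAGS = 1;

  // Allocate a buffer that both CPU and HTP can access.
  void* buffer = rpcmem_alloc(RPCMEM_HEAP_ID_SYSTEM, RPCMEM_DEFAULT_FLAGS, 4096);
  int fd = rpcmem_to_fd(buffer);  // the fd is what mem handle registration consumes
  (void)fd;
  // ... use the buffer for graph inputs/outputs ...
  rpcmem_free(buffer);
  return 0;
}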

The allocator can be enabled by setting the QNN EP option enable_htp_shared_memory_allocator to 1. QNNExecutionProvider::CreatePreferredAllocators() will then return an instance of HtpSharedMemoryAllocator.

For each QNN context, we also need to register and unregister memory handles in order to use the HTP shared memory. This memory handle management is added to QnnBackendManager, which also manages the QNN context handles.

For more information about using HTP shared memory with QNN, see: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_shared_buffer_tutorial.html#shared-buffer-tutorial

Limitations:

  • HTP shared memory usage is only supported for graph inputs and outputs. Intermediate values are not supported.
  • Each allocation gets its own shared memory buffer; the allocator does not pool multiple allocations into a single shared memory buffer.

Motivation and Context

Improve performance by using HTP shared memory to avoid overhead from copying data between CPU and NPU.

edgchen1 and others added 30 commits November 5, 2024 15:12
… declarations and definitions for IAllocator::TensorAlloc().
@edgchen1 marked this pull request as ready for review January 6, 2025 23:14
@edgchen1 changed the title [WIP] Add QNN EP HTP shared memory allocator → Add QNN EP HTP shared memory allocator Jan 6, 2025
@HectorSVC added the ep:QNN (issues related to QNN execution provider) label Jan 7, 2025
Comment on lines +1706 to +1707
// - QNN context handle is still valid. This should be true as long as QNN contexts are not freed from
// anywhere other than the destructor.
Contributor Author:

This should be true as long as QNN contexts are not freed from anywhere other than the destructor.

It seems kind of brittle to depend on this.

Contributor:

Wouldn't we catch it during development if someone changed the code to free the context somewhere else?

Contributor Author:

The concern is a race between this cleanup function locking weak_context_mem_handle_manager (thus keeping it alive) and the QNN context handle being freed.

I'm thinking it may be possible to manage the QNN context handle (as well as the context mem handles) in some object and hold a weak_ptr to that instead.
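A rough sketch of that idea, with all names hypothetical: bundle the context handle and its mem handle bookkeeping in one object, so a successfully locked weak_ptr keeps the context alive for the duration of the cleanup.

#include <memory>

using Qnn_ContextHandle_t = void*;  // stand-in for the QNN SDK typedef

struct QnnContextResources {
  Qnn_ContextHandle_t context_handle = nullptr;
  // ... mem handle bookkeeping for this context ...
  ~QnnContextResources() { /* unregister mem handles, then free the context handle */ }
};

// The allocation cleanup callback captures a weak_ptr to the whole bundle.
void CleanUpSharedMemory(std::weak_ptr<QnnContextResources> weak_resources,
                         void* shared_memory_address) {
  if (auto resources = weak_resources.lock()) {
    // While `resources` is held, the context handle cannot be freed out from
    // under us, which removes the race described above.
    // ... unregister the mem handle for shared_memory_address ...
  }
}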

@@ -1098,6 +1099,38 @@ TEST_F(QnnHTPBackendTests, EPOffloadsGraphIOQuantDequant) {
}
}

TEST_F(QnnHTPBackendTests, UseHtpSharedMemoryAllocatorForInputs) {
#if !defined(__ANDROID__) && !defined(_WIN32)
Contributor:

The QC device for Windows is Arm64-based, so you can check defined(__aarch64__) || defined(_M_ARM64).

Contributor Author:

This code is within an ifdef that checks for those macros:

#if defined(__aarch64__) || defined(_M_ARM64) || defined(__linux__)

@@ -1098,6 +1099,38 @@ TEST_F(QnnHTPBackendTests, EPOffloadsGraphIOQuantDequant) {
}
}

TEST_F(QnnHTPBackendTests, UseHtpSharedMemoryAllocatorForInputs) {
Contributor:

We should also have some code to demonstrate how this feature gets used from user code.
Here are some IOBinding examples for other EPs:

#if defined(USE_CUDA) || defined(USE_TENSORRT)
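For illustration, a hedged sketch of what such user code could look like for this allocator. The memory info name "QnnHtpShared", the model path, and the tensor names are assumptions for illustration, not taken from this PR's diff.

#include <array>
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env;
  Ort::SessionOptions session_options;
  session_options.AppendExecutionProvider(
      "QNN", {{"backend_path", "libQnnHtp.so"},
              {"enable_htp_shared_memory_allocator", "1"}});
  Ort::Session session(env, "model.onnx", session_options);

  // Request the EP's preferred allocator for HTP shared memory.
  Ort::MemoryInfo memory_info("QnnHtpShared", OrtAllocatorType::OrtDeviceAllocator,
                              /*device id*/ 0, OrtMemTypeDefault);
  Ort::Allocator allocator(session, memory_info);

  // Tensors created from this allocator live in HTP shared memory, so the EP
  // can hand them to the NPU without copying.
  std::array<int64_t, 2> shape{1, 224};
  Ort::Value input = Ort::Value::CreateTensor<float>(allocator, shape.data(), shape.size());

  Ort::IoBinding binding(session);
  binding.BindInput("input", input);          // tensor names are illustrative
  binding.BindOutput("output", memory_info);  // output also lands in shared memory
  session.Run(Ort::RunOptions{}, binding);
  return 0;
}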


struct AllocationRecord {
SharedMemoryInfo shared_memory_info;
InlinedVector<AllocationCleanUpFn, 1> clean_up_fns;
Contributor:

Do we expect more than one cleanup func?

Contributor Author:

It's not unexpected. For example, if the same shared memory is used from more than one QNN context, there will be a separate cleanup function per QNN context.

Comment on lines +34 to +35
marker.fill('\0');
allocator_ptr = nullptr;
Contributor:

Should we limit the fill to debug builds? Not sure how many allocations QNN makes and whether there's any meaningful perf cost.

Contributor Author:

I'm a little hesitant to remove it, as it did catch some issues during my testing. Maybe we can do that later if it is measured to have a significant performance cost? It is only overwriting the 8 marker bytes.


namespace {

struct AllocationHeader {
Contributor:

It would be great to add a comment describing the overall setup and how it uses this header.
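Along those lines, a hypothetical sketch of the header-before-allocation pattern discussed in this thread; the real layout and marker value live in the PR's allocator source, so the names and values here are illustrative only.

#include <array>
#include <cstring>

struct AllocationHeader {
  // Illustrative 8-byte marker identifying blocks owned by this allocator.
  static constexpr std::array<char, 8> kMarker{'o', 'r', 't', 'a', 'l', 'l', 'o', 'c'};

  std::array<char, 8> marker;  // filled on Alloc(), zeroed on Free()
  void* allocator_ptr;         // back-pointer to the owning allocator
};

// The allocator reserves sizeof(AllocationHeader) bytes in front of each
// allocation; Free() walks back from the user pointer and validates the marker.
inline AllocationHeader* HeaderFromAddress(void* user_address) {
  auto* header = reinterpret_cast<AllocationHeader*>(
      static_cast<char*>(user_address) - sizeof(AllocationHeader));
  const bool valid = std::memcmp(header->marker.data(), AllocationHeader::kMarker.data(),
                                 AllocationHeader::kMarker.size()) == 0;
  return valid ? header : nullptr;
}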

htp_arch,
soc_model,
enable_htp_weight_sharing);
static const std::string QNN_HTP_SHARED_MEMORY_ALLOCATOR_ENABLED = "enable_htp_shared_memory_allocator";
Contributor:

Should this be more user visible?

Contributor Author:

It's documented in onnxruntime_c_api.h. I can also document it in the gh-pages branch after this PR.

onnxruntime/core/providers/qnn/shared_context.h
Contributor Author:

Note: moved the SharedContext class from qnn_execution_provider.h to its own file.

Comment on lines +59 to +61
// Note: creation should be done via Create()
QnnBackendManager(const QnnBackendManagerConfig& config, PrivateConstructorTag)
: backend_path_(config.backend_path),
Contributor:

Should this be private if it's not meant to be called directly?

Contributor Author:

Ideally it would be private, but then std::make_shared wouldn't be able to access it.
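For reference, a minimal sketch of the passkey idiom in play here: the constructor stays public so std::make_shared can call it, but it requires a tag that only the class itself can construct, so outside code has to go through Create().

#include <memory>

class QnnBackendManager {
 private:
  struct PrivateConstructorTag {};  // only QnnBackendManager can construct this

 public:
  static std::shared_ptr<QnnBackendManager> Create() {
    // make_shared works because the constructor is public...
    return std::make_shared<QnnBackendManager>(PrivateConstructorTag{});
  }

  // ...but callers outside the class cannot produce a PrivateConstructorTag.
  explicit QnnBackendManager(PrivateConstructorTag) {}
};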

Comment on lines +1736 to +1745
auto backend_manager = weak_backend_manager.lock();
if (!backend_manager) {
return;
}

auto context_mem_handle_manager = weak_context_mem_handle_manager.lock();
if (!context_mem_handle_manager) {
return;
}

Contributor:

Should we log something if either of these is false, or is that expected?

Contributor Author:

The weak_ptrs can't be locked when the backend manager (i.e., the QNN EP) is destroyed before the allocation is freed. Currently it should be fine for the allocator to outlive the EP, so this case is not too unexpected.

Status QnnContextMemHandleManager::GetOrRegister(void* shared_memory_address, const Qnn_Tensor_t& qnn_tensor,
Qnn_MemHandle_t& qnn_mem_handle, bool& did_register) {
const auto qnn_tensor_rank = GetQnnTensorRank(qnn_tensor);
auto* const qnn_tensor_dims = GetQnnTensorDims(qnn_tensor);
Contributor:

Do all QNN tensors have fixed shapes that are guaranteed to be known?

Contributor Author:

I'm not certain. @HectorSVC or @adrianlizarraga can you comment on this?

Qnn_MemDescriptor_t mem_descriptor{};
mem_descriptor.memShape.dimSize = qnn_tensor_dims;
mem_descriptor.memShape.numDim = qnn_tensor_rank;
mem_descriptor.memShape.shapeConfig = nullptr;
Contributor:

Out of interest, when might shapeConfig be used, and what for?

Contributor Author:

Good question.

It is documented as: "Additional configuration in string, for extensibility. Allowed to be NULL."

The example usage that I was using as a reference sets it to nullptr.
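For context, a hedged sketch of how the descriptor above feeds into mem handle registration, following the shared-buffer tutorial linked in the description; the accessor names (qnn_interface, GetQnnTensorDataType) and the fd plumbing are assumptions, not necessarily what the PR's code does.

// Assumes the QNN SDK headers; shared_memory_fd comes from rpcmem_to_fd().
Qnn_MemDescriptor_t mem_descriptor = QNN_MEM_DESCRIPTOR_INIT;
mem_descriptor.memShape.dimSize = qnn_tensor_dims;
mem_descriptor.memShape.numDim = qnn_tensor_rank;
mem_descriptor.memShape.shapeConfig = nullptr;  // "Allowed to be NULL"
mem_descriptor.dataType = GetQnnTensorDataType(qnn_tensor);
mem_descriptor.memType = QNN_MEM_TYPE_ION;
mem_descriptor.ionInfo.fd = shared_memory_fd;

Qnn_MemHandle_t mem_handle = nullptr;
Qnn_ErrorHandle_t error = qnn_interface.memRegister(context_handle, &mem_descriptor,
                                                    /*numDescriptors*/ 1, &mem_handle);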

onnxruntime/core/providers/qnn/builder/qnn_model.cc
@@ -63,6 +65,12 @@ size_t GetElementSizeByType(ONNXTensorElementDataType elem_type) {
return pos->second;
}

size_t GetQnnTensorDataSize(gsl::span<const uint32_t> shape, Qnn_DataType_t element_type) {
ORT_ENFORCE(!shape.empty(), "Empty shape not allowed."); // TODO can we just treat empty shape as a scalar?
Contributor:

Can we treat it as a scalar (IIRC we do that in the CoreML EP) or could/should some other place make that adjustment?

Could we potentially get here from a tensor that has an unknown shape (e.g. downstream of a dynamic reshape)? Not sure if those get rejected earlier on in the QNN EP processing.
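For what it's worth, a sketch of the scalar-friendly variant the TODO suggests, assuming an element-size helper for Qnn_DataType_t exists alongside the ONNX-type one shown above; tensors with unknown (dynamic) shapes would still need to be rejected before this point.

size_t GetQnnTensorDataSize(gsl::span<const uint32_t> shape, Qnn_DataType_t element_type) {
  size_t num_elements = 1;  // an empty shape is treated as a scalar with one element
  for (uint32_t dim : shape) {
    num_elements *= dim;
  }
  return num_elements * GetElementSizeByType(element_type);
}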

Labels: ep:QNN (issues related to QNN execution provider)
4 participants