
NPU convertation #153

Open
ha-seungwon opened this issue Jan 9, 2025 · 7 comments

Comments

@ha-seungwon

Hello,

How can I run computations on the NPU?

rknn-llm appears to use rkllm-runtime/Linux/librkllm_api/aarch64/librkllmrt.so for the NPU kernel setup.

But is there any open-source code for testing the NPU computation?

@waydong
Collaborator

waydong commented Jan 9, 2025

Hi, this is a demo, I'm not sure if it's what you need:
https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_api_demo

@ha-seungwon
Author

ha-seungwon commented Jan 9, 2025

Thanks for the reply :)

But my question is slightly different.

The demo at https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_api_demo uses "rkllm.h" to initialize the runtime and memory settings, and it also includes the calculation part.

So my question is: where can I find the code that actually implements the functions declared in rkllm-runtime/Linux/librkllm_api/include/rkllm.h?

@ha-seungwon
Author

How can I find or set the NPU core?

Test code

[screenshot]

RKLLM doc

[screenshot]

ERROR

[screenshot]

@waydong
Collaborator

waydong commented Jan 13, 2025

> How can I find or set the NPU core?

num_npu_core is set during the model conversion process; however, the documentation reflecting this configuration has not been updated yet.
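For context, num_npu_core is an argument to the build() step in an rkllm-toolkit conversion script. A non-runnable sketch modeled on the conversion examples in this repo (the model path is hypothetical, and parameter names may differ between toolkit versions):

```python
# Sketch of an RKLLM conversion script; requires the proprietary
# rkllm-toolkit. Modeled on the conversion examples in the rknn-llm
# repo; parameter names may differ between toolkit versions.
from rkllm.api import RKLLM

llm = RKLLM()

ret = llm.load_huggingface(model="./Qwen2-1.5B-Instruct")  # hypothetical path
if ret != 0:
    raise RuntimeError("load failed")

# num_npu_core is baked into the exported model here (e.g. up to 3
# cores on RK3588); it cannot be changed at inference time.
ret = llm.build(do_quantization=True,
                quantized_dtype="w8a8",
                target_platform="rk3588",
                num_npu_core=3)
if ret != 0:
    raise RuntimeError("build failed")

ret = llm.export_rkllm("./model.rkllm")
if ret != 0:
    raise RuntimeError("export failed")
```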

@ha-seungwon
Author

How does the NPU kernel perform matrix multiplication across multiple cores, and what optimizations are necessary to ensure that matrix multiplication efficiently utilizes all available cores on a multicore NPU?
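(For what it's worth, the usual scheme on multicore accelerators is to partition the output matrix across cores, e.g. by rows, so each core computes an independent slice; keeping the slices balanced is what lets all cores stay busy. A generic Python sketch of that idea, with threads standing in for NPU cores; this is not Rockchip-specific and says nothing about what the closed-source RKNPU driver actually does.)

```python
# Generic illustration of multicore matrix multiplication by
# row-partitioning the output. Each "core" (here: a thread) computes
# a contiguous slice of the rows of A @ B independently.
from concurrent.futures import ThreadPoolExecutor

def matmul_rows(A, B, rows):
    # Compute the given output rows of A @ B (pure-Python, lists of lists).
    cols, inner = len(B[0]), len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in rows]

def matmul_multicore(A, B, num_cores=3):
    # Split A's rows into num_cores balanced contiguous chunks.
    n = len(A)
    step = -(-n // num_cores)  # ceiling division: rows per core
    chunks = [range(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=num_cores) as ex:
        parts = ex.map(lambda r: matmul_rows(A, B, r), chunks)
    # Concatenate the per-core slices back into the full result.
    return [row for part in parts for row in part]

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
print(matmul_multicore(A, B))  # [[25, 28], [57, 64], [89, 100]]
```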

@c0zaut

c0zaut commented Jan 14, 2025

@ha-seungwon - all of my models are converted with multicore support, which I set during the conversion process as @waydong points out. To monitor the npu core usage, run the following command as root:

watch -n 1 'cat /sys/kernel/debug/rknpu/load'

Running that command shows NPU cores running at top performance.

Models are available here in a variety of quants and optimizations: https://huggingface.co/c01zaut

@ha-seungwon
Author

@c0zaut Yes, I know how to monitor the NPU core usage.

My question is whether there is another way to monitor or detect NPU utilization during real-time inference after converting to an RKLLM model file, such as through memory mapping or matrix calculations.
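One programmatic option, since the driver already exposes per-core load through debugfs: poll and parse /sys/kernel/debug/rknpu/load from your own process while inference runs, instead of eyeballing `watch`. A sketch (the exact text format of the load file can vary between kernel/driver versions, so the regex is an assumption):

```python
# Poll /sys/kernel/debug/rknpu/load programmatically during inference.
# Requires root and a mounted debugfs; the file's text format may
# differ between kernel/driver versions.
import re

def parse_rknpu_load(text):
    """Parse per-core utilization from the rknpu debugfs load file.

    Expects lines like 'NPU load:  Core0:  5%, Core1:  0%, Core2: 12%,'.
    Returns a dict mapping core index to utilization percent.
    """
    return {int(core): int(pct)
            for core, pct in re.findall(r"Core(\d+):\s*(\d+)%", text)}

def read_rknpu_load(path="/sys/kernel/debug/rknpu/load"):
    # Call this in a loop (e.g. once per second) from a monitoring thread.
    with open(path) as f:
        return parse_rknpu_load(f.read())

if __name__ == "__main__":
    sample = "NPU load:  Core0:  5%, Core1:  0%, Core2: 12%,"
    print(parse_rknpu_load(sample))  # {0: 5, 1: 0, 2: 12}
```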
