
NPU convertation #153

Open
ha-seungwon opened this issue Jan 9, 2025 · 7 comments

Comments

@ha-seungwon

Hello,

How can I run computations on the NPU?

rknn-llm appears to use rkllm-runtime/Linux/librkllm_api/aarch64/librkllmrt.so for the NPU kernel setup.

But is there any open-source code for testing the NPU computation?

@waydong
Collaborator

waydong commented Jan 9, 2025

Hi, this is a demo, I'm not sure if it's what you need:
https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_api_demo

@ha-seungwon
Author

ha-seungwon commented Jan 9, 2025

Thanks for the reply :)

But my question is slightly different.

The demo at https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_api_demo uses "rkllm.h" to initialize the runtime and memory settings, and it also includes the calculation part.

So my question is: where can I find the code that actually implements the functions declared in rkllm-runtime/Linux/librkllm_api/include/rkllm.h?

@ha-seungwon
Author

How can I find or set the NPU core?

Test code

[screenshot]

RKLLM doc

[screenshot]

ERROR

[screenshot]

@waydong
Collaborator

waydong commented Jan 13, 2025

> How can I find or set the NPU core?

num_npu_core is set during the model conversion process; however, the documentation reflecting this configuration has not been updated yet.
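For context, num_npu_core is an argument to the build() step in an rkllm-toolkit conversion script. A non-runnable sketch modeled on the conversion examples in this repo (the model path is hypothetical, and parameter names may differ between toolkit versions):

```python
# Sketch of an RKLLM conversion script; requires the proprietary
# rkllm-toolkit. Modeled on the conversion examples in the rknn-llm
# repo; parameter names may differ between toolkit versions.
from rkllm.api import RKLLM

llm = RKLLM()

ret = llm.load_huggingface(model="./Qwen2-1.5B-Instruct")  # hypothetical path
if ret != 0:
    raise RuntimeError("load failed")

# num_npu_core is baked into the exported model here (e.g. up to 3
# cores on RK3588); it cannot be changed at inference time.
ret = llm.build(do_quantization=True,
                quantized_dtype="w8a8",
                target_platform="rk3588",
                num_npu_core=3)
if ret != 0:
    raise RuntimeError("build failed")

ret = llm.export_rkllm("./model.rkllm")
if ret != 0:
    raise RuntimeError("export failed")
```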

@ha-seungwon
Author

How does the NPU kernel perform matrix multiplication across multiple cores, and what optimizations are necessary to ensure that matrix multiplication efficiently utilizes all available cores on a multicore NPU?
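(For what it's worth, the usual scheme on multicore accelerators is to partition the output matrix across cores, e.g. by rows, so each core computes an independent slice; keeping the slices balanced is what lets all cores stay busy. A generic Python sketch of that idea, with threads standing in for NPU cores; this is not Rockchip-specific and says nothing about what the closed-source RKNPU driver actually does.)

```python
# Generic illustration of multicore matrix multiplication by
# row-partitioning the output. Each "core" (here: a thread) computes
# a contiguous slice of the rows of A @ B independently.
from concurrent.futures import ThreadPoolExecutor

def matmul_rows(A, B, rows):
    # Compute the given output rows of A @ B (pure-Python, lists of lists).
    cols, inner = len(B[0]), len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in rows]

def matmul_multicore(A, B, num_cores=3):
    # Split A's rows into num_cores balanced contiguous chunks.
    n = len(A)
    step = -(-n // num_cores)  # ceiling division: rows per core
    chunks = [range(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=num_cores) as ex:
        parts = ex.map(lambda r: matmul_rows(A, B, r), chunks)
    # Concatenate the per-core slices back into the full result.
    return [row for part in parts for row in part]

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
print(matmul_multicore(A, B))  # [[25, 28], [57, 64], [89, 100]]
```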

@c0zaut

c0zaut commented Jan 14, 2025

@ha-seungwon - all of my models are converted with multicore support, which I set during the conversion process as @waydong points out. To monitor the npu core usage, run the following command as root:

watch -n 1 'cat /sys/kernel/debug/rknpu/load'

Running that command shows NPU cores running at top performance.

Models are available here in a variety of quants and optimizations: https://huggingface.co/c01zaut

@ha-seungwon
Author

@c0zaut Yes, I know how to monitor the NPU core usage.

My question is whether there is another way to monitor or detect NPU utilization during real-time inference after converting to an RKLLM model file, such as through memory mapping or matrix calculations.
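One programmatic option, since the driver already exposes per-core load through debugfs: poll and parse /sys/kernel/debug/rknpu/load from your own process while inference runs, instead of eyeballing `watch`. A sketch (the exact text format of the load file can vary between kernel/driver versions, so the regex is an assumption):

```python
# Poll /sys/kernel/debug/rknpu/load programmatically during inference.
# Requires root and a mounted debugfs; the file's text format may
# differ between kernel/driver versions.
import re

def parse_rknpu_load(text):
    """Parse per-core utilization from the rknpu debugfs load file.

    Expects lines like 'NPU load:  Core0:  5%, Core1:  0%, Core2: 12%,'.
    Returns a dict mapping core index to utilization percent.
    """
    return {int(core): int(pct)
            for core, pct in re.findall(r"Core(\d+):\s*(\d+)%", text)}

def read_rknpu_load(path="/sys/kernel/debug/rknpu/load"):
    # Call this in a loop (e.g. once per second) from a monitoring thread.
    with open(path) as f:
        return parse_rknpu_load(f.read())

if __name__ == "__main__":
    sample = "NPU load:  Core0:  5%, Core1:  0%, Core2: 12%,"
    print(parse_rknpu_load(sample))  # {0: 5, 1: 0, 2: 12}
```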
