NPU conversion #153
Comments
Hi, this is a demo; I'm not sure if it's what you need: https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_api_demo
Thanks for the reply :) But my question is slightly different. The https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_api_demo demo uses "rkllm.h" to initialize the model and memory settings for the computation, and it also contains the inference calls. So my question is: where can I find the actual implementation of the functions declared in rkllm-runtime/Linux/librkllm_api/include/rkllm.h?
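That header only declares the public API; the implementation behind it ships as the closed-source librkllmrt.so, so the actual NPU code is not in the repository. For reference, a minimal sketch of the call sequence the demo uses, based on an early published version of rkllm.h (signatures and enum names have changed across releases, so check the header you actually have):

    #include <cstdio>
    #include "rkllm.h"  // rkllm-runtime/Linux/librkllm_api/include/rkllm.h

    // Streaming callback: librkllmrt.so invokes this as tokens are produced.
    // RKLLMResult and LLMCallState are declared in rkllm.h; the logic that
    // fills them runs inside the closed-source runtime.
    void on_result(RKLLMResult* result, void* userdata, LLMCallState state) {
        if (state == LLM_RUN_NORMAL) {
            printf("%s", result->text);  // partial output
            fflush(stdout);
        } else if (state == LLM_RUN_FINISH) {
            printf("\n");
        }
    }

    int main(int argc, char** argv) {
        if (argc < 2) return 1;
        LLMHandle handle = nullptr;

        RKLLMParam param = rkllm_createDefaultParam();
        param.model_path      = argv[1];  // a converted .rkllm file
        param.max_context_len = 512;
        param.max_new_tokens  = 256;

        // All of the NPU work (weight upload, kernel dispatch, the matrix
        // multiplications themselves) happens behind these three calls,
        // inside librkllmrt.so.
        if (rkllm_init(&handle, param, on_result) != 0) return 1;
        rkllm_run(handle, "Hello, who are you?", nullptr);
        rkllm_destroy(handle);
        return 0;
    }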
How does the NPU kernel perform matrix multiplication across multiple cores, and what optimizations are necessary to ensure that matrix multiplication efficiently utilizes all available cores on a multicore NPU?
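The NPU kernel internals are not public, so this cannot be answered from the repository. Conceptually, though, a multicore matmul is tiled so that each core owns an independent block of output rows, and efficiency comes down to balanced tile sizes and data layouts that avoid redundant memory traffic. A CPU-thread sketch of that row-block partitioning (illustrative only, not Rockchip's implementation):

    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Illustrative row-block partitioning of C = A * B across `cores`
    // workers, analogous to how a multicore accelerator splits the output:
    // each core computes a disjoint slice of C, so no inter-core
    // synchronization is needed inside the kernel. C must be zero-filled.
    void matmul_multicore(const std::vector<float>& A,
                          const std::vector<float>& B,
                          std::vector<float>& C,
                          size_t M, size_t K, size_t N, unsigned cores) {
        std::vector<std::thread> workers;
        size_t rows_per_core = (M + cores - 1) / cores;  // balanced tiles
        for (unsigned c = 0; c < cores; ++c) {
            size_t r0 = c * rows_per_core;
            size_t r1 = std::min(M, r0 + rows_per_core);
            workers.emplace_back([&, r0, r1] {
                for (size_t i = r0; i < r1; ++i)
                    for (size_t k = 0; k < K; ++k) {   // k-outer: streams rows of B
                        float a = A[i * K + k];
                        for (size_t j = 0; j < N; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            });
        }
        for (auto& w : workers) w.join();
    }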
@ha-seungwon - all of my models are converted with multicore support, which I set during the conversion process, as @waydong points out. To monitor the NPU core usage, run the following command as root:
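    sudo cat /sys/kernel/debug/rknpu/load

(the per-core load node exposed by the RKNPU driver's debugfs interface on RK3588 boards; the path may vary by kernel)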
Running that command shows the NPU cores running at top performance. Models are here in a variety of quantizations and optimizations: https://huggingface.co/c01zaut
@c0zaut Yes, I know how to monitor the NPU core usage. My question is whether there is another way to monitor or detect NPU utilization during real-time inference, after the model has been converted to an RKLLM file, for example through memory mapping or by instrumenting the matrix calculations.
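The runtime does not expose a public utilization API, but one in-process option is to poll the same debugfs node from a background thread while inference runs. A minimal sketch, assuming /sys/kernel/debug/rknpu/load is readable by the process:

    #include <atomic>
    #include <chrono>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <thread>

    // Polls the RKNPU debugfs load node while inference runs elsewhere.
    // Assumes debugfs is mounted and the process can read the node.
    std::atomic<bool> running{true};

    void monitor_npu() {
        while (running) {
            std::ifstream f("/sys/kernel/debug/rknpu/load");
            std::string line;
            while (std::getline(f, line))
                std::cout << "[npu] " << line << '\n';  // per-core load
            std::this_thread::sleep_for(std::chrono::milliseconds(500));
        }
    }

    int main() {
        std::thread t(monitor_npu);
        // ... run rkllm inference here ...
        std::this_thread::sleep_for(std::chrono::seconds(5));  // placeholder
        running = false;
        t.join();
        return 0;
    }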
Hello,
How do I interact with the NPU directly?
rknn-llm appears to use rkllm-runtime/Linux/librkllm_api/aarch64/librkllmrt.so for the NPU kernel setup.
But is there any open-source code for testing the NPU calculations?
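The rkllm compute kernels themselves are closed-source, but the sibling rknn-toolkit2 project ships a public matrix-multiply API (rknn_matmul_api.h, with a matmul demo among the rknpu2 examples, if memory serves) that drives the NPU directly. A hedged sketch of the call sequence; the struct fields and enum names follow one toolkit release and should be checked against your header:

    #include <cstring>
    #include "rknn_matmul_api.h"  // from rknn-toolkit2 (librknnrt)

    int main() {
        // Describe a float16 matmul C[M,N] = A[M,K] * B[K,N] on the NPU.
        rknn_matmul_info info;
        memset(&info, 0, sizeof(info));
        info.M = 64;
        info.K = 128;
        info.N = 64;
        info.type = RKNN_FLOAT16_MM_FLOAT16_TO_FLOAT32;

        rknn_matmul_ctx ctx;
        rknn_matmul_io_attr io_attr;
        if (rknn_matmul_create(&ctx, &info, &io_attr) != 0) return 1;

        // Allocate NPU-visible buffers sized by the attrs the runtime returns.
        rknn_tensor_mem* A = rknn_create_mem(ctx, io_attr.A.size);
        rknn_tensor_mem* B = rknn_create_mem(ctx, io_attr.B.size);
        rknn_tensor_mem* C = rknn_create_mem(ctx, io_attr.C.size);
        // ... fill A->virt_addr and B->virt_addr with fp16 data ...

        rknn_matmul_set_io_mem(ctx, A, &io_attr.A);
        rknn_matmul_set_io_mem(ctx, B, &io_attr.B);
        rknn_matmul_set_io_mem(ctx, C, &io_attr.C);

        rknn_matmul_run(ctx);  // executes the multiply on the NPU
        // ... read results back from C->virt_addr ...

        rknn_destroy_mem(ctx, A);
        rknn_destroy_mem(ctx, B);
        rknn_destroy_mem(ctx, C);
        rknn_matmul_destroy(ctx);
        return 0;
    }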