
Phi-4 Conversion Failure #148

Open · c0zaut opened this issue Dec 25, 2024 · 6 comments

Comments

c0zaut commented Dec 25, 2024

@waydong

Configuration: https://huggingface.co/c01zaut/phi-4/

Model weights: https://huggingface.co/NyxKrage/Microsoft_Phi-4/

Code:

>>> from rkllm.api import RKLLM
>>> rk = RKLLM()
INFO: rkllm-toolkit version: 1.1.4
>>> rk.load_huggingface('/root/toolkit/models/phi-4')
WARNING: `flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
WARNING: Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [01:06<00:00, 11.02s/it]
0
>>> rk.build(do_quantization=True, optimization_level=0, quantized_dtype="w8a8")
Building model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 527/527 [02:56<00:00,  2.99it/s]
0
>>> rk.export_rkllm("/root/toolkit/models/phi-4.rkllm")
ERROR: Catch exception when converting model: Argument 'value' has incorrect type (expected int, got NoneType)
-1
>>> print(rk.base.state)
run_state.build_model

Could you take a look and let me know which setting or parameter needs to be set, and how to get proper debug output, so I can adjust the Phi3 config accordingly?
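
For reference, here is the full session condensed into a standalone script. Every call and argument is copied from the REPL above; treating the echoed 0/-1 values as return codes is an assumption based on that output.

# Minimal repro script; calls and arguments copied verbatim from the session above.
from rkllm.api import RKLLM

rk = RKLLM()

ret = rk.load_huggingface('/root/toolkit/models/phi-4')
assert ret == 0, "load_huggingface failed"

ret = rk.build(do_quantization=True, optimization_level=0, quantized_dtype="w8a8")
assert ret == 0, "build failed"

# This is the step that raises:
# ERROR: Catch exception when converting model: Argument 'value' has incorrect type (expected int, got NoneType)
ret = rk.export_rkllm("/root/toolkit/models/phi-4.rkllm")
assert ret == 0, "export_rkllm failed"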

c0zaut (Author) commented Dec 25, 2024

I won't post the code here, but I also tested the tokenizer via the RKLLM API, and it was producing correct output for encoding/decoding tokens.
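
For anyone who wants to reproduce that check, here is a round-trip sketch using the plain Hugging Face tokenizer instead; the RKLLM tokenizer API I tested isn't shown here, so this is a stand-in rather than the original test code.

# Stand-in round-trip check via the Hugging Face tokenizer, not the RKLLM API.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained('/root/toolkit/models/phi-4')

text = "Hello from Phi-4!"
ids = tok.encode(text, add_special_tokens=False)
decoded = tok.decode(ids)

print(ids, decoded)
assert decoded == text, "tokenizer round-trip mismatch"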

Also, is it possible to enable flash attention for optimization? I know it is possible with rknn, but I don't see an option in the LLM converter API.
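
For comparison, this is how the attention backend is normally selected on the Hugging Face side when loading the model; these are standard transformers arguments matching the warning in the log above, not RKLLM converter options.

# Selecting the attention backend at Hugging Face load time (not an RKLLM option).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    '/root/toolkit/models/phi-4',
    torch_dtype=torch.bfloat16,  # flash-attention requires fp16/bf16 weights
    attn_implementation="flash_attention_2",  # or "eager", as the warning suggests
)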

Thank you!

imkebe commented Jan 9, 2025

The official Phi-4 has been released on HF... any updates?

waydong (Collaborator) commented Jan 10, 2025

Hi, there will be updates in the near future.

c0zaut (Author) commented Jan 11, 2025

Thanks @waydong ! Will the updates be in the same 1.1.x version of the library, i.e. 1.1.5, so it is backwards compatible? If not, will there be a similar update_rkllm() function so I can just pull and update my models in Huggingface instead of re-doing the conversion and UI?

waydong (Collaborator) commented Jan 13, 2025

> Thanks @waydong ! Will the updates be in the same 1.1.x version of the library, i.e. 1.1.5, so it is backwards compatible? If not, will there be a similar update_rkllm() function so I can just pull and update my models in Huggingface instead of re-doing the conversion and UI?

Yes, we will maintain an interface (update_rkllm) to support easy model upgrades.
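
A purely hypothetical usage sketch follows; only the function name update_rkllm is confirmed above, so the signature and return value below are assumptions.

# HYPOTHETICAL: only the name update_rkllm is confirmed by the maintainer;
# this signature and return convention are assumptions for illustration.
from rkllm.api import RKLLM

rk = RKLLM()
# Assumed: upgrades an existing .rkllm artifact in place instead of
# re-running load_huggingface / build / export_rkllm from scratch.
ret = rk.update_rkllm("/root/toolkit/models/phi-4.rkllm")
assert ret == 0, "update failed (assumed 0-on-success, like the other calls)"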

c0zaut (Author) commented Jan 14, 2025

@waydong Thank you! Will the updated library also require a new kernel module? If so, could you make it dynamic instead of built-in?
