[WIP] Fix hessian memory requirements #1084

kylesayrs · 2025-01-20T18:25:11Z

No description provided.

Signed-off-by: Kyle Sayers <[email protected]>

github-actions · 2025-01-20T18:25:24Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

Signed-off-by: Kyle Sayers <[email protected]>

## Purpose ##
* Support compressing Qwen2VLForConditionalGeneration with vision calibration data

## Follow-ups ##
* `Qwen/Qwen2-VL-72B-Instruct` has memory issues that are unrelated to the VLM architecture and which result from incorrect assumptions in `calculate_offload_device_map`. See #1084
* When this lands, we'll replace the `2B` example with the `72B` example, since the accuracy loss from quantizing a 2B is pretty severe

## Changes ##
* Add traceable model definition `src/llmcompressor/transformers/tracing/qwen2_vl.py`
  * This mostly involves wrapping functions related to rope with image embeddings
  * The `_prepare_4d_causal_attention_mask_with_cache_position` function has conditional logic `if attention_mask is not None`. This might be fixable with metadata in the future
* Add example script `examples/multimodal_vision/qwen2_vl_example.py`
  * Qwen2_vl requires some custom data preprocessing and tokenization, which is implemented in the example script

## Testing ##
* Ran `examples/multimodal_vision/qwen2_vl_example.py` to completion with both 2B

<details>

```
========== SAMPLE GENERATION ==============
system
You are a helpful assistant.
user
Please describe the animal in this image
assistant
The animal in the image is a white kitten. It has a fluffy coat and is resting on a white keyboard. The kitten appears to be comfortable and relaxed, possibly enjoying the warmth of the keyboard.
==========================================
```

</details>

## Evaluation ##
Base
```
hf-multimodal (pretrained=Qwen/Qwen2-VL-2B-Instruct,dtype=bfloat16,add_bos_token=True,convert_img_format=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|     Tasks      |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------------|------:|------|-----:|------|---|----:|---|-----:|
|Computer Science|      0|none  |     0|acc   |↑  |  0.2|±  |0.0743|
```

Quantized
```
hf-multimodal (pretrained=/home/kyle/llm-compressor/Qwen2-VL-2B-Instruct-W4A16-G128,dtype=bfloat16,add_bos_token=True,convert_img_format=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|     Tasks      |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------------|------:|------|-----:|------|---|----:|---|-----:|
|Computer Science|      0|none  |     0|acc   |↑  |  0.1|±  |0.0557|
```

> we'll replace the 2B example with the 72B example, since the accuracy loss from quantizing a 2B is pretty severe

---------

Signed-off-by: Kyle Sayers <[email protected]>
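For context on the PR title, here is a rough sketch of why Hessian buffers matter when sizing a device map. It assumes GPTQ-style calibration, where each quantized linear layer accumulates a square Hessian of shape `in_features x in_features`, and it assumes float32 storage. The helper names and the layer widths are hypothetical and are not taken from `calculate_offload_device_map` or from any model config.

```python
from typing import Iterable


def hessian_bytes(in_features: int, bytes_per_element: int = 4) -> int:
    """Bytes for one square (in_features x in_features) Hessian buffer.

    Assumption: float32 storage, hence 4 bytes per element by default.
    """
    return in_features * in_features * bytes_per_element


def reserve_for_hessians(
    layer_in_features: Iterable[int], bytes_per_element: int = 4
) -> int:
    """Worst-case reservation if every listed layer holds its Hessian at once."""
    return sum(hessian_bytes(n, bytes_per_element) for n in layer_in_features)


# Illustrative input widths only (assumed, not read from a real model).
example_layers = [8192, 8192, 8192, 8192, 8192, 8192, 29568]

total = reserve_for_hessians(example_layers)
print(f"Hessian reservation: {total / 2**30:.2f} GiB")
```

The cost grows quadratically with `in_features`, so a single wide projection can dominate the reservation even when its weight matrix is comparable in size to the others.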

kylesayrs added 7 commits January 14, 2025 17:38

Add TraceableQwen2VLForConditionalGeneration

ea8f047

Signed-off-by: Kyle Sayers <[email protected]>

Merge remote-tracking branch 'origin' into kylesayrs/qwen-tracable

4274faf

use custom preprocessing and tokenization

282c1c4

Signed-off-by: Kyle Sayers <[email protected]>

use auto device map

f91bd6d

Signed-off-by: Kyle Sayers <[email protected]>

simplify tracing changes

85ef39e

Signed-off-by: Kyle Sayers <[email protected]>

Merge branch 'main' into kylesayrs/qwen-tracable

137c00a

WIP

66bcde9

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs mentioned this pull request Jan 20, 2025

VLM: Qwen2_VL Example #1027

Merged

kylesayrs changed the base branch from main to kylesayrs/qwen-tracable January 20, 2025 18:34

kylesayrs added 2 commits January 20, 2025 14:40

allocate more for now

04a2c76

Signed-off-by: Kyle Sayers <[email protected]>

asdf

21f941e

Signed-off-by: Kyle Sayers <[email protected]>

Base automatically changed from kylesayrs/qwen-tracable to main January 20, 2025 20:06
