
[WIP] Fix hessian memory requirements #1084

Draft · wants to merge 9 commits into main
Conversation

kylesayrs (Collaborator)

No description provided.

@kylesayrs mentioned this pull request on Jan 20, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs kylesayrs changed the base branch from main to kylesayrs/qwen-tracable January 20, 2025 18:34
Signed-off-by: Kyle Sayers <[email protected]>
mgoin pushed a commit that referenced this pull request Jan 20, 2025
## Purpose ##
* Support compressing Qwen2VLForConditionalGeneration with vision
calibration data

## Follow-ups ##
* `Qwen/Qwen2-VL-72B-Instruct` has memory issues that are unrelated to
the VLM architecture and which result from incorrect assumptions in
`calculate_offload_device_map`. See
#1084
* When this lands, we'll replace the `2B` example with the `72B`
example, since the accuracy loss from quantizing a 2B is pretty severe

## Changes ##
* Add traceable model definition `src/llmcompressor/transformers/tracing/qwen2_vl.py`
* This mostly involves wrapping functions related to RoPE with image embeddings
* The `_prepare_4d_causal_attention_mask_with_cache_position` function has conditional logic `if attention_mask is not None`, which cannot be captured by symbolic tracing. This might be fixable with metadata in the future
* Add example script `examples/multimodal_vision/qwen2_vl_example.py`
* Qwen2-VL requires some custom data preprocessing and tokenization, which is implemented in the example script
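The wrapping mentioned above addresses the classic obstacle to `torch.fx` symbolic tracing: data-dependent Python control flow (such as `if attention_mask is not None`) cannot be recorded in the traced graph. A minimal sketch of the technique, assuming a hypothetical `make_mask` helper and `Toy` module (not the actual llm-compressor code), marks the offending function as a leaf so the tracer records a single call node instead of tracing through the conditional:

```python
import torch
import torch.fx

# Hypothetical helper with data-dependent control flow, mirroring the
# `if attention_mask is not None` branch that breaks symbolic tracing.
def make_mask(attention_mask):
    if attention_mask is not None:
        return attention_mask + 1
    return torch.zeros(1)

# Register the helper as a leaf function: torch.fx records a call to it
# instead of tracing into its body (and hitting the conditional).
torch.fx.wrap("make_mask")

class Toy(torch.nn.Module):
    def forward(self, x, attention_mask):
        return x * make_mask(attention_mask)

# Tracing now succeeds; the conditional is deferred to runtime.
gm = torch.fx.symbolic_trace(Toy())
```

Because the branch is evaluated at call time rather than trace time, the same graph module handles both a real mask and `None`.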

## Testing ##
* Ran `examples/multimodal_vision/qwen2_vl_example.py` to completion with the 2B model

```
========== SAMPLE GENERATION ==============
system
You are a helpful assistant.
user
Please describe the animal in this image

assistant
The animal in the image is a white kitten. It has a fluffy coat and is resting on a white keyboard. The kitten appears to be comfortable and relaxed, possibly enjoying the warmth of the keyboard.
==========================================
```


## Evaluation ##
Base
```
hf-multimodal (pretrained=Qwen/Qwen2-VL-2B-Instruct,dtype=bfloat16,add_bos_token=True,convert_img_format=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|     Tasks      |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------------|------:|------|-----:|------|---|----:|---|-----:|
|Computer Science|      0|none  |     0|acc   |↑  |  0.2|±  |0.0743|
```

Quantized
```
hf-multimodal (pretrained=/home/kyle/llm-compressor/Qwen2-VL-2B-Instruct-W4A16-G128,dtype=bfloat16,add_bos_token=True,convert_img_format=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|     Tasks      |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------------|------:|------|-----:|------|---|----:|---|-----:|
|Computer Science|      0|none  |     0|acc   |↑  |  0.1|±  |0.0557|
```

> we'll replace the 2B example with the 72B example, since the accuracy loss from quantizing a 2B is pretty severe

---------

Signed-off-by: Kyle Sayers <[email protected]>
Base automatically changed from kylesayrs/qwen-tracable to main January 20, 2025 20:06