VLM: Model Tracing Guide #1030
Signed-off-by: Kyle Sayers <[email protected]>
…tokenized datasets should not be given labels Signed-off-by: Kyle Sayers <[email protected]>
…anup-custom-dataset
## Purpose ##
* Allow VLM processors to be used to tokenize datasets with prompt keys

## Postrequisites ##
* #1030

## Changes ##
* Use `text` argument name for tokenizing the prompt column

## Testing ##
* w.r.t. tokenizers, using the `text` kwarg follows the precedent set by [PretrainedTokenizerBase](https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L2790)
* w.r.t. processors, most processors use the text kwarg

Below are all the models I know to be compatible with this change, I'm assuming that most other processors follow the same standard
1. [llama](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/tokenization_llama.py#L233)
2. [pixtral](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pixtral/processing_pixtral.py#L160)
3. [phi3_vision](https://huggingface.co/microsoft/Phi-3.5-vision-instruct/blob/main/processing_phi3_v.py#L321)
4. [mllama](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mllama/processing_mllama.py#L232)
5. [qwen2_vl](https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/processing_qwen2_vl.py#L71)

Example of using VLM processor to tokenize a dataset with prompt key
```python3
from transformers import AutoProcessor
from llmcompressor.transformers import DataTrainingArguments, TextGenerationDataset

models_to_test = [
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "Qwen/Qwen2-VL-2B-Instruct",  # fails without changes
    "mgoin/pixtral-12b",  # fails without changes
]

for model_id in models_to_test:
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    data_args = DataTrainingArguments(
        dataset="ultrachat-200k",
        splits={"calibration": "test_sft[:1]"}
    )

    dataset = TextGenerationDataset.load_from_registry(
        data_args.dataset,
        data_args=data_args,
        split=data_args.splits["calibration"],
        processor=processor,
    )(add_labels=False)
```

Signed-off-by: Kyle Sayers <[email protected]>
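To illustrate why the keyword matters, here is a minimal sketch using two stub classes. `StubTokenizer` and `StubVLMProcessor` are hypothetical stand-ins, not real `transformers` classes. The sketch assumes that a VLM processor's first positional parameter may be `images` rather than `text`, which would explain the "fails without changes" comments above.

```python
class StubTokenizer:
    """Hypothetical text-only tokenizer: `text` is the first parameter."""

    def __call__(self, text=None, **kwargs):
        return {"input_ids": [len(word) for word in text.split()]}


class StubVLMProcessor:
    """Hypothetical VLM processor: `images` comes before `text`."""

    def __call__(self, images=None, text=None, **kwargs):
        if text is None:
            raise ValueError("no text was provided")
        return {
            "input_ids": [len(word) for word in text.split()],
            "pixel_values": images,
        }


def tokenize_prompt(processor, prompt):
    # Passing the prompt by keyword works for both call signatures
    return processor(text=prompt)


prompt = "describe this image"
tokenizer_out = tokenize_prompt(StubTokenizer(), prompt)
processor_out = tokenize_prompt(StubVLMProcessor(), prompt)

# Passing the prompt positionally binds it to `images` on the VLM stub
try:
    StubVLMProcessor()(prompt)
    positional_ok = True
except ValueError:
    positional_ok = False
```

With the keyword form, the same dataset code can call either kind of object without knowing which one it was given.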
    return model_cls


def parse_args():
We have `click` in `setup.py`; it might be worth using for the CLI.
I don't really see a good reason to https://click.palletsprojects.com/en/stable/why/#why-not-argparse, but thanks for the suggestion
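For context, a minimal sketch of what an `argparse`-based `parse_args` could look like. The flags mirror the `multimodal_data`, `sequential_targets`, and `ignore` parameters visible elsewhere in this PR. `--model_id`, the exact flag spellings, and the optional `argv` parameter are assumptions for illustration, not the PR's actual CLI.

```python
import argparse


def parse_args(argv=None):
    # argv=None makes argparse read sys.argv; passing a list eases testing
    parser = argparse.ArgumentParser(description="Illustrative tracing CLI")
    parser.add_argument("--model_id", type=str, required=True)  # hypothetical flag
    parser.add_argument("--sequential_targets", type=str, nargs="*", default=None)
    parser.add_argument("--ignore", type=str, nargs="*", default=[])
    parser.add_argument("--multimodal_data", action="store_true")
    return parser.parse_args(argv)


args = parse_args(
    ["--model_id", "my-model", "--ignore", "lm_head", "--multimodal_data"]
)
```

Everything used here ships with the standard library, so this style adds no dependency.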
legacy_processing = (
    (input_ids == self.config.image_token_index).sum(1).max() < self.config.image_seq_length
) or (input_ids.shape[-1] == 1 and pixel_values is not None).item()
```
I read the whole thing.
I like how much time and thought you put into making this doc.
Right now, the audience needs to read until 3rd paragraph to know what the problem is and when to use the tracing -- encoder-decoder models using GPTQ, SparseGPT Modifiers. If we move those to the intro, it will be clearer for the audience to know if the doc is applicable to them or not.
Then a small paragraph introducing what 1, 2, and 3 will be helpful for --
1. describes why the previous methods cannot be used and why the sequential pipeline solves the problem, 2. is how to run using the CLI, and 3. is debugging/contribution.
This way I think the audience can have an easier time navigating to the appropriate section by reading less.
> Right now, the audience needs to read until 3rd paragraph to know what the problem is and when to use the tracing

As for when to use tracing, that's described in the second sentence

> Through reading this guide, you will learn
> 1. Why tracing is required when compressing with recipes involving the Sequential Pipeline and modifiers such as GPTQModifier

As for what the problem is, that's described in the first section

> ## 1. Why is Tracing Required? ##
> Right now, the audience needs to read until 3rd paragraph to know what the problem is and when to use the tracing -- encoder-decoder models using GPTQ, SparseGPT Modifiers

That's incorrect, tracing is used for all model architectures, not just encoder-decoder models. As described in the second paragraph, tracing is used when compressing with recipes involving the Sequential Pipeline and modifiers such as `GPTQModifier`.
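As a rough illustration of why some model code resists tracing regardless of architecture, the sketch below implements a toy symbolic tracer. It assumes "tracing" here means symbolic tracing in the style of `torch.fx`. `Proxy`, `TraceError`, and `trace` are hypothetical names defined in this example only, and none of this is LLM Compressor's implementation. It shows that plain arithmetic can be recorded, while branching on a value that depends on the input (as in the `legacy_processing` expression above) cannot.

```python
class TraceError(Exception):
    """Raised when code needs a concrete value from a symbolic input."""


class Proxy:
    """Symbolic stand-in for a tensor: records operations instead of computing."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph

    def _record(self, op, other):
        other_name = other.name if isinstance(other, Proxy) else repr(other)
        node = f"{op}({self.name}, {other_name})"
        self.graph.append(node)
        return Proxy(node, self.graph)

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)

    def __lt__(self, other):
        return self._record("lt", other)

    def __bool__(self):
        # An `if` statement needs True or False, which a symbol cannot give
        raise TraceError(f"cannot branch on symbolic value {self.name}")


def trace(fn):
    graph = []
    fn(Proxy("x", graph))
    return graph


def traceable(x):
    return (x + 1) * 2


def untraceable(x):
    if x < 0:  # data-dependent control flow
        return x * -1
    return x


recorded = trace(traceable)  # ["add(x, 1)", "mul(add(x, 1), 2)"]

try:
    trace(untraceable)
    failed = False
except TraceError:
    failed = True  # tracing stops at the `if`
```

Making a model traceable then amounts to rewriting or wrapping branches like the one in `untraceable` so that the tracer never has to evaluate them.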
> Then a small paragraph introducing what 1, 2, and 3 will be helpful for
> This way I think the audience can have an easier time navigating to the appropriate section by reading less.

I think the section titles + the list of things you will learn from reading each of the sections is enough context for a reader to go on. For example, if the reader doesn't care about the why, they can skip 1. If the reader doesn't care about what traceability is, they can skip 2. If the reader doesn't care about how to make a model traceable, they can skip 3.
Great work, we should consider adding a Read the Docs build like vLLM to render these out
Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Michael Goin <[email protected]>
Great job.
A couple of nits:
- I wouldn't refer to the SparseGPTModifier until we've actually started using data pipelines outside of the GPTQModifier
- A helpful comment on what to focus on when looking at the images would be nice
# Sequential Pipeline #
The sequential pipeline is a data pipeline, primarily used for compressing models with the
[GPTQModifier](/src/llmcompressor/modifiers/quantization/gptq/base.py) or the
[SparseGPTModifier](/src/llmcompressor/modifiers/obcq/base.py).
Because we're not yet using the data pipeline in the SparseGPTModifier, I would not include it in the README just yet
independently at calibration time. For a visual example of a model call graph, see
[Llama_3.2-Vision.svg](/src/llmcompressor/transformers/tracing/assets/Llama_3.2-Vision.svg).

<p align="center">
What am I supposed to be taking away from this image?
This image depicts the model graph referenced in the above paragraph. The image is a concrete example of what a model graph looks like and helps illustrate what the nodes and edges are within the graph.
traced (we don't see the individual `MllamaVisionEncoder` layers, ect.). However, we can
no longer target the modules within the `MllamaVisionModel` such as the
`MllamaVisionEncoder` as sequential targets. If any modules within the
`MllamaVisionModel` are being compressed, their hessians be all be allocated at the same
grammar: "their hessians be all be allocated ..."
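The tradeoff described in the README excerpt above can be sketched with a toy partitioning function. This assumes the pipeline cuts the call graph into subgraphs that each end at a sequential target. `partition` and the node names are hypothetical and greatly simplified, and this is not LLM Compressor's implementation.

```python
def partition(nodes, sequential_targets):
    """Split an ordered node list into subgraphs, each ending at a target."""
    subgraphs, current = [], []
    for node in nodes:
        current.append(node)
        if node in sequential_targets:
            subgraphs.append(current)
            current = []
    if current:  # trailing nodes after the last target
        subgraphs.append(current)
    return subgraphs


targets = {"vision_layer_0", "vision_layer_1", "text_layer_0"}

# Vision model traced: its layers are visible nodes and can be targeted
traced_graph = ["embed", "vision_layer_0", "vision_layer_1", "text_layer_0", "lm_head"]
traced_parts = partition(traced_graph, targets)

# Vision model skipped during tracing: it appears as one opaque node
opaque_graph = ["embed", "vision_model", "text_layer_0", "lm_head"]
opaque_parts = partition(opaque_graph, targets)
```

In the second case the vision layers never appear as nodes, so everything inside `vision_model` lands in a single subgraph. That is the situation in which all of its compression state would need to be held at once.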
    multimodal_data: bool,
    sequential_targets: Optional[Union[List[str], str]] = None,
    ignore: Union[List[str], str] = [],
):
docstring
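One possible shape for the requested docstring, as a hedged sketch. The function name `trace_signature_sketch`, the parameter descriptions, and the normalization logic are assumptions for illustration, since the real function name and body are not shown in this hunk. The sketch also swaps the mutable `[]` default for `None`, a common Python idiom, although the PR may have handled this differently.

```python
from typing import List, Optional, Union


def trace_signature_sketch(
    multimodal_data: bool,
    sequential_targets: Optional[Union[List[str], str]] = None,
    ignore: Optional[Union[List[str], str]] = None,
) -> dict:
    """
    Hypothetical documented version of the signature under review.

    :param multimodal_data: assumed to select multimodal sample inputs
    :param sequential_targets: module name or names used as sequential targets
    :param ignore: module name or names to skip during tracing
    :return: the normalized arguments (illustrative only, nothing is traced)
    """
    if isinstance(sequential_targets, str):
        sequential_targets = [sequential_targets]
    if ignore is None:
        ignore = []  # fresh list per call instead of a shared default
    elif isinstance(ignore, str):
        ignore = [ignore]
    return {
        "multimodal_data": multimodal_data,
        "sequential_targets": sequential_targets,
        "ignore": ignore,
    }


normalized = trace_signature_sketch(True, sequential_targets="LlamaDecoderLayer")
```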
## Purpose ##
This guide explains the concepts of tracing as they relate to LLM Compressor and how to modify your model to support recipes which require using the Sequential Pipeline.

Through reading this guide, you will learn

## Prerequisites ##
* `text` kwarg #1031

## Changes ##
* `src/llmcompressor/transformers/tracing/README.md` with pictures
* `src/llmcompressor/pipelines/sequential/README.md`
* `src/llmcompressor/transformers/tracing/debug.py`
* `llm-compressor.attempt_trace` entrypoint for ease of use
* `llava_example.py` and `pixtral_example.py` to match the order of arguments on the modifier

## Testing ##
* Use the `llmcompressor.attempt_trace` debug script

## Stretch ##
* It might be nice if this tracing debugger tool also printed the model graph to an svg