
VLM: Model Tracing Guide #1030

Open · kylesayrs wants to merge 366 commits into main
Conversation

@kylesayrs (Collaborator) commented Jan 2, 2025

Purpose

This guide explains the concepts of tracing as they relate to LLM Compressor and how to modify your model to support recipes which require using the Sequential Pipeline.

Through reading this guide, you will learn

  1. Why tracing is required when compressing with recipes involving the Sequential Pipeline and modifiers such as GPTQModifier
  2. How to determine if your model is traceable for your dataset
  3. How to modify your model definition to be traceable

Prerequisites

Changes

  • Add a model tracing guide src/llmcompressor/transformers/tracing/README.md with pictures
  • Add a readme for the sequential pipeline which points to the Tracing Guide src/llmcompressor/pipelines/sequential/README.md
  • Add a debug script to help users debug their models for traceability src/llmcompressor/transformers/tracing/debug.py
    • Add the llmcompressor.attempt_trace entrypoint for ease of use
  • Swap the order of arguments in llava_example.py and pixtral_example.py to match the order of arguments on the modifier (see the sketch below)
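
For reference, here is a minimal sketch of how those arguments sit on the modifier; the quantization scheme below is an assumption for illustration, not copied from the example files:

```python3
from llmcompressor.modifiers.quantization import GPTQModifier

# Sketch of a GPTQ recipe whose sequential_targets/ignore mirror the debug
# command in the Testing section below; the scheme value is assumed.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    sequential_targets=["LlamaDecoderLayer"],
    ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
)
```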

Testing

Use the llmcompressor.attempt_trace debug script

llmcompressor.attempt_trace \
    --model_id llava-hf/llava-1.5-7b-hf \
    --model_class TraceableLlavaForConditionalGeneration \
    --sequential_targets LlamaDecoderLayer \
    --ignore "re:.*lm_head" "re:vision_tower.*" "re:multi_modal_projector.*" \
    --multimodal_data
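
The entrypoint above is the supported way to check traceability. As a rough, generic illustration of what "attempting a trace" means, a manual check with transformers' torch.fx helper might look like the following sketch; the model ID and input names are arbitrary choices for a small text-only model and are not part of this PR:

```python3
from transformers import AutoModelForCausalLM
from transformers.utils.fx import symbolic_trace

# Load a small text-only model; symbolic tracing records the forward call graph
# and fails if it hits operations that cannot be traced symbolically.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Raises an error if the forward pass contains untraceable control flow
traced = symbolic_trace(model, input_names=["input_ids", "attention_mask"])
print(traced.graph)
```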

Stretch

It might be nice if this tracing debug tool also printed the model graph to an SVG
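
Not part of this PR, but one possible way to do that with stock torch.fx utilities (assuming a traced `GraphModule` like the one in the sketch above, plus pydot/graphviz installed):

```python3
from torch.fx.passes.graph_drawer import FxGraphDrawer

# `traced` is a torch.fx.GraphModule, e.g. from the traceability sketch above
drawer = FxGraphDrawer(traced, "traced_model")
drawer.get_dot_graph().write_svg("traced_model.svg")
```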

dsikka pushed a commit that referenced this pull request Jan 10, 2025
## Purpose ##
* Allow VLM processors to be used to tokenize datasets with prompt keys

## Postrequisites ##
* #1030

## Changes ##
* Use `text` argument name for tokenizing the prompt column

## Testing ##
* w.r.t. tokenizers, using the `text` kwarg follows the precedent set by
[PretrainedTokenizerBase](https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L2790)
* w.r.t. processors, most processors use the text kwarg

Below are all the models I know to be compatible with this change; I'm
assuming that most other processors follow the same standard:
1. [llama](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/tokenization_llama.py#L233)
2. [pixtral](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pixtral/processing_pixtral.py#L160)
3. [phi3_vision](https://huggingface.co/microsoft/Phi-3.5-vision-instruct/blob/main/processing_phi3_v.py#L321)
4. [mllama](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mllama/processing_mllama.py#L232)
5. [qwen2_vl](https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/processing_qwen2_vl.py#L71)

Example of using a VLM processor to tokenize a dataset with a prompt key:
```python3
from transformers import AutoProcessor
from llmcompressor.transformers import DataTrainingArguments, TextGenerationDataset

models_to_test = [
  "meta-llama/Meta-Llama-3-8B-Instruct",
  "mistralai/Mixtral-8x7B-Instruct-v0.1",
  "Qwen/Qwen2-VL-2B-Instruct",  # fails without changes
  "mgoin/pixtral-12b",  # fails without changes
]

for model_id in models_to_test:
  processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
  
  data_args = DataTrainingArguments(
      dataset="ultrachat-200k",
      splits={"calibration": "test_sft[:1]"}
  )
  
  dataset = TextGenerationDataset.load_from_registry(
      data_args.dataset,
      data_args=data_args,
      split=data_args.splits["calibration"],
      processor=processor,
  )(add_labels=False)
```
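
As a quick, illustrative sanity check of the `text` kwarg point above (not part of the PR; model choices taken from the list), a processor can tokenize text-only input the same way a tokenizer does:

```python3
from transformers import AutoProcessor, AutoTokenizer

# Both accept the prompt under the `text` keyword
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

print(tokenizer(text="Sample prompt")["input_ids"])
print(processor(text="Sample prompt")["input_ids"])
```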

Signed-off-by: Kyle Sayers <[email protected]>
return model_cls


def parse_args():

Collaborator:

We have click in setup.py; might be worth using it for the CLI

@kylesayrs (Collaborator Author) replied Jan 13, 2025:

I don't really see a good reason to (see https://click.palletsprojects.com/en/stable/why/#why-not-argparse), but thanks for the suggestion

```python3
legacy_processing = (
    (input_ids == self.config.image_token_index).sum(1).max() < self.config.image_seq_length
) or (input_ids.shape[-1] == 1 and pixel_values is not None).item()
```

Collaborator:

I read the whole thing.

I like how much time and thought you put into making this doc.

Right now, the audience needs to read until the 3rd paragraph to know what the problem is and when to use tracing -- encoder-decoder models using the GPTQ and SparseGPT modifiers. If we move those to the intro, it will be clearer for the audience whether the doc applies to them or not.

Then a small paragraph introducing what sections 1, 2, and 3 are helpful for would be useful -- 1 describes why the previous methods cannot be used and why the sequential pipeline solves the problem, 2 is how to run using the CLI, and 3 is debugging/contribution.

This way I think the audience can have an easier time navigating to the appropriate section by reading less.

@kylesayrs (Collaborator Author) replied Jan 13, 2025:

> Right now, the audience needs to read until the 3rd paragraph to know what the problem is and when to use tracing

As for when to use tracing, that's described in the second sentence:

> Through reading this guide, you will learn
> 1. Why tracing is required when compressing with recipes involving the Sequential Pipeline and modifiers such as GPTQModifier

As for what the problem is, that's described in the first section:

> ## 1. Why is Tracing Required? ##

@kylesayrs (Collaborator Author) replied Jan 13, 2025:

> Right now, the audience needs to read until the 3rd paragraph to know what the problem is and when to use tracing -- encoder-decoder models using the GPTQ and SparseGPT modifiers

That's incorrect; tracing is used for all model architectures, not just encoder-decoder models. As described in the second paragraph, tracing is used when compressing with recipes involving the Sequential Pipeline and modifiers such as GPTQModifier.

@kylesayrs (Collaborator Author) replied Jan 13, 2025:

> Then a small paragraph introducing what sections 1, 2, and 3 are helpful for
> This way I think the audience can have an easier time navigating to the appropriate section by reading less.

I think the section titles + the list of things you will learn from reading each of the sections is enough context for a reader to go on. For example, if the reader doesn't care about the why, they can skip 1. If the reader doesn't care about what traceability is, they can skip 2. If the reader doesn't care about how to make a model traceable, they can skip 3.

@kylesayrs requested a review from horheynm January 13, 2025 19:25
kylesayrs added a commit that referenced this pull request Jan 15, 2025

mgoin previously approved these changes Jan 20, 2025

@mgoin (Member) left a comment:

Great work, we should consider adding a readthedocs build like vLLM's to render these out

@dsikka (Collaborator) left a comment:

Great job.

A couple of nits:

  1. I wouldn't refer to the SparseGPTModifier until we've actually started using data pipelines outside of the GPTQModifier
  2. A helpful comment on what to focus on when looking at the images would be nice

# Sequential Pipeline #
The sequential pipeline is a data pipeline, primarily used for compressing models with the
[GPTQModifier](/src/llmcompressor/modifiers/quantization/gptq/base.py) or the
[SparseGPTModifier](/src/llmcompressor/modifiers/obcq/base.py).

Collaborator:

Because we're not yet using the data pipeline in the SparseGPTModifier, I would not include it in the README just yet

independently at calibration time. For a visual example of a model call graph, see
[Llama_3.2-Vision.svg](/src/llmcompressor/transformers/tracing/assets/Llama_3.2-Vision.svg).

<p align="center">

Collaborator:

What am I supposed to be taking away from this image?

Collaborator Author:

This image depicts the model graph referenced in the above paragraph. The image is a concrete example of what a model graph looks like and helps illustrate what the nodes and edges are within the graph.

traced (we don't see the individual `MllamaVisionEncoder` layers, ect.). However, we can
no longer target the modules within the `MllamaVisionModel` such as the
`MllamaVisionEncoder` as sequential targets. If any modules within the
`MllamaVisionModel` are being compressed, their hessians be all be allocated at the same

Collaborator:

Grammar: "their hessians be all be allocated ..."

multimodal_data: bool,
sequential_targets: Optional[Union[List[str], str]] = None,
ignore: Union[List[str], str] = [],
):

Collaborator:

docstring

Labels: ready (When a PR is ready for review)
Projects: None yet
4 participants