"Quantized models" is very overloaded terminology :) tract has some support for the QMatMul and QConv operators in ONNX, but ONNX lagged behind the state of the art for a long time when it comes to model quantization and compression (which can perhaps be explained by a focus on the training side of things). For instance, last time I checked, there was no support for a Q8-like type (with scale and offset) in regular arithmetic operations in tract (Add, Mul, ...), which makes it a difficult format for managing quantized models. And with the LLM boom, it feels like the community is moving toward more bespoke formats like GGML...
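For context, the "Q8-like type (with scale and offset)" mentioned above refers to affine quantization, the scheme ONNX's QuantizeLinear/DequantizeLinear operators use. A minimal sketch in Python (illustrative only, not tract or ONNX code):

```python
# Affine (scale + zero-point) int8 quantization sketch.
# q = round(x / scale) + zero_point, clamped to the int8 range;
# x is approximately recovered as (q - zero_point) * scale.

def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to a signed 8-bit integer."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to int8 range

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Approximate reconstruction of the real value."""
    return (q - zero_point) * scale

# Example: with scale=0.02 and zero_point=10,
# 0.5 quantizes to round(25) + 10 = 35 and round-trips exactly.
q = quantize(0.5, 0.02, 10)   # 35
x = dequantize(q, 0.02, 10)   # 0.5
```

The difficulty the comment alludes to: once tensors carry a scale and zero point, even plain Add or Mul must rescale both operands into a common representation, which is why supporting quantized types only in a few dedicated operators (QMatMul, QConv) is limiting.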
Just curious if anyone has tried it with tract.