Currently, the multimodal backend for Huggingface models initializes only a single generalized model, which is configured externally via the model_registry, for example:

```json
"model_config": {
    "model_class": "transformers.IdeficsForVisionText2Text",
    "processor_class": "transformers.AutoProcessor",
    "processor_config": {},
    "prompt": "clemcore.backends.multimodal_utils.generate_idefics_prompt_text", <-- should go into own class
    "response": "clemcore.backends.multimodal_utils.generate_idefics_response", <-- should go into own class
    "eos_to_cull": "<end_of_utterance>",
    "output_split_prefix": "Assistant:",
    "multimodality": {
        "single_image": true,
        "multiple_images": true,
        "audio": false,
        "video": false
    },
    "mm_model_config": {
        "torch_dtype": "auto",
        "device_map": "auto"
    }
}
```
The problem is that configuring class behavior (the prompt and response functions in multimodal_utils) from the outside like this is rather opaque, and hence error-prone and hard to debug. Which arguments are actually required only becomes apparent very late, when the dictionary access fails inside `generate_idefics_response`.
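For illustration, the current dispatch roughly amounts to resolving dotted paths from the registry entry at generation time. The resolver below is a hypothetical sketch, not the actual clemcore code, but it shows why a configuration mistake only surfaces deep inside the call stack:

```python
import importlib

def resolve_callable(dotted_path: str):
    """Turn a dotted path like 'pkg.module.function' into the callable it names."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)

# The registry entry is only dereferenced once generation is under way, so a
# missing or misspelled "response" key raises a KeyError here, far away from
# the place where the configuration mistake was actually made:
model_config = {"prompt": "clemcore.backends.multimodal_utils.generate_idefics_prompt_text"}
response_fn = resolve_callable(model_config["response"])  # KeyError, raised late
```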
A better approach would be to use the backend class as intended, that is, as a router that creates specific model types based on the model spec, e.g.:
```python
def get_model_for(self, model_spec: backends.ModelSpec) -> backends.Model:
    """Get the model for the specified model specification.

    Args:
        model_spec (backends.ModelSpec): The model specification.

    Returns:
        backends.Model: The model instance.
    """
    # Note the trailing commas: ("InterVL3") is just a string, and `in` would
    # perform substring matching; ("InterVL3",) is a one-element tuple.
    if model_spec.model_name in ("InterVL3",):
        return HuggingfaceInterVLModel(model_spec)
    if model_spec.model_name in ("Idefics",):
        return HuggingfaceIdefics(model_spec)
    return HuggingfaceMultimodalModel(model_spec)
```
And then implement the prompt and response-generation behavior in each particular model class.
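What such a subclass could look like is sketched below. The base-class interface, the `generate_response` signature, and the use of `model_spec.model_name` as the checkpoint identifier are assumptions based on the snippet above, not the actual clemcore API:

```python
import transformers
from clemcore import backends  # assumed import path

class HuggingfaceIdefics(backends.Model):
    """Idefics-specific model: prompt construction and response parsing live
    in the class itself instead of being wired in via the model_registry."""

    def __init__(self, model_spec: backends.ModelSpec):
        super().__init__(model_spec)
        # Loading specifics are known statically, so a misconfiguration fails
        # at construction time rather than in the middle of generation.
        self.processor = transformers.AutoProcessor.from_pretrained(model_spec.model_name)
        self.model = transformers.IdeficsForVisionText2Text.from_pretrained(
            model_spec.model_name, torch_dtype="auto", device_map="auto"
        )

    def generate_response(self, messages) -> str:
        # Formerly generate_idefics_prompt_text / generate_idefics_response in
        # multimodal_utils; now ordinary methods that are easy to find and debug.
        prompt = self._build_prompt(messages)
        inputs = self.processor(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs)
        text = self.processor.batch_decode(output_ids, skip_special_tokens=True)[0]
        return text.split("Assistant:")[-1].strip()  # replaces output_split_prefix

    def _build_prompt(self, messages) -> str:
        # Idefics-specific prompt formatting would go here.
        raise NotImplementedError
```

This also makes the `multimodality` and `eos_to_cull` registry fields candidates for plain class attributes, since each subclass knows its own capabilities.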