Make HuggingfaceMultimodalModel variants more explicit #186

phisad · 2025-05-09T13:38:33Z

Currently, the multimodal backend for Huggingface models initializes only a single generalized model, which is configured externally via the model_registry.

"model_config": {
    "model_class": "transformers.IdeficsForVisionText2Text",
    "processor_class": "transformers.AutoProcessor",
    "processor_config": {},
    "prompt": "clemcore.backends.multimodal_utils.generate_idefics_prompt_text", <-- should go into own class
    "response": "clemcore.backends.multimodal_utils.generate_idefics_response", <-- should go into own class
    "eos_to_cull": "<end_of_utterance>",
    "output_split_prefix": "Assistant:",
    "multimodality": {
        "single_image": true,
        "multiple_images": true,
        "audio": false,
        "video": false
    },
    "mm_model_config": {
        "torch_dtype": "auto",
        "device_map": "auto"
    }
}

The problem is that configuring class behavior (the prompt and response functions in multimodal_utils) from outside like this is rather opaque, and therefore error-prone and hard to debug.
Which arguments must be configured only becomes apparent very late, when a dictionary access fails in generate_idefics_response.
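
To illustrate the failure mode, here is a minimal sketch. The function `generate_response` is a hypothetical stand-in, not the actual `generate_idefics_response`, and which keys the real helper requires is an assumption based on the config above.

```python
# Hypothetical stand-in for a response helper in multimodal_utils;
# the real generate_idefics_response may look different.
def generate_response(model_config: dict, raw_output: str) -> str:
    # The required keys are only touched here, long after the model was loaded.
    prefix = model_config["output_split_prefix"]
    eos = model_config["eos_to_cull"]
    return raw_output.split(prefix)[-1].replace(eos, "").strip()


# A registry entry that forgot "eos_to_cull" is accepted without complaint ...
incomplete_config = {"output_split_prefix": "Assistant:"}

try:
    generate_response(incomplete_config, "User: hi Assistant: hello<end_of_utterance>")
except KeyError as err:
    # ... and the mistake only surfaces at generation time.
    late_error = err
```

Nothing in the registry entry itself signals that a key is missing; the error appears only once a response is being post-processed.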

A better approach would be to use the backend class as intended, that is, as a router based on the model type to create specific types, e.g.:

    def get_model_for(self, model_spec: backends.ModelSpec) -> backends.Model:
        """Get the model for the specified model specification.

        Args:
            model_spec (backends.ModelSpec): The model specification.

        Returns:
            backends.Model: The model instance.
        """
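        # Note the trailing commas: ("InterVL3") is a plain string, so `in`
        # would do a substring check instead of tuple membership.
        if model_spec.model_name in ("InterVL3",):
            return HuggingfaceInterVLModel(model_spec)
        if model_spec.model_name in ("Idefics",):
            return HuggingfaceIdefics(model_spec)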
        return HuggingfaceMultimodalModel(model_spec)

And then implement the prompt and generate behavior there for each particular model.
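
A model-specific subclass could then look roughly like the following. This is only a sketch under stated assumptions: `ModelSpec` and `HuggingfaceMultimodalModel` here are simplified stand-ins for the real clemcore classes, and `required_keys` and `parse_response` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSpec:
    """Stand-in for backends.ModelSpec (assumed fields, not the real class)."""
    model_name: str
    model_config: dict = field(default_factory=dict)


class HuggingfaceMultimodalModel:
    """Stand-in for the generic model; subclasses list the keys they need."""
    required_keys: tuple = ()

    def __init__(self, model_spec: ModelSpec):
        missing = [k for k in self.required_keys if k not in model_spec.model_config]
        if missing:
            # Fail at construction time instead of during generation.
            raise ValueError(f"{model_spec.model_name}: missing config keys {missing}")
        self.model_spec = model_spec

    def parse_response(self, raw_output: str) -> str:
        return raw_output.strip()


class HuggingfaceIdefics(HuggingfaceMultimodalModel):
    # Assumption: these are the keys the Idefics response handling relies on.
    required_keys = ("eos_to_cull", "output_split_prefix")

    def parse_response(self, raw_output: str) -> str:
        cfg = self.model_spec.model_config
        text = raw_output.split(cfg["output_split_prefix"])[-1]
        return text.replace(cfg["eos_to_cull"], "").strip()


def get_model_for(model_spec: ModelSpec) -> HuggingfaceMultimodalModel:
    # Tuple membership, so only exact names match.
    if model_spec.model_name in ("Idefics",):
        return HuggingfaceIdefics(model_spec)
    return HuggingfaceMultimodalModel(model_spec)
```

With this layout, a registry entry that lacks a required key is rejected when the model is created, and the prompt and response logic lives next to the class that depends on it.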

phisad added the dev: chore Minor tasks, cleanup, or internal maintenance (not user-facing) label May 9, 2025

phisad mentioned this issue May 9, 2025

integrated InternVL3 #184

Merged
