8000 Make HuggingfaceMultimodalModel variants more explicit · Issue #186 · clp-research/clemcore · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Make HuggingfaceMultimodalModel variants more explicit #186

New issue

Have a question about t 8000 his project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
phisad opened this issue May 9, 2025 · 0 comments
Open

Make HuggingfaceMultimodalModel variants more explicit #186

phisad opened this issue May 9, 2025 · 0 comments
Labels
dev: chore Minor tasks, cleanup, or internal maintenance (not user-facing)

Comments

@phisad
Copy link
Collaborator
phisad commented May 9, 2025

Currently, the multimodal backend for Huggingface models initializes only a single generalized model, which is configured externally via the model_registry.

"model_config": {
    "model_class": "transformers.IdeficsForVisionText2Text",
    "processor_class": "transformers.AutoProcessor",
    "processor_config": {},
    "prompt": "clemcore.backends.multimodal_utils.generate_idefics_prompt_text", <-- should go into own class
    "response": "clemcore.backends.multimodal_utils.generate_idefics_response", <-- should go into own class
    "eos_to_cull": "<end_of_utterance>",
    "output_split_prefix": "Assistant:",
    "multimodality": {
        "single_image": true,
        "multiple_images": true,
        "audio": false,
        "video": false
    },
    "mm_model_config": {
        "torch_dtype": "auto",
        "device_map": "auto"
    }
}

The problem is that configuring class behavior (prompt and response methods in multimodal_utils) from outside placed like this is rather opaque, hence error-prone and hard to debug.
Which arguments are required to be configured is indicated very late when the dictionary access fails in generate_idefics_response.

A better approach would be to use the backend class as intended, that is, as a router based on the model type to create specific types, e.g.:

    def get_model_for(self, model_spec: backends.ModelSpec) -> backends.Model:
        """Get the model for the specified model specification.

        Args:
            model_spec (backends.ModelSpec): The model specification.

        Returns:
            backends.Model: The model instance.
        """
        if model_spec.model_name in ("InterVL3"):
            return HuggingfaceInterVLModel(model_spec)
        if model_spec.model_name in ("Idefics"):
            return HuggingfaceIdefics(model_spec)
        return HuggingfaceMultimodalModel(model_spec)

And then implement the prompt and generate behavior there each particular model.

@phisad phisad added the dev: chore Minor tasks, cleanup, or internal maintenance (not user-facing) label May 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev: chore Minor tasks, cleanup, or internal maintenance (not user-facing)
Projects
None yet
Development

No branches or pull requests

1 participant
0