PaliGemma 2 mix segment multiple objects #292

Open
JoeJoe1313 opened this issue Apr 9, 2025 · 5 comments

Comments

@JoeJoe1313
Contributor

I am having trouble segmenting multiple objects when using PaliGemma 2 mix ("mlx-community/paligemma2-3b-mix-448-bf16", "mlx-community/paligemma2-10b-mix-448-8bit"). I also tried to directly use transformers and with the 3B model I sometimes get more than one segmented object, and sometimes I only get one. But with mlx-vlm I can only get one object segmented no matter what I try. Is there a working example? Or is there some known issue I have missed? Thank you!

@JoeJoe1313
Contributor Author
JoeJoe1313 commented Apr 10, 2025

I found one working example for mlx-vlm as well, so I believe the model is just quite unstable on this task. In this successful case it returns two segmentations, but both carry the same label, that of the second object. The image is https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg and the prompt is "segment left wheel ; right wheel\n". However, when I try to segment the wheels in another image of a car (in the same position), it fails.
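To check how many objects a run actually returned, and which label each one carries, the raw generated text can be parsed directly. The sketch below is a minimal illustration and not part of mlx-vlm or transformers. It assumes the commonly documented PaliGemma segmentation layout: four `<locNNNN>` tokens (taken here to be y_min, x_min, y_max, x_max), then sixteen `<segNNN>` tokens, then the label, with objects separated by " ; ". The helper name `parse_segments` is hypothetical.

```python
import re

# Assumed per-object layout: 4 <locNNNN> tokens, 16 <segNNN> tokens, then a label.
# This layout is an assumption about PaliGemma output, not something mlx-vlm exposes.
OBJECT_RE = re.compile(r"((?:<loc\d{4}>){4})\s*((?:<seg\d{3}>){16})\s*([^;<]*)")


def parse_segments(text):
    """Return one dict per segmented object found in the generated text."""
    objects = []
    for loc_part, seg_part, label in OBJECT_RE.findall(text):
        # Box coordinates as raw token values (assumed order: y_min, x_min, y_max, x_max).
        box = [int(v) for v in re.findall(r"<loc(\d{4})>", loc_part)]
        # Mask codebook indices; decoding them into a mask needs a separate model.
        codes = [int(v) for v in re.findall(r"<seg(\d{3})>", seg_part)]
        objects.append({"box": box, "codes": codes, "label": label.strip()})
    return objects
```

Calling `len(parse_segments(output))` on the generated string gives the number of objects, and comparing the `label` fields makes a repeated label, as in the car example, easy to spot.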

@Blaizzy
Owner
Blaizzy commented Apr 10, 2025

> I am having trouble segmenting multiple objects when using PaliGemma 2 mix ("mlx-community/paligemma2-3b-mix-448-bf16", "mlx-community/paligemma2-10b-mix-448-8bit"). I also tried to directly use transformers and with the 3B model I sometimes get more than one segmented object, and sometimes I only get one. But with mlx-vlm I can only get one object segmented no matter what I try. Is there a working example? Or is there some known issue I have missed? Thank you!

Hey @JoeJoe1313

Thanks for bringing this up!

Could you share a reproducible example?

@Blaizzy
Owner
Blaizzy commented Apr 10, 2025

It would be nice if you could share the transformers examples as well, preferably with the images.

@JoeJoe1313
Contributor Author
JoeJoe1313 commented Apr 11, 2025

Here is the working mlx-vlm example, including plotting the masks on top of the image: https://github.com/JoeJoe1313/LLMs-Journey/blob/main/VLMs/paligemma_segmentation_mlx.py. The prompt is "segment left wheel ; right wheel". I don't have a separate transformers example, since I copied it directly from the documentation there, but the car image worked with both transformers and mlx-vlm, so I assumed it is just a model issue. In contrast, this image https://big-vision-paligemma-hf.hf.space/file=/tmp/gradio/d834f0b8126a6b8422136f3b7b1403d98a2da507/cats.png prompted with "segment cat in front ; cat in back" returns only one object.

@JoeJoe1313
Contributor Author
JoeJoe1313 commented Apr 11, 2025

From what I understand, these models are very sensitive to prompt formatting, and the 448-3B-bf16 and 448-10B-8bit variants seem to be just not powerful enough to segment multiple objects reliably. Please correct me if you have other observations.
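Since prompt formatting seems to matter, one way to rule out accidental differences between runs is to build the prompt from a list of labels instead of typing it by hand. This is only a sketch: the helper name `build_segment_prompt` is hypothetical, and the exact format (the "segment" prefix, " ; " separators, and a trailing newline) is taken from the car prompt quoted earlier in this thread, not from a specification.

```python
def build_segment_prompt(labels):
    """Join object labels into a 'segment a ; b\\n' style prompt."""
    # Collapse repeated whitespace inside each label and drop empty labels.
    cleaned = [" ".join(label.split()) for label in labels if label.strip()]
    if not cleaned:
        raise ValueError("at least one object label is required")
    # Format assumed from the prompt used for the car image in this thread.
    return "segment " + " ; ".join(cleaned) + "\n"
```

For example, `build_segment_prompt(["left wheel", "right wheel"])` reproduces the car prompt, so the same call can be reused unchanged for both the transformers and mlx-vlm runs.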
