PaliGemma 2 mix segment multiple objects #292

JoeJoe1313 · 2025-04-09T13:16:14Z

I am having trouble segmenting multiple objects with PaliGemma 2 mix ("mlx-community/paligemma2-3b-mix-448-bf16", "mlx-community/paligemma2-10b-mix-448-8bit"). I also tried transformers directly: with the 3B model I sometimes get more than one segmented object and sometimes only one. With mlx-vlm, however, I only ever get one segmented object, no matter what I try. Is there a working example, or a known issue I have missed? Thank you!

JoeJoe1313 · 2025-04-10T10:51:41Z

I found one working example for mlx-vlm as well, so I believe the model is just quite unstable on this task. In this successful case it returns two segmentations, but both carry the same label, that of the second object. The image is https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg and the prompt is "segment left wheel ; right wheel\n". However, when I try to segment the wheels in another image of a car (in the same position), it fails.
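To check how many objects actually came back and which label each one carries, the raw output can be parsed. Below is a minimal sketch, assuming the output follows the PaliGemma convention of four `<locNNNN>` tokens (y_min, x_min, y_max, x_max on a 0-1024 grid), sixteen `<segNNN>` tokens, then the label, with objects separated by " ; ". The function name `parse_segments` is my own, and whether mlx-vlm keeps these special tokens in the decoded text is also an assumption.

```python
import re

# One object: 4 location tokens, optionally 16 mask tokens, then a free-text label.
_OBJECT_RE = re.compile(
    r"((?:<loc\d{4}>){4})\s*((?:<seg\d{3}>){16})?\s*([^;<>]*)"
)


def parse_segments(text, width, height):
    """Return one dict per object found in the decoded model output.

    Assumes loc tokens are ordered y_min, x_min, y_max, x_max and are
    normalised to a 0-1024 grid (hypothetical helper, not an mlx-vlm API).
    """
    objects = []
    for match in _OBJECT_RE.finditer(text):
        y_min, x_min, y_max, x_max = (
            int(v) for v in re.findall(r"<loc(\d{4})>", match.group(1))
        )
        codes = [int(v) for v in re.findall(r"<seg(\d{3})>", match.group(2) or "")]
        objects.append(
            {
                "label": match.group(3).strip(),
                # Pixel-space box as (x_min, y_min, x_max, y_max).
                "box": (
                    x_min / 1024 * width,
                    y_min / 1024 * height,
                    x_max / 1024 * width,
                    y_max / 1024 * height,
                ),
                # Mask codebook indices; decoding them into a mask is out of scope here.
                "codes": codes,
            }
        )
    return objects
```

Calling `len(parse_segments(output, w, h))` on the decoded text gives the number of returned objects, and comparing the `label` fields shows whether both segmentations ended up with the same label.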

Blaizzy · 2025-04-10T11:50:57Z

Hey @JoeJoe1313

Thanks for bringing this up!

Could you share a reproducible example?

Blaizzy · 2025-04-10T11:51:52Z

It would be nice if you could share the transformers examples as well

Preferably with the images

JoeJoe1313 · 2025-04-11T16:49:30Z

Here is the working mlx example, including plotting the masks on top of the images: https://github.com/JoeJoe1313/LLMs-Journey/blob/main/VLMs/paligemma_segmentation_mlx.py. The prompt is "segment left wheel ; right wheel". I don't have a separate transformers example, since I copied it directly from the documentation there, but the car image worked for both transformers and mlx-vlm, so I assumed it's just a model issue. This image https://big-vision-paligemma-hf.hf.space/file=/tmp/gradio/d834f0b8126a6b8422136f3b7b1403d98a2da507/cats.png, prompted with "segment cat in front ; cat in back", returns only one object.

JoeJoe1313 · 2025-04-11T16:58:45Z

From what I understand, these models are very sensitive to prompt formatting, and the 448-3B-bf16 and 448-10B-8bit variants just don't seem powerful enough to segment multiple objects reliably. Please correct me if you have other observations.
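Since the formatting matters, it may help to build the prompt programmatically so the separator and trailing newline are always the same. This is a small sketch that reproduces the "segment left wheel ; right wheel\n" form from above; `build_segment_prompt` is a hypothetical helper, not part of mlx-vlm or transformers.

```python
def build_segment_prompt(labels):
    """Join object descriptions into a single multi-object segmentation prompt.

    Uses " ; " as the separator and a trailing newline, matching the prompt
    that worked for the car image in this thread.
    """
    cleaned = [label.strip() for label in labels]
    if not cleaned or any(not label for label in cleaned):
        raise ValueError("every label must be a non-empty string")
    if any(";" in label for label in cleaned):
        raise ValueError("labels must not contain ';' (it is the separator)")
    return "segment " + " ; ".join(cleaned) + "\n"
```

For example, `build_segment_prompt(["cat in front", "cat in back"])` gives the prompt used for the cats image, with the trailing newline added.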
