Description
Hi! I've been having trouble getting the repo and models working. Specifically, I tried to run the evaluation scripts (COCO captioning) as described in the README, using the checkpoint available on the Hugging Face Hub (https://huggingface.co/csuhan/OneLLM-7B). I'm using an A500 24GB GPU for inference.
The CIDEr score I get is 0.02, far lower than expected given that the model was trained on MS COCO data. The generated captions are inaccurate and lack variability (I pasted some examples below), and the model consistently describes the images as black and white. I double-checked that the images were downloaded properly, and I used the code as-is, only adapting the paths. Is the checkpoint ready to use and suitable for finetuning on additional tasks? Is there a step missing from the repo docs that I should be performing?
Please feel free to request additional information about my setup that might be relevant to the problem.
Thanks!
```json
{
  "image_id": 184613,
  "caption": "A close up of a black and white photo of a cat."
},
{
  "image_id": 403013,
  "caption": "A black and white photo of a long object."
},
{
  "image_id": 562150,
  "caption": "A black and white photo of a long object."
},
{
  "image_id": 360772,
  "caption": "A black and white photo of a long thin object."
},
{
  "image_id": 340559,
  "caption": "A black and white photo of a long object."
},
{
  "image_id": 321107,
  "caption": "A black and white photo of a black object."
},
{
  "image_id": 129001,
  "caption": "A black and white photo of a long object."
},
{
  "image_id": 556616,
  "caption": "A black and white photo of a long object."
},
{
  "image_id": 472621,
  "caption": "A black and white photo of a blurry object."
},
{
  "image_id": 364521,
  "caption": "A black and white photo of a black and white object."
},
{
  "image_id": 310391,
  "caption": "A black and white photo of a blank screen."
},
```