How to match quality of the video that is generated using playground vs. huggingface weights? #131

Annusha · 2025-02-18T17:11:12Z

Hi,

I generated a video from the same prompt twice: once in the playground and once locally with downloaded weights. Note that I only tried the Hugging Face "genmo/mochi-1-preview" model, using the code below:

import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview")

# Memory savings disabled: I was using an H100, which had enough memory
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()
# With CPU offload disabled, the pipeline must be moved to the GPU explicitly
pipe.to("cuda")

prompt = "A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors."

with torch.autocast("cuda", torch.bfloat16, cache_enabled=False):
    frames = pipe(prompt, num_frames=84).frames[0]

export_to_video(frames, "mochi.mp4", fps=30)

The results are quite different (the top video is from the playground; the other one was generated on my local machine).
Adjusting guidance_scale improved the results a bit.
What else should I change to match the playground results?

mochi-red-helmet.mp4

mochi-red-helmet.huggingface.weights2.mp4

970814 · 2025-02-26T08:51:37Z

Looks funny, I have the same problem

weathon · 2025-04-27T23:31:43Z

I have the same problem

Annusha · 2025-04-30T06:37:50Z

You can expand the text prompt into a more detailed description, and then it works just fine. E.g., instead of

A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors

use

Produce a cinematic scene shot on 35mm film with vivid colors. Focus on a 30-year-old spaceman wearing a handmade red wool-knit motorcycle helmet, standing in a vast salt desert under a bright blue sky. Maintain a calm, contemplative atmosphere with gentle camera movement. Show a close-up on his face as he slowly turns his head, highlighting the texture of the helmet and the subtle shifts in his expression. Emphasize a sense of solitude and quiet wonder in this serene, introspective moment.

A guidance_scale of 6 worked well for me.
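For anyone who wants to try this, here is a minimal sketch that combines the expanded prompt with guidance_scale=6 in the script from the first post. The helper names `build_call_kwargs` and `generate`, the `seed` argument, and the explicit `pipe.to("cuda")` are my own assumptions for illustration, not something confirmed in this thread, and nothing here guarantees an exact match with the playground.

```python
# Sketch only: build_call_kwargs and generate are hypothetical helpers,
# not part of diffusers.

DETAILED_PROMPT = (
    "Produce a cinematic scene shot on 35mm film with vivid colors. "
    "Focus on a 30-year-old spaceman wearing a handmade red wool-knit "
    "motorcycle helmet, standing in a vast salt desert under a bright blue sky. "
    "Maintain a calm, contemplative atmosphere with gentle camera movement. "
    "Show a close-up on his face as he slowly turns his head, highlighting the "
    "texture of the helmet and the subtle shifts in his expression. "
    "Emphasize a sense of solitude and quiet wonder in this serene, "
    "introspective moment."
)


def build_call_kwargs(prompt, guidance_scale=6.0, num_frames=84):
    """Collect the arguments passed to the pipeline call."""
    return {
        "prompt": prompt,
        "guidance_scale": guidance_scale,  # value reported to work well above
        "num_frames": num_frames,          # same as in the original script
    }


def generate(prompt=DETAILED_PROMPT, out_path="mochi.mp4", seed=0):
    """Run the pipeline once and write the video to out_path."""
    # Heavy imports are deferred so the helper above works without a GPU.
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview")
    pipe.to("cuda")  # assumption: enough GPU memory, no CPU offload

    # A fixed seed (assumption) makes runs comparable while tuning settings.
    generator = torch.Generator("cuda").manual_seed(seed)

    with torch.autocast("cuda", torch.bfloat16, cache_enabled=False):
        frames = pipe(**build_call_kwargs(prompt), generator=generator).frames[0]

    export_to_video(frames, out_path, fps=30)
    return out_path
```

Calling `generate()` runs one generation with the expanded prompt. Keeping the seed fixed while changing only the prompt or guidance_scale makes it easier to see which change actually affects the output.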
