In my CogVideoX pipe I did:
# --- LP: RIFLEx BEGIN
# Index of intrinsic frequency
k = 2
# The period of intrinsic frequency in latent space
N_k = 20
# Whether model is finetuned version
finetune = None
# the number of frames for inference
L_test = (num_frames - 1) // 4 + 1  # latent frames
# For training-free, if extrapolate length exceeds the period of intrinsic frequency, modify RoPE
if L_test > N_k and not finetune:
    pipe._prepare_rotary_positional_embeddings = MethodType(
        partial(_prepare_rotary_positional_embeddings_riflex, k=k, L_test=L_test), pipe)
# We fine-tune the model on new theta_k and N_k, and thus modify RoPE to match the fine-tuning setting.
if finetune:
    L_test = N_k  # the fine-tuning frequency setting
    pipe._prepare_rotary_positional_embeddings = MethodType(
        partial(_prepare_rotary_positional_embeddings_riflex, k=args.k, L_test=L_test), pipe)
# --- LP: RIFLEx END
Now, in my basic case I always have finetune=None (no fine-tuned model).
The pipe is created before this block; it is set up that way because I have both I2V and T2V pipes.
In my system I did these calculations:
# set frame rate
'''
4 SECONDS
For 12 FPS
4 * 12 = 48 FRAMES
6 SECONDS
6 * 8 = 48 FRAMES
num_frames = 49 because of (num_seconds * fps + 1)
For 24 FPS
6 * 24 = 144 FRAMES
num_frames = 145
'''
max_seconds = 6
num_seconds = duration  # up to 6
if num_seconds > max_seconds:
    num_seconds = max_seconds
max_frames = 49  # (num_seconds * fps + 1) = 6 * 4 + 1 = 25
# ValueError: The number of frames must be less than 49 for now due to static positional embeddings. This will be updated in the future to remove this limitation.
num_frames = num_seconds * fps + 1
if num_frames > max_frames:
    # exceeded max frames
    num_frames = max_frames
else:
    # too few frames set max
    num_frames = max_frames
# adjust fps if not upscale
fps = fps if enable_rife else math.ceil((num_frames - 1) / num_seconds)
print(f"Generating using seconds:{num_seconds} max seconds:{max_seconds} using frames:{num_frames} max frames:{max_frames} @{fps} FPS")
I did this because I wanted to let the user pass num_seconds and then derive num_frames from it:
parser.add_argument(
"--duration", type=int, default=6, help="Duration in seconds"
)
In RIFLEx, however, I see that num_frames defaults to 97. Why?
Also, I don't understand this assert, where by default k=2:
# num_frames defaults to 97, hence `num_frames-1=96`, why?
assert (num_frames - 1) % 4 == 0, "num_frames should be 4 * k + 1"
Is this because you want to ensure that num_frames is at least k times 4? Why?
Hi @loretoparisi, thank you for your attention to our work!
About num_frames: RIFLEx is a tool for video length extrapolation, which lets video models generate videos longer than their training length. For CogVideoX, the training length is 49 frames, and with RIFLEx the model can generate videos of twice that length (i.e., 97 frames) or even longer.
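To make these numbers concrete, here is a minimal sketch (not part of the RIFLEx code) that reuses the L_test formula and the N_k = 20 value from the snippet in the question. The function names `latent_frames` and `needs_riflex` are made up for illustration.

```python
def latent_frames(num_frames: int) -> int:
    """Latent frame count, using the same formula as L_test in the question."""
    return (num_frames - 1) // 4 + 1


def needs_riflex(num_frames: int, n_k: int = 20, finetune: bool = False) -> bool:
    """Mirror the training-free condition: L_test > N_k and not finetune."""
    return latent_frames(num_frames) > n_k and not finetune


for frames in (49, 97):
    # Prints: 49 13 False, then 97 25 True
    print(frames, latent_frames(frames), needs_riflex(frames))
```

Under these assumptions, 49 frames map to 13 latent frames, which stays below N_k = 20, so the RoPE patch is skipped. 97 frames map to 25 latent frames, which exceeds N_k and triggers it.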
So in your code, the following code should be deleted as there is no limit on video length:
if num_frames > max_frames:
    # exceeded max frames
    num_frames = max_frames
else:
    # too few frames set max
    num_frames = max_frames
In our code, num_frames defaults to 97, which lets the model generate videos twice the training length. You can of course set num_frames in your own way, such as num_frames = num_seconds * fps + 1.
About the assertion of 4 * k + 1: CogVideo uses a causal VAE that encodes (4k+1) pixel frames into (k+1) latent frames, so num_frames must be 1 modulo 4. In this assertion, k represents any positive integer; it does not mean args.k, which is the index of the intrinsic frequency.
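One possible way to combine a user-supplied duration with the 4k + 1 constraint is sketched below. The helper name `frames_for_duration` and the choice to round down to the nearest valid value are assumptions for illustration, not something taken from RIFLEx or CogVideoX.

```python
def frames_for_duration(num_seconds: int, fps: int) -> int:
    """Return the largest num_frames <= num_seconds * fps + 1 with num_frames % 4 == 1."""
    if num_seconds <= 0 or fps <= 0:
        raise ValueError("num_seconds and fps must be positive")
    raw = num_seconds * fps + 1
    # Drop the remainder so that (num_frames - 1) is a multiple of 4.
    num_frames = (raw - 1) // 4 * 4 + 1
    assert (num_frames - 1) % 4 == 0, "num_frames should be 4 * k + 1"
    return num_frames


print(frames_for_duration(6, 8))   # 49
print(frames_for_duration(6, 16))  # 97
print(frames_for_duration(5, 10))  # 49 (51 rounded down)
```

Rounding down keeps the clip no longer than the requested duration. Rounding up would work equally well for the assertion, at the cost of a few extra frames.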