Add stage 5 & stage 6 #4649
Many thanks! A few comments
@@ -0,0 +1,45 @@
# Copyright 2021 Tomoki Hayashi
# Copyright 2021 Carnegie Mellon University (Jiatong Shi)
Could you add the Muskits contributors here as well?
The same applies to the other files (though it does not have to be done now).
    (
        labelFrame,
        labelFrame_lengths,
        scoreFrame,
        scoreFrame_lengths,
        tempoFrame,
        tempoFrame_lengths,
    ) = extractMethod_frame(
        durations=durations.unsqueeze(-1),
        durations_lengths=durations_lengths,
        score=score.unsqueeze(-1),
        score_lengths=score_lengths,
        tempo=tempo.unsqueeze(-1),
        tempo_lengths=tempo_lengths,
    )

    labelFrame = labelFrame[
        :, : labelFrame_lengths.max()
    ]  # for data-parallel
    scoreFrame = scoreFrame[
        :, : scoreFrame_lengths.max()
    ]  # for data-parallel

    # Extract Syllable Level label, score, tempo information from Frame Level
    (
        label,
        label_lengths,
        score,
        score_lengths,
        tempo,
        tempo_lengths,
    ) = self.score_feats_extract(
        durations=labelFrame,
        durations_lengths=labelFrame_lengths,
        score=scoreFrame,
        score_lengths=scoreFrame_lengths,
        tempo=tempoFrame,
        tempo_lengths=tempoFrame_lengths,
    )

    # calculate durations, represent syllable encoder outputs to feats mapping
    # Syllable Level duration info needs phone & midi
    ds = []
    for i, _ in enumerate(labelFrame_lengths):
        assert labelFrame_lengths[i] == scoreFrame_lengths[i]
        assert label_lengths[i] == score_lengths[i]

        frame_length = labelFrame_lengths[i]
        _phoneFrame = labelFrame[i, :frame_length]
        _midiFrame = scoreFrame[i, :frame_length]

        # Clean _phoneFrame & _midiFrame
        for index in range(frame_length):
            if _phoneFrame[index] == 0 and _midiFrame[index] == 0:
                frame_length -= 1
                feats_lengths[i] -= 1

        syllable_length = label_lengths[i]
        _phoneSyllable = label[i, :syllable_length]
        _midiSyllable = score[i, :syllable_length]

        start_index = 0
        ds_tmp = []
        flag_finish = 0
        for index in range(syllable_length):
            _findPhone = _phoneSyllable[index]
            _findMidi = _midiSyllable[index]
            _length = 0
            if flag_finish == 1:
                # Fix error in _phoneSyllable & _midiSyllable
                label[i, index] = 0
                score[i, index] = 0
                tempo[i, index] = 0
                label_lengths[i] -= 1
                score_lengths[i] -= 1
                tempo_lengths[i] -= 1
            else:
                for indexFrame in range(start_index, frame_length):
                    if (
                        _phoneFrame[indexFrame] == _findPhone
                        and _midiFrame[indexFrame] == _findMidi
                    ):
                        _length += 1
                    else:
                        ds_tmp.append(_length)
                        start_index = indexFrame
                        break
                    if indexFrame == frame_length - 1:
                        flag_finish = 1
                        ds_tmp.append(_length)
                        start_index = indexFrame
                        break

        assert (
            sum(ds_tmp) == frame_length and sum(ds_tmp) == feats_lengths[i]
        )

        ds.append(torch.tensor(ds_tmp))
    ds = pad_list(ds, pad_value=0).to(label.device)
This part needs to be changed to match the XML feature.
Also, can we move this into a dedicated module (separate from the TTS model)?
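For illustration only, here is a rough sketch of what such a standalone helper might look like. It is not the code in this PR: the function name `frame_to_syllable_durations` and its signature are hypothetical, and it swaps the nested search loop for a run-length count with `torch.unique_consecutive`. It assumes that a frame whose phone and MIDI values are both 0 is padding, and that each sample has at least one non-padding frame. Like the loop in the diff, it cannot separate two adjacent syllables that share the same phone and MIDI value.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def frame_to_syllable_durations(label_frames, score_frames, frame_lengths):
    """Hypothetical helper: count consecutive frames sharing a (phone, midi) pair.

    label_frames, score_frames: LongTensor (B, T_max), with 0 used as padding.
    frame_lengths: LongTensor (B,), number of valid frames per sample.
    Returns zero-padded durations (B, S_max) and the syllable counts (B,).
    """
    durations = []
    for phones, midis, length in zip(label_frames, score_frames, frame_lengths):
        # Pair the two streams so a run ends when either value changes.
        pairs = torch.stack([phones[:length], midis[:length]], dim=-1)
        # Assumption: frames where both phone and midi are 0 are padding.
        pairs = pairs[(pairs != 0).any(dim=-1)]
        # Run-length encode the rows; counts[k] is the k-th syllable's duration.
        _, counts = torch.unique_consecutive(pairs, dim=0, return_counts=True)
        durations.append(counts)
    lengths = torch.tensor([len(d) for d in durations])
    return pad_sequence(durations, batch_first=True, padding_value=0), lengths
```

Keeping this outside the model class would also make it testable on small hand-written tensors, without building a full batch.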
    if tempo is not None:
        tempo = tempo.to(dtype=torch.long)
        batch.update(tempo=tempo, tempo_lengths=tempo_lengths)
    if ds is not None:
        batch.update(ds=ds)
We had a lot of issues with naming in the previous repo. Please consider renaming this so it is easier to interpret.
Co-authored-by: Jiatong <728307998@qq.com>
Thanks for the update. I will merge it first.
Hi @ftshijt, here are new updates on Muskit:
Recipe naive_rnn without DP on Ofuton can now be run (stages 1 to 5 are tested, stage 6 is still running, and the remaining stages still need to be done).
Some details still need to be checked in the NAR model with DP, which has not been applied in this version.
Recipe ofuton naive_rnn has been tested from stage 1 to stage 9.