-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New SLU task #4569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New SLU task #4569
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to go~ Many thanks for all the implementations.
Can you give me a day? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some high-level comments.
If it is difficult to deal with it, we can leave it.
>>> speech2understand = Speech2Understand("slu_config.yml", "slu.pth") | ||
>>> audio, rate = soundfile.read("speech.wav") | ||
>>> speech2understand(audio) | ||
[(text, token, token_int, hypothesis object), ...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output looks complicated as an API.
Are all of them necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to make it exactly same as Speech2Text
espnet/espnet2/bin/asr_inference.py
Line 47 in f274ebe
[(text, token, token_int, hypothesis object), ...] |
from espnet.utils.cli_utils import get_commandline_args | ||
|
||
|
||
class Speech2Understand: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this class simple if we inherit a class from Speech2Text in ASR?
This is just an idea.
If it complicates your class, we do not have to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good suggestion! I will take a stab at this and try to simplify espnet2/bin/slu_inference.py, espnet2/slu/espnet_model.py and espnet2/tasks/slu.py using the corresponding ASR files.
espnet2/slu/espnet_model.py
Outdated
yield | ||
|
||
|
||
class ESPnetSLUModel(AbsESPnetModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
Can we inherit it from ASRESPnetModel?
espnet2/tasks/slu.py
Outdated
) | ||
|
||
|
||
class SLUTask(AbsTask): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditt.
Can we use ASRTask?
Hi @sw005320, I have used ASRTask in espnet2/tasks/slu.py and ASRESPnetModel in espnet2/slu/espnet_model.py, which have helped avoid some code replication. espnet2/bin/slu_inference.py was not much simplified using Speech2Text class, so I have kept it as it is. I believe PR is ready to be merged (as all CI checks have passed except 1 that was cancelled). |
Cool! @ftshijt, can we go through the review again (I'll also do it)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just a minor comment to improve the modularization.
espnet2/train/preprocessor.py
Outdated
@@ -169,6 +170,20 @@ def __init__( | |||
token_list=token_list, | |||
unk_symbol=unk_symbol, | |||
) | |||
if transcript_token_list is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be task-specific. How about we just make it a new class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I have made a new preprocessor class using the CommonPreprocessor class as the base class.
The design looks very good to me. |
espnet2/train/preprocessor.py
Outdated
@@ -352,6 +331,88 @@ def __call__( | |||
return data | |||
|
|||
|
|||
class CommonPreprocessor_SLU(CommonPreprocessor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can rename it to SLUPreprocessor
?
I think we do not have to increase the coverage, but if you miss some essential functions for the test, please add them. If the current test is good enough, please let me know. Then, I'll merge this PR. |
Yeah I agree. I think the current test is good enough and PR is ready to be merged. |
I just merged the PR! |
No description provided.