In the Mind2Web benchmark, the type action needs a target location argument. How can I get Aguvis to output a browser.write action with such a location argument (x=..., y=...)?
I use a prompt like:
'Please generate the next move according to the UI screenshot, instruction and previous actions.\nInstruction: What are the romantic reggae musics from BCD Studio that can be used in tik tok series in andorra\nPrevious actions: Step 1. click on [label]. Step 2. click on [div] Andorra. Step 3. click on [span] TikTok Series. Step 4. click on [span] Reggae. Step 5. click on [span] Romantic.'
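For reference, a minimal sketch of how a prompt in this shape could be assembled from an instruction and a list of previous actions. The helper name `build_prompt` is hypothetical and not part of Aguvis; the "Step N. ..." layout is copied from the prompt quoted above, and how an empty action history should be rendered is not covered here.

```python
def build_prompt(instruction, previous_actions):
    """Assemble a user prompt in the format quoted above (hypothetical helper)."""
    # Number the previous actions as "Step 1. <action>. Step 2. <action>."
    steps = " ".join(
        f"Step {i}. {action}." for i, action in enumerate(previous_actions, start=1)
    )
    return (
        "Please generate the next move according to the UI screenshot, "
        "instruction and previous actions.\n"
        f"Instruction: {instruction}\n"
        f"Previous actions: {steps}"
    )


prompt = build_prompt(
    "What are the romantic reggae musics from BCD Studio "
    "that can be used in tik tok series in andorra",
    ["click on [label]", "click on [div] Andorra", "click on [span] TikTok Series"],
)
print(prompt)
```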
Aguvis outputs a plan without specifying the target location:
"Action: Input 'BCD Studio' in the search bar to find the desired romantic reggae musics.\n\nassistantall\nThought: To find the specific romantic reggae musics from BCD Studio, I should use the search function with the given keyword, 'BCD Studio', to narrow down the options.\nAction: Type 'BCD Studio' in the search bar to find the specific romantic reggae musics.\n\nassistantos\npyautogui.write(message='BCD Studio')"
Hi, thanks for the question! In Aguvis, any action outside the pyautogui action space (click, write, scroll, etc.) needs to be specifically declared in the system prompt, so you may need to do that. For more details on the data format, you can check the Mind2Web training data on Hugging Face.
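A rough sketch of what that could look like, under stated assumptions: the base system prompt, the wording of the declaration, and the `browser.write(x, y, message)` signature below are placeholders rather than the actual Aguvis format, which should be copied from the Mind2Web training data. The second half parses the final action line of a model reply and checks whether it carries a location, which is handy for verifying that the prompt change had an effect.

```python
import ast

# Placeholder only: NOT the real Aguvis system prompt.
BASE_SYSTEM_PROMPT = "You are a GUI agent."

# Hypothetical declaration of the extra action; take the real wording and
# signature from the Mind2Web training data instead.
EXTRA_ACTIONS = ["browser.write(x: float, y: float, message: str)"]


def build_system_prompt(base, extra_actions):
    """Append the extra action declarations to the base system prompt."""
    if not extra_actions:
        return base
    listed = "\n".join(f"- {action}" for action in extra_actions)
    return f"{base}\n\nAdditional actions you may output:\n{listed}"


def parse_action(raw_output):
    """Return (function_name, keyword_args) parsed from the last line of the output."""
    call_src = raw_output.strip().splitlines()[-1]
    node = ast.parse(call_src, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError(f"not a function call: {call_src!r}")
    name = ast.unparse(node.func)  # ast.unparse requires Python 3.9+
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def has_location(kwargs):
    """True when the parsed action carries both x and y arguments."""
    return "x" in kwargs and "y" in kwargs


system_prompt = build_system_prompt(BASE_SYSTEM_PROMPT, EXTRA_ACTIONS)

# Final lines of the output quoted in the question: no location argument.
name, kwargs = parse_action("assistantos\npyautogui.write(message='BCD Studio')")
print(name, kwargs, has_location(kwargs))

# Made-up reply showing the desired shape, for illustration only.
name2, kwargs2 = parse_action("browser.write(x=0.42, y=0.17, message='BCD Studio')")
print(name2, kwargs2, has_location(kwargs2))
```

Parsing with `ast` instead of `eval` means the model's output is never executed, so a malformed or unexpected reply raises an error rather than running arbitrary code.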
Thank you for open-sourcing the work.
Could the authors provide some hints? Thanks! @Timothyxxx @ranpox