How to handle the Mind2Web action space? · Issue #19 · xlang-ai/aguvis · GitHub

Open
ZJULiHongxin opened this issue Feb 6, 2025 · 1 comment

@ZJULiHongxin

Thank you for open-sourcing the work.

In the Mind2Web benchmark, the type action requires a target location argument. How can I get Aguvis to output a browser.write action with such a location argument (x=..., y=...)?

I use a prompt like:

'Please generate the next move according to the UI screenshot, instruction and previous actions.\nInstruction: What are the romantic reggae musics from BCD Studio that can be used in tik tok series in andorra\nPrevious actions: Step 1. click on [label]. Step 2. click on [div] Andorra. Step 3. click on [span] TikTok Series. Step 4. click on [span] Reggae. Step 5. click on [span] Romantic.'

Aguvis outputs a plan without specifying the target location:

"Action: Input 'BCD Studio' in the search bar to find the desired romantic reggae musics.\n\nassistantall\nThought: To find the specific romantic reggae musics from BCD Studio, I should use the search function with the given keyword, 'BCD Studio', to narrow down the options.\nAction: Type 'BCD Studio' in the search bar to find the specific romantic reggae musics.\n\nassistantos\npyautogui.write(message='BCD Studio')"

Could the authors provide some hints? Thanks! @Timothyxxx @ranpox
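To make the missing argument concrete, the final code line of the response above can be parsed into a function name and keyword arguments. The sketch below uses only Python's standard `ast` module; the helper name `parse_action` is hypothetical (not part of Aguvis), and it assumes the executable action is the last line of the response and uses only literal keyword arguments.

```python
import ast


def parse_action(code_line: str) -> tuple[str, dict]:
    """Parse a single call such as "pyautogui.write(message='x')" into (name, kwargs).

    Hypothetical helper for illustration: only keyword arguments with literal
    values are collected; positional arguments are ignored.
    """
    expr = ast.parse(code_line.strip(), mode="eval").body
    if not isinstance(expr, ast.Call):
        raise ValueError(f"not a function call: {code_line!r}")
    name = ast.unparse(expr.func)  # e.g. "pyautogui.write"
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in expr.keywords if kw.arg}
    return name, kwargs


# Abbreviated version of the response quoted above; the action is its last line.
response = "...\nassistantos\npyautogui.write(message='BCD Studio')"
name, kwargs = parse_action(response.splitlines()[-1])
print(name, kwargs)  # pyautogui.write {'message': 'BCD Studio'}
print("x" in kwargs and "y" in kwargs)  # False: no target location for the type action
```

A click action parsed the same way would carry `x` and `y`, which is exactly what the write action lacks for Mind2Web scoring.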

@Timothyxxx
Member

Hi, thanks for the question! In the design of Aguvis, any action outside the pyautogui action space (click, write, scroll, etc.) needs to be specifically declared in the system prompt, so you will need to add it there. For more details on the data format, you can check the Mind2Web training data on Hugging Face.
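Following this suggestion, one way to experiment is to append a description of the extra action to the system prompt. The sketch below is hypothetical throughout: `BASE_PROMPT`, the `browser.write(message, x, y)` signature (named after the question), and the helper `build_system_prompt` are illustrative assumptions, not the actual Aguvis prompt. The real wording and format should be taken from the Mind2Web training data on Hugging Face.

```python
# Hypothetical base prompt; replace it with the system prompt Aguvis was trained with.
BASE_PROMPT = "You are a GUI agent. You can use pyautogui actions such as click, write and scroll."

# Extra (non-pyautogui) action the model should be allowed to emit.
# The signature is an assumption; the coordinate convention should match the one used for click.
CUSTOM_ACTIONS = {
    "browser.write": (
        "browser.write(message: str, x: float, y: float)",
        "Type `message` into the input field located at (x, y).",
    ),
}


def build_system_prompt(base: str, actions: dict[str, tuple[str, str]]) -> str:
    """Append a one-line description of each custom action to the base system prompt."""
    lines = [base, "", "Additional actions:"]
    for signature, description in actions.values():
        lines.append(f"- {signature}: {description}")
    return "\n".join(lines)


prompt = build_system_prompt(BASE_PROMPT, CUSTOM_ACTIONS)
print(prompt)
```

Keeping the custom actions in a separate mapping makes it easy to swap them per benchmark; how reliably the model follows such a declaration likely depends on how closely it matches the format seen during training.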
