Feedback: Add prompting suggestions + strong descriptions so users can have success with the server #37

govindrai · 2025-04-18T16:59:20Z

Device/Package Details

Latest Mobile-MCP commit as of 4/18/2025 (8083235) + Cursor IDE + macOS 15.1.1.

Thanks for writing this MCP server. It takes a lot of hard work. I've been experimenting with your project and wanted to share my experience. I know the project is in its very early stages, and I am just trying to understand how to use the software most efficiently.

Feedback

I have been having a hard time getting the AI models to call the right tools with the right arguments.

I think it would help to improve the tool descriptions (maybe even adding one-shot or multi-shot examples to them) and to provide some tried-and-true prompting examples for each tool that users can send to their LLM.

For example, say I want to take screenshots of each tab in a mobile app. Some of those tabs are scrollable. The LLM always provides the direction "up" instead of "down". However, to scroll up, the direction needs to be "down". As a result, the server never scrolls down successfully until you explicitly tell the LLM to send the "down" scroll direction.
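
To make the suggestion concrete, here is a minimal TypeScript sketch of a tool description that embeds a few example calls. The tool name `swipe_on_screen`, the `direction` parameter, and the helper `buildToolDescription` are hypothetical names used only for illustration, not mobile-mcp's actual API. The direction convention in the examples is also an assumption; the "down" example simply mirrors the behavior described above.

```typescript
// Hypothetical shapes for illustration; not taken from mobile-mcp's source.
interface ToolExample {
  request: string;                // what the user asks in natural language
  args: Record<string, unknown>;  // the arguments the model should send
}

interface ToolSpec {
  name: string;
  description: string;
}

// Append few-shot examples to a tool summary so the model sees
// concrete request-to-arguments pairs in the description itself.
function buildToolDescription(summary: string, examples: ToolExample[]): string {
  const lines = examples.map(
    (ex) => `- "${ex.request}" -> ${JSON.stringify(ex.args)}`
  );
  return [summary, "Examples:", ...lines].join("\n");
}

const swipeTool: ToolSpec = {
  name: "swipe_on_screen", // assumed name
  description: buildToolDescription(
    // Assumed convention: `direction` names the way the page should scroll.
    "Swipe on the screen. `direction` is the direction the page should scroll.",
    [
      { request: "scroll down to see the rest of the tab", args: { direction: "down" } },
      { request: "go back to the top of the list", args: { direction: "up" } },
    ]
  ),
};
```

Stating the convention in the description, next to the examples, means the user would not have to spell out the argument in every prompt.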

Currently I feel that users will need to specify tool names and provide exact arguments (ideally not since natural language is preferred here), but maybe that is necessary if the model can't call the right tools for whatever reason.

What I wanted to do: get Cursor to use mobile-mcp to get past a passcode screen (shown below). It kept trying to tap the on-screen keypad when it could easily have typed the passcode. For whatever reason, the tapping never worked. I'm not sure whether this is a bug or a tool-calling issue.

My LLM conversation

Affirm App Screenshot Instructions

User:

open the affirm app on the iphone 16 simulator and take a screenshot of the home tab. if you hit the passcode screen type in 1111 as the passcode.

Cursor:

I'll help you open the Affirm app on the iPhone 16 simulator and take a screenshot of the home tab.

Cursor:

I see that we need to enter the passcode 1111. I'll tap the number 1 four times.

Cursor:

It looks like the passcode wasn't accepted. Let me try entering 1111 again with a slight delay between each tap to ensure they register properly.

User:

please explain why you are clicking on coordinates instead of typing?

Cursor:

I'm clicking on coordinates instead of typing because the passcode screen shows a numeric keypad rather than a text input field. The mobile interaction tools available to me work differently based on the type of input needed:

For standard text fields with a keyboard, I would use the type_keys function
For PIN/passcode screens with numeric keypads, I need to physically tap the number buttons by clicking at their coordinates

The passcode screen presents a specialized keypad interface where each button needs to be tapped individually. I identified the coordinates of the "1" button (x:100, y:884) and am tapping it four times to enter "1111".

User:

how do you know it's a specialized keyboard? seems like a normal ios numberpad keyboard to me

Cursor:

You're right - it is a standard iOS numeric keypad. I should have been more precise in my explanation.

I'm clicking on coordinates because this passcode screen requires tapping the numeric buttons directly rather than using the type_keys function. Even though it's a standard iOS keypad, passcode entry screens typically require direct interaction with the on-screen buttons rather than accepting text input via typing functions.

User:

what does the type_keys tool do?

Cursor:

The type_keys tool types text into a focused input element. It has two parameters:

text: The string of text you want to type
submit: Boolean that determines whether to submit the text (like pressing Enter)

It's designed for typing into text fields that accept keyboard input. For the passcode screen, I could have tried using type_keys with "1111", but I chose to click the number buttons directly since the passcode screen shows a numeric keypad interface.

User:

just type it instead of tapping

Cursor:

Perfect! We've successfully accessed the Affirm app and we're looking at the home tab. The screenshot shows:

Affirm logo at the top
Search bar to find stores that accept Affirm payments
Featured merchants including Neiman Marcus, Harley-Davidson, Jomashop, and others
Promotional banner for 0% APR on first Affirm purchase with Apple Pay
Bottom navigation with Home (currently selected), Deals, Card, Manage, and Money tabs

This is the home tab of the Affirm app as requested.
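
For reference, the `type_keys` call that finally worked in the conversation above can be sketched as an MCP `tools/call` request. This is a sketch under stated assumptions: the `text` and `submit` parameters are the ones the assistant listed, the tool name is written as it appears in the transcript (the name the server actually registers may differ), and `typeKeysRequest` is a hypothetical helper.

```typescript
// Parameters as described by the assistant in the conversation above.
interface TypeKeysArgs {
  text: string;    // the string to type
  submit: boolean; // whether to submit afterwards (like pressing Enter)
}

// Shape of an MCP `tools/call` JSON-RPC request.
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Build the request a client would send to type a passcode in one call
// instead of tapping each keypad button by coordinates.
function typeKeysRequest(
  id: number,
  text: string,
  submit = false
): ToolCallRequest<TypeKeysArgs> {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "type_keys", // name as used in the transcript; the registered name may differ
      arguments: { text, submit },
    },
  };
}

// One call carries the whole passcode from the original prompt.
const passcodeRequest = typeKeysRequest(1, "1111");
```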


krismuhi · 2025-04-19T07:54:09Z

great feedback! thanks for taking the time for this awesome writeup.
we are reviewing it and gonna share updates soon

gmegidish · 2025-05-04T13:30:22Z

@govindrai that's amazing feedback, that helps a lot! it's interesting to see if we can put some context into the tools, like, when you "use device", it will return a long response to the LLM, telling it that if it needs to write some text, better try with the keyboard first and only use "tap" if it doesn't work.

Right now, it seems that you can engineer the perfect prompt for "use keyboard to enter the password 1234". We added "focused" (but only for android atm) for elements, maybe LLM will be smart enough to see that you're now focused on a text view and maybe type using keys. Or maybe we need to expose "is keyboard open" which will hint that you can use keys instead of tapping.
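
As a rough illustration of the hinting idea, the sketch below turns a hypothetical screen state into advice for the LLM. The `ScreenState` fields (`focusedElementType`, `keyboardOpen`) and the `adviseInput` helper are assumptions made for this example; they are not existing mobile-mcp fields or functions.

```typescript
// Hypothetical device state; both field names are assumed for illustration.
interface ScreenState {
  focusedElementType?: string; // e.g. "TextField" when an input has focus
  keyboardOpen: boolean;       // would come from an "is keyboard open" signal
}

type InputAdvice = "type" | "tap";

// Prefer typing whenever a keyboard is open or an input is focused;
// fall back to tapping only when neither signal is present.
function adviseInput(state: ScreenState): { advice: InputAdvice; hint: string } {
  if (state.keyboardOpen || state.focusedElementType !== undefined) {
    return {
      advice: "type",
      hint: "A keyboard or focused input is available: try typing the text first and tap only if that fails.",
    };
  }
  return {
    advice: "tap",
    hint: "No keyboard or focused input detected: tap the on-screen element instead.",
  };
}
```

The returned `hint` string is the kind of text that could be included in a tool response so the model sees it before choosing between typing and tapping.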

Thank you so much for the report. May I close the ticket?

govindrai · 2025-05-04T14:06:14Z

Appreciate you guys taking the time to reply and positively accepting the feedback!! Yes, please close!


gmegidish · 2025-05-04T14:25:40Z

Thank you so much @govindrai !!! 💯


govindrai changed the title ~~Recommend adding prompting suggestions so users can have success~~ Feedback: Add prompting suggestions + strong descriptions so users can have success with the server Apr 18, 2025

gmegidish closed this as completed May 4, 2025
