Mobile Automation with agent-device
For exploration, use snapshot refs. For deterministic replay, use selectors. For structured exploratory QA bug hunts and reporting, use ../dogfood/SKILL.md.
Start Here (Read This First)
Use this skill as a router, not a full manual.
- Pick one mode:
  - Normal interaction flow
  - Debug/crash flow
  - Replay maintenance flow
- Run one canonical flow below.
- Open references only if blocked.
Decision Map
- No target context yet: `devices` -> pick target -> `open`.
- Normal UI task: `open` -> `snapshot -i` -> `press`/`fill` -> `diff snapshot -i` -> `close`
- Debug/crash: `open <app>` -> `logs clear --restart` -> reproduce -> `network dump` -> `logs path` -> targeted `grep`
- Replay drift: `replay -u <path>` -> verify updated selectors
- Remote multi-tenant run: allocate lease -> run commands with tenant isolation flags -> heartbeat/release lease
- Device-scope isolation run: set iOS simulator set / Android allowlist -> run selectors within scope only
Canonical Flows
1) Normal Interaction Flow
```bash
agent-device open Settings --platform ios
agent-device snapshot -i
agent-device press @e3
agent-device diff snapshot -i
agent-device fill @e5 "test"
agent-device close
```
2) Debug/Crash Flow
```bash
agent-device open MyApp --platform ios
agent-device logs clear --restart
# Reproduce the issue in the app here, then collect evidence
agent-device network dump 25
agent-device logs path
```
Logging is off by default. Enable it only for debugging windows.
`logs clear --restart` requires an active app session (`open <app>` first).
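
The "targeted grep" step that closes the debug flow could be wrapped in a small helper like the one below. This is a minimal sketch, not part of agent-device: the function name `grep_session_log` and the default pattern are hypothetical, and the usage line assumes that `agent-device logs path` prints a single log file path on stdout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: search a session log for likely crash markers.
# $1 = log file path, $2 = optional extended regex (the default is an assumption).
grep_session_log() {
  local log_file="$1"
  local pattern="${2:-fatal|exception|crash}"
  if [ ! -r "$log_file" ]; then
    echo "log file not readable: $log_file" >&2
    return 2
  fi
  # -i: case-insensitive, -n: show line numbers, -E: extended regex
  grep -inE -- "$pattern" "$log_file"
}

# Usage (assumes `logs path` prints the log file path on stdout):
#   grep_session_log "$(agent-device logs path)" 'timeout|5[0-9][0-9]'
```

Passing a narrow pattern keeps the output small, which matters when the log is large and only one failure is under investigation.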
3) Replay Maintenance Flow
```bash
agent-device replay -u ./session.ad
```
4) Remote Tenant Lease Flow (HTTP JSON-RPC)
```bash
# Allocate lease
curl -sS http://127.0.0.1:${AGENT_DEVICE_DAEMON_HTTP_PORT}/rpc \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"alloc-1","method":"agent_device.lease.allocate","params":{"runId":"run-123","tenantId":"acme","ttlMs":60000}}'

# Use lease in tenant-isolated command execution
agent-device --daemon-transport http \
  --tenant acme \
  --session-isolation tenant \
  --run-id run-123 \
  --lease-id <lease-id> \
  session list --json

# Heartbeat and release
curl -sS http://127.0.0.1:${AGENT_DEVICE_DAEMON_HTTP_PORT}/rpc \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"hb-1","method":"agent_device.lease.heartbeat","params":{"leaseId":"<lease-id>","ttlMs":60000}}'
curl -sS http://127.0.0.1:${AGENT_DEVICE_DAEMON_HTTP_PORT}/rpc \
  -H "content-type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"jsonrpc":"2.0","id":"rel-1","method":"agent_device.lease.release","params":{"leaseId":"<lease-id>"}}'
```
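
The three lease calls above share the same JSON-RPC envelope, so the repetition could be factored into two small functions. This is a sketch under stated assumptions: `lease_rpc_body` and `lease_rpc_call` are hypothetical names, and `AGENT_DEVICE_TOKEN` is an illustrative variable for the bearer token, not something the tool defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a JSON-RPC 2.0 request body for the lease methods.
# $1 = request id, $2 = method name, $3 = params as a JSON object string.
# Values are interpolated as-is, so $3 must already be valid JSON.
lease_rpc_body() {
  printf '{"jsonrpc":"2.0","id":"%s","method":"%s","params":%s}' "$1" "$2" "$3"
}

# Hypothetical wrapper: POST the body to the daemon endpoint used in the flow.
# AGENT_DEVICE_TOKEN is an illustrative variable name (an assumption).
lease_rpc_call() {
  curl -sS "http://127.0.0.1:${AGENT_DEVICE_DAEMON_HTTP_PORT}/rpc" \
    -H "content-type: application/json" \
    -H "Authorization: Bearer ${AGENT_DEVICE_TOKEN}" \
    -d "$(lease_rpc_body "$@")"
}

# Usage:
#   lease_rpc_call hb-1 agent_device.lease.heartbeat '{"leaseId":"<lease-id>","ttlMs":60000}'
#   lease_rpc_call rel-1 agent_device.lease.release '{"leaseId":"<lease-id>"}'
```

Keeping the body builder separate from the network call makes the payload easy to inspect or log before anything is sent.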
Command Skeleton (Minimal)
Session and navigation
```bash
agent-device devices
agent-device devices --platform ios --ios-simulator-device-set /tmp/tenant-a/simulators
agent-device devices --platform android --android-device-allowlist emulator-5554,device-1234
agent-device ensure-simulator --device "iPhone 16" --ios-simulator-device-set /tmp/tenant-a/simulators
agent-device ensure-simulator --device "iPhone 16" --runtime com.apple.CoreSimulator.SimRuntime.iOS-18-4 --ios-simulator-device-set /tmp/tenant-a/simulators --boot
agent-device open [app|url] [url]
agent-device open [app] --relaunch
agent-device close [app]
agent-device install <app> <path-to-binary>
agent-device reinstall <app> <path-to-binary>
agent-device session list
```
Use `boot` only as a fallback when `open` cannot find/connect to a ready target.
For Android emulators by AVD name, use `boot --platform android --device <avd-name>`.
For Android emulators without GUI, add `--headless`.
Use `--target mobile|tv` with `--platform` (required) to pick phone/tablet vs TV targets (AndroidTV/tvOS).
Isolation scoping quick reference:
- `--ios-simulator-device-set <path>` scopes iOS simulator discovery + command execution to one simulator set.
- `--android-device-allowlist <serials>` scopes Android discovery/selection to comma/space separated serials.
- Scope is applied before selectors (`--device`, `--udid`, `--serial`); out-of-scope selectors fail with `DEVICE_NOT_FOUND`.
- With iOS simulator-set scope enabled, iOS physical devices are not enumerated.
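
The "scope first, then selector" rule above can be pictured with a small membership check. This is an illustration of the described behavior only, not the tool's implementation. The function name `serial_in_allowlist` is hypothetical, and the error text merely echoes the `DEVICE_NOT_FOUND` code named above.

```bash
#!/usr/bin/env bash
# Illustrative only: mimics an allowlist being applied before a --serial selector.
# $1 = serial selector, $2 = allowlist (comma and/or space separated serials).
serial_in_allowlist() {
  local serial="$1" entry
  # Turn commas into spaces, then rely on word splitting to walk the entries.
  for entry in ${2//,/ }; do
    [ "$entry" = "$serial" ] && return 0
  done
  echo "DEVICE_NOT_FOUND: $serial is outside the allowlist" >&2
  return 1
}

# Usage:
#   serial_in_allowlist emulator-5554 "emulator-5554,device-1234" && echo "in scope"
```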
Simulator provisioning quick reference:
- Use `ensure-simulator` to create or reuse a named iOS simulator inside a device set before starting a session.
- `--device <name>` is required (e.g. `"iPhone 16 Pro"`). `--runtime <id>` pins the runtime; omit to use the newest compatible one. `--boot` boots it immediately.
- Returns `udid`, `device`, `runtime`, `ios_simulator_device_set`, `created`, `booted`.
- Idempotent: safe to call repeatedly; reuses an existing matching simulator by default.
TV quick reference:
- AndroidTV: `open`/`apps` use TV launcher discovery automatically.
- TV target selection works on emulators/simulators and connected physical devices (AndroidTV + AppleTV).
- tvOS: runner-driven interactions and snapshots are supported (`snapshot`, `wait`, `press`, `fill`, `get`, `scroll`, `back`, `home`, `app-switcher`, `record` and related selector flows).
- tvOS `back`/`home`/`app-switcher` map to Siri Remote actions (`menu`, `home`, double-home) in the runner.
- tvOS follows iOS simulator-only command semantics for helpers like `pinch`, `settings`, and `push`.
Snapshot and targeting
```bash
agent-device snapshot -i
agent-device diff snapshot -i
agent-device find "Sign In" click
agent-device press @e1
agent-device fill @e2 "text"
agent-device is visible 'id="anchor"'
```
`press` is the canonical tap command; `click` is an alias.
Utilities
```bash
agent-device appstate
agent-device clipboard read
agent-device clipboard write "token"
agent-device keyboard status
agent-device keyboard dismiss
agent-device perf --json
agent-device network dump [limit] [summary|headers|body|all]
agent-device push <bundle|package> <payload.json|inline-json>
agent-device trigger-app-event screenshot_taken '{"source":"qa"}'
agent-device get text @e1
agent-device screenshot out.png
agent-device settings permission grant notifications
agent-device settings permission reset camera
agent-device trace start
agent-device trace stop ./trace.log
```
Batch (when sequence is already known)
```bash
agent-device batch --steps-file /tmp/batch-steps.json --json
```
Performance Check
- Use `agent-device perf --json` (or `metrics --json`) after `open`.
- For detailed metric semantics, caveats, and interpretation guidance, see references/perf-metrics.md.
Guardrails (High Value Only)
- Re-snapshot after UI mutations (navigation/modal/list changes).
- Prefer `snapshot -i`; scope/depth only when needed.
- Use refs for discovery, selectors for replay/assertions.
- `find "<query>" click --json` returns `{ ref, locator, query, x, y }`, all derived from the matched snapshot node. Do not rely on these fields from raw `press`/`click` responses for observability; use `find` instead.
- Use `fill` for clear-then-type semantics; use `type` for focused append typing.
- Use `install` for in-place app upgrades (keep app data when platform permits), and `reinstall` for deterministic fresh-state runs.
- App binary format support for `install`/`reinstall`: Android `.apk`/`.aab`, iOS `.app`/`.ipa`.
- Android `.aab` requires `bundletool` in `PATH`, or `AGENT_DEVICE_BUNDLETOOL_JAR=<path-to-bundletool-all.jar>` with `java` in `PATH`.
- Android `.aab` optional: set `AGENT_DEVICE_ANDROID_BUNDLETOOL_MODE=<mode>` to control bundletool `build-apks --mode` (default: `universal`).
- iOS `.ipa`: extract/install from `Payload/*.app`; when multiple app bundles are present, `<app>` is used as a bundle id/name hint.
- iOS `appstate` is session-scoped; Android `appstate` is live foreground state. iOS responses include `device_udid` and `ios_simulator_device_set` for isolation verification.
- iOS `open` responses include `device_udid` and `ios_simulator_device_set` to confirm which simulator handled the session.
- Clipboard helpers: `clipboard read` / `clipboard write <text>` are supported on Android and iOS simulators; iOS physical devices are not supported yet.
- Android keyboard helpers: `keyboard status|get|dismiss` report keyboard visibility/type and dismiss via keyevent when visible.
- `network dump` is best-effort and parses HTTP(s) entries from the session app log file.
- Biometric settings: iOS simulator supports `settings faceid|touchid <match|nonmatch|enroll|unenroll>`; Android supports `settings fingerprint <match|nonmatch>` where runtime tooling is available.
- For AndroidTV/tvOS selection, always pair `--target` with `--platform` (`ios`, `android`, or `apple` alias); target-only selection is invalid.
- `push` simulates notification delivery:
  - iOS simulator uses APNs-style payload JSON.
  - Android uses broadcast action + typed extras (string/boolean/number).
- `trigger-app-event` requires app-defined deep-link hooks and URL template configuration (`AGENT_DEVICE_APP_EVENT_URL_TEMPLATE` or platform-specific variants).
- `trigger-app-event` requires an active session or explicit selectors (`--platform`, `--device`, `--udid`, `--serial`); on iOS physical devices, custom-scheme triggers require active app context.
- Canonical trigger behavior and caveats are documented in `website/docs/docs/commands.md` under App event triggers.
- Permission settings are app-scoped and require an active session app: `settings permission <grant|deny|reset> <camera|microphone|photos|contacts|notifications> [full|limited]`
- iOS simulator permission alerts: use `alert wait` then `alert accept/dismiss`; `accept/dismiss` retry internally for up to 2 s, so you do not need manual sleeps. See references/permissions.md.
- `full|limited` mode applies only to iOS `photos`; other targets reject mode.
- On Android, non-ASCII `fill`/`type` may require an ADB keyboard IME on some system images; only install IME APKs from trusted sources and verify checksum/signature.
- If using `--save-script`, prefer explicit path syntax (`--save-script=flow.ad` or `./flow.ad`).
- For tenant-isolated remote runs, always pass `--tenant`, `--session-isolation tenant`, `--run-id`, and `--lease-id` together.
- Use short lease TTLs and heartbeat only while work is active; release leases immediately after run completion/failure.
- Env equivalents for scoped runs: `AGENT_DEVICE_IOS_SIMULATOR_DEVICE_SET` (compat `IOS_SIMULATOR_DEVICE_SET`) and `AGENT_DEVICE_ANDROID_DEVICE_ALLOWLIST` (compat `ANDROID_DEVICE_ALLOWLIST`).
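
The tenant-isolation guardrail above (always pass the four flags together) could be enforced with a small guard that refuses to emit a partial flag set. This is a minimal sketch: `tenant_flags` is a hypothetical name, and `LEASE_ID` in the usage line is an illustrative variable holding the id returned by lease allocation.

```bash
#!/usr/bin/env bash
# Hypothetical guard: emit the tenant-isolation flags only when all inputs are set.
# $1 = tenant, $2 = run id, $3 = lease id. --session-isolation is always "tenant".
tenant_flags() {
  local tenant="$1" run_id="$2" lease_id="$3"
  if [ -z "$tenant" ] || [ -z "$run_id" ] || [ -z "$lease_id" ]; then
    echo "tenant, run id, and lease id must all be provided" >&2
    return 1
  fi
  printf -- '--tenant %s --session-isolation tenant --run-id %s --lease-id %s' \
    "$tenant" "$run_id" "$lease_id"
}

# Usage (left unquoted on purpose so the flags split into separate arguments):
#   agent-device --daemon-transport http $(tenant_flags acme run-123 "$LEASE_ID") session list --json
```

Failing early on a missing value avoids sending a command with only some of the isolation flags set.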
Security and Trust Notes
- Prefer a preinstalled `agent-device` binary over on-demand package execution.
- If install is required, pin an exact version (for example: `npx --yes agent-device@<exact-version> --help`).
- Signing/provisioning environment variables are optional, sensitive, and only for iOS physical-device setup.
- Logs/artifacts are written under `~/.agent-device`; replay scripts write to explicit paths you provide.
- For remote daemon mode, prefer `AGENT_DEVICE_DAEMON_SERVER_MODE=http|dual` with `AGENT_DEVICE_HTTP_AUTH_HOOK` and tenant-scoped lease admission.
- Keep logging off unless debugging and use least-privilege/isolated environments for autonomous runs.
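
The "pin an exact version" advice above could be checked before any on-demand execution. The sketch below assumes the package follows npm's MAJOR.MINOR.PATCH versioning. `is_exact_version` is a hypothetical name, and `AGENT_DEVICE_VERSION` is an illustrative variable, not one the tool reads.

```bash
#!/usr/bin/env bash
# Hypothetical check: accept only an exact MAJOR.MINOR.PATCH version (optionally
# with a prerelease suffix); reject ranges and tags such as ^1.2.0, ~1.2, 1.x, latest.
is_exact_version() {
  [[ "$1" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?$ ]]
}

# Usage (AGENT_DEVICE_VERSION is an illustrative variable name):
#   is_exact_version "$AGENT_DEVICE_VERSION" \
#     && npx --yes "agent-device@${AGENT_DEVICE_VERSION}" --help
```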
Common Mistakes
- Mixing debug flow into normal runs (keep logs off unless debugging).
- Continuing to use stale refs after screen transitions.
- Using URL opens with Android `--activity` (unsupported combination).
- Treating `boot` as default first step instead of fallback.