tot_agent.tools¶
Claude API tool schema definitions and the dispatch() router.
Tool inventory¶
| Tool name | Category | Description |
|---|---|---|
| `screenshot` | Vision | Capture a PNG of the active page |
| `navigate` | Navigation | Go to a URL or site-relative path |
| `click` | Interaction | Click a CSS selector or visible text |
| `fill` | Interaction | Type a value into an input field |
| `press_key` | Interaction | Press a keyboard key |
| `get_page_text` | Inspection | Return visible body text (≤ 4 000 chars) |
| `get_page_url` | Inspection | Return the current URL |
| `scroll_down` | Navigation | Scroll to the bottom of the page |
| `wait_for_element` | Synchronization | Wait for a CSS selector to appear |
| `switch_user` | Context | Change the active browser session |
| `login` | Auth | Navigate to login and submit credentials |
| `fetch_book_covers` | Data | Search for book cover images |
| `upload_cover_image` | Data | Download a cover image and upload it to a file input |
Dispatch flow¶
flowchart LR
Claude -- "tool_use block" --> dispatch
dispatch -- screenshot --> BrowserManager.screenshot
dispatch -- navigate --> BrowserManager.navigate
dispatch -- click --> BrowserManager.click
dispatch -- fill --> BrowserManager.fill
dispatch -- login --> BrowserManager.navigate & fill & press_key
dispatch -- fetch_book_covers --> CoverFetcher.fetch
dispatch -- upload_cover_image --> download_cover_image & BrowserManager.upload_file
dispatch -- "unknown tool" --> ErrorString["'ERROR: Unknown tool'"]
upload_cover_image flow¶
flowchart TD
A["upload_cover_image(cover_url, selector)"] --> B["download_cover_image(cover_url)"]
B --> C["tmp file on disk"]
C --> D["BrowserManager.upload_file(selector, tmp)"]
D --> E["os.unlink(tmp)"]
E --> F["return result dict"]
The temp file is always deleted in a finally block, even if the upload fails.
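The cleanup guarantee can be sketched as follows. This is an illustration under assumptions: `download_stub` and `upload_stub` are hypothetical placeholders for `download_cover_image` and `BrowserManager.upload_file`, whose real signatures and return values are not shown on this page, and the sketch is synchronous for brevity.

```python
import os
import tempfile


def download_stub(cover_url: str) -> str:
    """Hypothetical stand-in for download_cover_image: writes bytes to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".jpg")
    with os.fdopen(fd, "wb") as fh:
        fh.write(b"fake image bytes")
    return path


def upload_stub(selector: str, path: str, fail: bool = False) -> dict:
    """Hypothetical stand-in for BrowserManager.upload_file."""
    if fail:
        raise RuntimeError("upload failed")
    return {"uploaded": True, "selector": selector}


def upload_cover_sketch(cover_url: str, selector: str, fail: bool = False) -> tuple[dict, str]:
    """Download, upload, and always remove the temp file; returns (result, temp path)."""
    tmp = download_stub(cover_url)
    try:
        result = upload_stub(selector, tmp, fail=fail)
    except RuntimeError as exc:
        result = {"uploaded": False, "error": str(exc)}
    finally:
        # Runs on success and on failure, so the temp file never lingers.
        os.unlink(tmp)
    return result, tmp
```

The temp path is returned here only so that a caller can verify the file is gone; the documented flow returns just the result dict.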
Module reference¶
tot_agent.tools¶
tools.py — Tool schema definitions for the Claude API and the dispatcher that maps tool names to concrete browser/cover actions.
Design notes¶
- The agent uses screenshots + vision to navigate, so tool schemas are kept intentionally generic (no hardcoded selectors).
- Higher-level tools (`login`) provide structure; lower-level tools (`click`, `fill`, `screenshot`) allow the agent to recover from unexpected UI states.
- `dispatch` is the single entry point for the agent loop: it routes a tool name to the appropriate `tot_agent.browser.BrowserManager` method or cover-fetching function.
Adding a new tool¶
- Append a tool-schema dict to `TOOL_DEFINITIONS`.
- Add a `case "tool_name":` branch in `dispatch`.
TOOL_DEFINITIONS module-attribute¶
TOOL_DEFINITIONS = [
    {
        'name': 'screenshot',
        'description': 'Take a screenshot of the current browser page for the active user. Use this to inspect the UI state before deciding what to do next.',
        'input_schema': {'type': 'object', 'properties': {}, 'required': []},
    },
    {
        'name': 'navigate',
        'description': "Navigate the active user's browser to a URL (absolute or site-relative path like '/login').",
        'input_schema': {
            'type': 'object',
            'properties': {
                'url': {'type': 'string', 'description': 'URL or relative path to navigate to'},
            },
            'required': ['url'],
        },
    },
    {
        'name': 'click',
        'description': "Click an element on the page. Pass a CSS selector (e.g. 'button[type=submit]') or visible text (e.g. 'Sign in'). If a CSS selector fails, the agent tries text matching.",
        'input_schema': {
            'type': 'object',
            'properties': {
                'selector': {'type': 'string', 'description': 'CSS selector or visible text to click'},
            },
            'required': ['selector'],
        },
    },
    {
        'name': 'fill',
        'description': 'Clear and type a value into an input or textarea identified by CSS selector.',
        'input_schema': {
            'type': 'object',
            'properties': {
                'selector': {'type': 'string', 'description': 'CSS selector of the input field'},
                'value': {'type': 'string', 'description': 'Value to type'},
            },
            'required': ['selector', 'value'],
        },
    },
    {
        'name': 'press_key',
        'description': "Press a keyboard key (e.g. 'Enter', 'Tab', 'Escape') on the active page.",
        'input_schema': {
            'type': 'object',
            'properties': {
                'key': {'type': 'string', 'description': 'Key name to press'},
            },
            'required': ['key'],
        },
    },
    {
        'name': 'get_page_text',
        'description': 'Return the visible text content of the current page (up to 4000 chars). Useful for reading test listings, error messages, etc.',
        'input_schema': {'type': 'object', 'properties': {}, 'required': []},
    },
    {
        'name': 'get_page_url',
        'description': 'Return the current URL of the active browser page.',
        'input_schema': {'type': 'object', 'properties': {}, 'required': []},
    },
    {
        'name': 'scroll_down',
        'description': 'Scroll to the bottom of the current page.',
        'input_schema': {'type': 'object', 'properties': {}, 'required': []},
    },
    {
        'name': 'wait_for_element',
        'description': 'Wait up to 8 s for a CSS selector to appear — use after form submissions or page transitions.',
        'input_schema': {
            'type': 'object',
            'properties': {
                'selector': {'type': 'string', 'description': 'CSS selector to wait for'},
            },
            'required': ['selector'],
        },
    },
    {
        'name': 'switch_user',
        'description': "Switch the active browser context to a different simulated user. Each user has an isolated session. You must login after switching to a user that hasn't authenticated yet.",
        'input_schema': {
            'type': 'object',
            'properties': {
                'username': {'type': 'string', 'description': 'Username of the simulated user to switch to'},
            },
            'required': ['username'],
        },
    },
    {
        'name': 'login',
        'description': 'Log into the site using a username and password. Navigates to the login page, fills credentials, and submits. Take a screenshot afterward to confirm success.',
        'input_schema': {
            'type': 'object',
            'properties': {
                'username': {'type': 'string'},
                'password': {'type': 'string'},
            },
            'required': ['username', 'password'],
        },
    },
    {
        'name': 'fetch_book_covers',
        'description': 'Search for real book cover images from Open Library / Google Books. Returns a list of covers with title, author, and a direct image URL. Pass the cover_url to upload_cover_image when the form expects a file upload rather than a URL string.',
        'input_schema': {
            'type': 'object',
            'properties': {
                'query': {'type': 'string', 'description': 'Genre, title, or author to search for'},
                'count': {'type': 'integer', 'description': 'Number of covers to return (default 4)', 'default': 4},
            },
            'required': ['query'],
        },
    },
    {
        'name': 'upload_cover_image',
        'description': "Download a book cover image from a URL and upload it as a file to a <input type='file'> element on the current page. Use this instead of pasting a URL into a text field when the form expects an actual file upload. The cover_url comes from fetch_book_covers.",
        'input_schema': {
            'type': 'object',
            'properties': {
                'cover_url': {'type': 'string', 'description': 'Direct URL of the cover image to download and upload'},
                'selector': {'type': 'string', 'description': "CSS selector of the <input type='file'> element"},
            },
            'required': ['cover_url', 'selector'],
        },
    },
]
dispatch(tool_name, tool_input, bm) async¶
Route a tool call from the agent to the correct implementation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tool_name` | `str` | Name of the tool as declared in `TOOL_DEFINITIONS`. | required |
| `tool_input` | `dict[str, Any]` | Input parameters from the Claude tool-use block. | required |
| `bm` | `BrowserManager` | Active `BrowserManager` instance. | required |
Returns:
| Type | Description |
|---|---|
| `Any` | A plain Python value (…) |
Source code in src/tot_agent/tools.py
format_tool_result(tool_use_id, result)¶
Package a tool result for the Anthropic messages API.
Screenshots are sent as image content blocks; all other results are
serialised to a text string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tool_use_id` | `str` | The … | required |
| `result` | `Any` | Raw return value from `dispatch`. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | A … |
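A rough sketch of this packaging step is shown below. The `tool_result` and base64 `image` block shapes follow the Anthropic Messages API; how the real function recognises a screenshot is not shown on this page, so the sketch assumes screenshots arrive as raw PNG `bytes`. The function name and the `toolu_…` IDs are illustrative only.

```python
import base64
import json
from typing import Any


def format_tool_result_sketch(tool_use_id: str, result: Any) -> dict:
    """Wrap a tool's return value in a tool_result content block."""
    if isinstance(result, bytes):
        # Assumption: a bytes result is a PNG screenshot.
        content = [{
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.b64encode(result).decode("ascii"),
            },
        }]
    else:
        # Strings pass through; everything else is serialised to JSON text.
        text = result if isinstance(result, str) else json.dumps(result, default=str)
        content = [{"type": "text", "text": text}]
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}


text_block = format_tool_result_sketch("toolu_01", {"url": "/login"})
image_block = format_tool_result_sketch("toolu_02", b"\x89PNG")
```

The returned dict is meant to be placed in the `content` list of the next `user` message, alongside any other tool results from the same turn.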