Software Requirements Specification
Project: tot-agent — Autonomous Browser Testing Agent
Version: 1.0
Date: 2026-03-21
Standard: IEEE 830-1998
1. Introduction
1.1 Purpose
This Software Requirements Specification (SRS) defines the functional and non-functional requirements for tot-agent, an autonomous browser agent that uses Claude AI vision and tool-use to execute scripted test scenarios against web applications. The primary audience is developers extending or maintaining the tool and stakeholders evaluating its capabilities.
1.2 Scope
tot-agent is a Python command-line tool that:
- Accepts natural-language goals or pre-built scenario commands
- Drives a Chromium browser via Playwright to execute those goals
- Uses Claude's vision capability to observe the UI state and reason about next actions
- Fetches real book cover images to seed A/B testing platforms
- Supports multi-user simulation with isolated browser sessions
The tool is initially targeted at the This-or-That A/B book-cover testing platform but is designed to be repurposed for any web GUI through configuration.
1.3 Definitions, acronyms, and abbreviations
| Term |
Definition |
| Agent |
The BrowserAgent instance that drives the agentic loop |
| Goal |
A plain-English string describing what the agent should accomplish |
| Observer |
A class implementing AgentObserver that receives agent lifecycle events |
| SIM_USER |
A configured simulated user account (SimUser dataclass) |
| Tool |
A function exposed to Claude via the TOOL_DEFINITIONS schema |
| Context |
A Playwright BrowserContext representing one user's isolated session |
| Cover |
A BookCover instance with title, author, image URL, and source |
1.4 References
1.5 Overview
Section 2 provides the overall product description. Section 3 specifies detailed functional and non-functional requirements. Section 4 includes supporting information.
2. Overall Description
2.1 Product perspective
tot-agent is a standalone command-line tool that integrates three external systems:
graph LR
Tot["tot-agent"]
Claude["Anthropic Claude API"]
Playwright["Playwright / Chromium"]
OL["Open Library API"]
GB["Google Books API"]
App["Target Web Application"]
Tot --> Claude
Tot --> Playwright
Tot --> OL
Tot --> GB
Playwright --> App
It is not a web service, does not have a database, and does not require a network server. All state is in-memory within a single process execution.
2.2 Product functions
The tool provides the following high-level functions:
- Browser automation — navigate, click, fill forms, take screenshots in a real browser
- AI-driven decision making — use Claude's vision to interpret screenshots and choose the next action
- Multi-user simulation — maintain separate browser sessions per simulated user
- Cover image fetching — retrieve real book cover images from Open Library and Google Books
- Scenario execution — execute pre-built or custom natural-language test scenarios
- Reporting — summarise outcomes via console output and/or log files
2.3 User characteristics
| User class |
Technical background |
Primary use |
| Developer / QA engineer |
High; comfortable with Python CLIs |
Running test scenarios during development |
| Platform owner |
Moderate; can run commands |
Seeding demo data for a live platform |
| CI/CD pipeline |
n/a (automated) |
Headless regression testing |
2.4 Constraints
- C-1: Requires an Anthropic API key; incurs API usage costs.
- C-2: Requires a local web server running the target application.
- C-3: Chromium must be installed via
playwright install chromium.
- C-4: Python ≥ 3.12 required (uses
match statements, walrus operator).
- C-5: No cross-browser support in the initial release (Chromium only).
- C-6: The agent loop is limited to
MAX_AGENT_STEPS iterations to prevent runaway API usage.
2.5 Assumptions and dependencies
- A-1: The target application is accessible at a stable URL during agent execution.
- A-2: Simulated user accounts exist in the target application before voting commands are run.
- A-3: The Open Library and Google Books APIs are publicly accessible.
- D-1:
anthropic Python SDK ≥ 0.40.0.
- D-2:
playwright Python package ≥ 1.44.0.
- D-3:
httpx ≥ 0.27.0 for HTTP requests.
3. Specific Requirements
3.1 Functional requirements
3.1.1 Browser automation
| ID |
Requirement |
| FR-BA-01 |
The system SHALL navigate the active browser context to any absolute or site-relative URL. |
| FR-BA-02 |
The system SHALL click elements identified by CSS selector or visible text. |
| FR-BA-03 |
The system SHALL fill input fields identified by CSS selector with a given string value. |
| FR-BA-04 |
The system SHALL press named keyboard keys (e.g. Enter, Tab, Escape). |
| FR-BA-05 |
The system SHALL capture a full-colour PNG screenshot and return it as base64. |
| FR-BA-06 |
The system SHALL return the visible text content of the active page, capped at 4 000 characters. |
| FR-BA-07 |
The system SHALL return the current URL of the active page. |
| FR-BA-08 |
The system SHALL scroll the active page to the bottom. |
| FR-BA-09 |
The system SHALL wait up to a configurable timeout for a CSS selector to appear. |
| FR-BA-10 |
The system SHALL evaluate arbitrary JavaScript in the active page and return the result. |
3.1.2 Multi-user context management
| ID |
Requirement |
| FR-UC-01 |
The system SHALL maintain a pool of named browser contexts, one per simulated user. |
| FR-UC-02 |
The system SHALL lazily create a new context when a previously unseen user key is requested. |
| FR-UC-03 |
Switching user contexts SHALL NOT close existing contexts. |
| FR-UC-04 |
Each context SHALL have isolated cookies and session storage. |
| FR-UC-05 |
The system SHALL close all contexts when the BrowserManager exits. |
3.1.3 Cover image fetching
| ID |
Requirement |
| FR-CF-01 |
The system SHALL search Open Library for book covers matching a query string. |
| FR-CF-02 |
The system SHALL fall back to Google Books when Open Library returns fewer results than requested. |
| FR-CF-03 |
Returned cover URLs SHALL use HTTPS. |
| FR-CF-04 |
Results SHALL be deduplicated by normalised title. |
| FR-CF-05 |
The system SHALL verify that a cover URL resolves via HTTP HEAD on request. |
| FR-CF-06 |
The system SHALL support random cover-pair generation for seeding A/B tests. |
3.1.4 Agentic loop
| ID |
Requirement |
| FR-AL-01 |
The agent SHALL accept a plain-English goal string. |
| FR-AL-02 |
The agent SHALL pass the goal, message history, and tool schemas to Claude. |
| FR-AL-03 |
The agent SHALL execute all tool calls returned by Claude in sequence. |
| FR-AL-04 |
The agent SHALL feed tool results back to Claude as tool_result messages. |
| FR-AL-05 |
The agent SHALL terminate when Claude returns stop_reason == "end_turn". |
| FR-AL-06 |
The agent SHALL terminate after MAX_AGENT_STEPS iterations if end_turn is not reached. |
| FR-AL-07 |
The agent SHALL return a plain-text summary of what was accomplished. |
3.1.5 Observer / event reporting
| ID |
Requirement |
| FR-OB-01 |
The agent SHALL emit an AgentEvent for each of: GOAL_START, STEP_START, AGENT_TEXT, TOOL_CALL, TOOL_RESULT, GOAL_COMPLETE, STEP_LIMIT. |
| FR-OB-02 |
Any number of AgentObserver instances SHALL be registerable at construction time. |
| FR-OB-03 |
Observers SHALL be addable and removable at runtime. |
| FR-OB-04 |
ConsoleObserver SHALL render events as Rich-formatted terminal output. |
| FR-OB-05 |
LoggingObserver SHALL write events to the Python logging hierarchy. |
3.1.6 Command-line interface
| ID |
Requirement |
| FR-CLI-01 |
The CLI SHALL provide sub-commands: create, vote, simulate, seed, goal, users, info, covers. |
| FR-CLI-02 |
The CLI SHALL support global options: --log-level, --log-file, --model, --max-steps, --site-url. |
| FR-CLI-03 |
The CLI SHALL print the version via --version. |
| FR-CLI-04 |
The info command SHALL display all active configuration values. |
| FR-CLI-05 |
The covers command SHALL work without launching a browser. |
| FR-CLI-06 |
All sub-commands that launch a browser SHALL support --headless. |
3.2 Non-functional requirements
| ID |
Requirement |
| NFR-P-01 |
Context switching SHALL complete in < 500 ms for already-created contexts. |
| NFR-P-02 |
Screenshots SHALL be captured within the Playwright page timeout (default 15 s). |
| NFR-P-03 |
Cover search requests SHALL complete within 10 s per source. |
3.2.2 Reliability
| ID |
Requirement |
| NFR-R-01 |
HTTP errors from cover APIs SHALL be caught; the system SHALL return an empty list rather than raising. |
| NFR-R-02 |
Browser action failures SHALL return an "ERROR: ..." string rather than raising an exception. |
| NFR-R-03 |
The agent loop SHALL NOT crash on unknown tool names; it SHALL return an error string to Claude. |
3.2.3 Maintainability
| ID |
Requirement |
| NFR-M-01 |
All public classes and functions SHALL have Sphinx-compatible docstrings. |
| NFR-M-02 |
Code SHALL conform to PEP 8 style. |
| NFR-M-03 |
Test coverage SHALL be tracked via pytest-cov; the coverage report SHALL be generated on each test run. |
| NFR-M-04 |
New cover sources SHALL be addable without modifying CoverFetcher. |
3.2.4 Security
| ID |
Requirement |
| NFR-S-01 |
The Anthropic API key SHALL be loaded from environment variables or .env; it SHALL NOT be hardcoded. |
| NFR-S-02 |
Browser contexts SHALL NOT share cookies or session storage across users. |
| NFR-S-03 |
The --log-file output SHALL NOT include the API key in plaintext. |
3.2.5 Portability
| ID |
Requirement |
| NFR-PT-01 |
The tool SHALL run on macOS, Linux, and Windows. |
| NFR-PT-02 |
All dependencies SHALL be declared in pyproject.toml. |
| NFR-PT-03 |
The tool SHALL be installable in a Python virtual environment via pip install -e .. |
4.1 Requirements traceability matrix
| Requirement |
Source module |
Test file |
| FR-BA-01 – FR-BA-10 |
browser.py |
tests/unit/test_browser.py |
| FR-UC-01 – FR-UC-05 |
browser.py |
tests/unit/test_browser.py |
| FR-CF-01 – FR-CF-06 |
covers.py |
tests/unit/test_covers.py |
| FR-AL-01 – FR-AL-07 |
agent.py |
tests/unit/test_agent.py |
| FR-OB-01 – FR-OB-05 |
agent.py |
tests/unit/test_agent.py |
| FR-CLI-01 – FR-CLI-06 |
cli.py |
(CLI tests — future) |
4.2 Out of scope for v1.0
- Cross-browser support (Firefox, WebKit)
- Parallel multi-user execution
- Persistent test result storage
- GUI dashboard or web API
- Authentication methods other than username/password form fill