Yosoi Repository Agent Guide
Context & Philosophy
Yosoi is an AI-powered tool that discovers resilient selectors for web scraping. The core philosophy is "Discover once, scrape forever." We use LLMs to analyze HTML structure and find selectors that are robust to layout changes, then validate them to ensure accuracy.
Fail Fast: We do not use fallback heuristics. If AI discovery fails, we fail the process. This ensures we don't return garbage data from unreliable selectors.
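The fail-fast rule can be sketched as follows. This is a minimal illustration, not Yosoi's actual code: SelectorDiscoveryError, discover_or_fail, and the discover callable are hypothetical names introduced only for this example.

```python
from typing import Callable, Optional


class SelectorDiscoveryError(RuntimeError):
    """Hypothetical error raised when discovery yields no usable selector."""


def discover_or_fail(html: str, discover: Callable[[str], Optional[str]]) -> str:
    """Return the discovered selector, or raise instead of guessing.

    `discover` stands in for the AI discovery step (an assumption for this sketch).
    """
    selector = discover(html)
    if not selector:
        # No heuristic fallback: stop the process rather than return unreliable data.
        raise SelectorDiscoveryError("AI discovery returned no selector")
    return selector
```

The point of the design is that a caller either gets a selector that came from discovery or gets an exception; there is no third path that silently substitutes a guess.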
Technology Stack & Standards
- Language: Python 3.10+
- Package Manager: uv (Strict requirement. DO NOT use pip/poetry directly.)
- Linting/Formatting: ruff
- Testing: pytest
- Type Checking: mypy
- Retry Logic: tenacity (Mandatory for all flaky/network operations)
Retry Logic & Durability
We use tenacity to handle retries.
- DO NOT use time.sleep() in loops.
- DO use granular Retrying context managers or decorators.
- DO use wait_exponential to avoid thundering herds.
Example
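from tenacity import Retrying, stop_after_attempt, wait_exponential, retry_if_exception_type

# Preferred pattern: Context Manager
# NetworkError and make_network_call are placeholders for the project's own
# exception type and network operation.
for attempt in Retrying(
    stop=stop_after_attempt(3),                   # give up after three attempts
    wait=wait_exponential(multiplier=1, max=10),  # exponential backoff, capped at 10 seconds
    retry=retry_if_exception_type(NetworkError),  # retry only on NetworkError
    reraise=True,                                 # re-raise the original exception when attempts run out
):
    with attempt:
        make_network_call()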
Critical Rules
- Dependency Management: ALWAYS use uv add or uv sync. Never install with pip.
- Running Code: ALWAYS use uv run <command>.
  - Example: uv run yosoi --url ...
  - Example: uv run pytest
- Code Style: Run uv run ruff check . and uv run ruff format . before finishing a task.
- Type Safety: Maintain strong typing. Use mypy to verify.
- Retry Logic: Use tenacity for all retry patterns. Never implement custom retry loops with for or while. (The for attempt in Retrying(...) idiom shown above is tenacity's own pattern and is the preferred form.)
Repository Structure
- yosoi/: The core Python package.
- tests/: Integration and unit tests.
- examples/: Usage examples.
- .yosoi/: Local storage for selectors, debug HTML, and logs (gitignored).
  - logs/: Contains run logs in run_YYYYMMDD_HHMMSS.log format.
  - debug_html/: Extracted HTML for debugging.
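The run-log naming scheme can be reproduced with the standard library. This is a minimal sketch: run_log_path is a hypothetical helper, and the assumption that the timestamp is the run's start time in local time is not confirmed by this guide.

```python
from datetime import datetime
from pathlib import Path


def run_log_path(started_at: datetime, root: Path = Path(".yosoi")) -> Path:
    """Build a path of the form <root>/logs/run_YYYYMMDD_HHMMSS.log."""
    return root / "logs" / f"run_{started_at:%Y%m%d_%H%M%S}.log"


path = run_log_path(datetime(2024, 1, 31, 9, 5, 7))
print(path.as_posix())  # .yosoi/logs/run_20240131_090507.log
```

Because the timestamp fields are zero-padded and ordered from year to second, sorting the file names alphabetically also sorts the runs chronologically.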
Logging & Observability
- Local Logs: Every run generates a log file in .yosoi/logs/. These logs contain detailed debug information and full tracebacks.
- Logfire: Used for cloud-based observability if LOGFIRE_TOKEN is set.
- Console: Kept minimal and stylish for human eyes.
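One way to get a verbose file log alongside a quiet console is shown below, using only the standard library. This is a sketch under stated assumptions: the guide does not say which logging library Yosoi uses. build_logger, logfire_enabled, the logger name, the log levels, and treating an empty LOGFIRE_TOKEN as unset are all illustrative choices, and no Logfire API is called.

```python
import logging
import os
import tempfile
from pathlib import Path
from typing import Mapping


def build_logger(log_file: Path) -> logging.Logger:
    """File handler records everything; console shows warnings and above."""
    logger = logging.getLogger("yosoi_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.WARNING)
    console_handler.setFormatter(logging.Formatter("%(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger


def logfire_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Assumption: cloud observability is on only for a non-empty token."""
    return bool(env.get("LOGFIRE_TOKEN", "").strip())


# Demonstration against a temporary file rather than .yosoi/logs/.
log_file = Path(tempfile.mkdtemp()) / "run.log"
logger = build_logger(log_file)
logger.debug("selector candidates: %s", ["h1.title"])  # file only
try:
    1 / 0
except ZeroDivisionError:
    logger.exception("discovery failed")  # traceback is written to the log
log_text = log_file.read_text(encoding="utf-8")
```

Setting the level per handler rather than on the logger is what lets one logger feed both destinations at different verbosity.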
Interaction Guidelines
When working on this repo, generic Python solutions often fail because the project mandates its own tooling (uv, ruff, mypy, tenacity). Always check pyproject.toml for available scripts and configuration.