---
name: adding-api-sources
description: Use when implementing a new data source adapter for metapyle, before writing any source code
---
# Adding API Sources to Metapyle
## Overview

Add new financial data source adapters following TDD and established patterns. Each source provides `fetch()` and `get_metadata()` methods, with lazy imports for optional dependencies.

**Core principle:** Use the brainstorming skill first for design decisions, then implement following established patterns.
## Workflow

- **Design** - Use the `brainstorming` skill to decide data model mapping
- **Plan** - Use the `writing-plans` skill for the implementation plan
- **Implement** - Follow TDD with subagents (see Quick Reference)
## Design Questions (Brainstorming Phase)
Before coding, answer these questions using the brainstorming skill:
| Question | Why It Matters |
|---|---|
| What maps to `symbol`? | Primary identifier (ticker, bbid, series ID) |
| What maps to `field`? | Secondary identifier if needed (`PX_LAST`, `dataset::column`) |
| Need `params` field? | Extra filters (tenor, location, deltaStrike) |
| Authentication model? | External (user calls auth) or internal (credentials passed) |
| Batch strategy? | Single call for all symbols, or group by some key? |
| Column naming? | Symbol only, or `symbol::field` for uniqueness? |
| Metadata available? | What can `get_metadata()` return? |
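
For example, two hypothetical sources might answer the first three questions like this (the sources and values are illustrative, not existing metapyle adapters):

```python
from metapyle.sources.base import FetchRequest

# Hypothetical FRED-style series source:
# symbol -> series ID; no field or params needed.
FetchRequest(symbol="DGS10")

# Hypothetical vol-surface source:
# symbol -> underlier, field -> quote type, params -> extra filters.
FetchRequest(symbol="SPX", field="IMPLIED_VOL", params={"tenor": "1M", "deltaStrike": 25})
```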
## Quick Reference

| Step | Files | Key Actions |
|---|---|---|
| 1. Branch | — | `git checkout -b feature/<source>-source` |
| 2. Skeleton | `sources/<source>.py` | Lazy import + class with `NotImplementedError` |
| 3. Export | `sources/__init__.py` | Add import + `__all__` |
| 4. Tests | `tests/unit/test_sources_<source>.py` | Mock-based tests (RED) |
| 5. Implement | `sources/<source>.py` | `fetch()` then `get_metadata()` (GREEN) |
| 6. Config | `pyproject.toml` | Optional dep + mypy ignore |
| 7. Verify | — | pytest, mypy, ruff |
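
A minimal sketch of the step 2 skeleton, using the document's `<source>`/`<Source>` placeholders (the `get_metadata()` signature here is an assumption; match whatever `BaseSource` declares):

```python
# sources/<source>.py
from collections.abc import Sequence
from typing import Any

import pandas as pd

from metapyle.sources.base import BaseSource, FetchRequest, register_source


@register_source("<source>")
class <Source>Source(BaseSource):
    """Skeleton only - tests come first (RED), implementation later (GREEN)."""

    def fetch(self, requests: Sequence[FetchRequest], start: str, end: str) -> pd.DataFrame:
        raise NotImplementedError

    def get_metadata(self, symbol: str) -> dict[str, Any]:  # signature assumed
        raise NotImplementedError
```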
## Batch Fetch API

Sources receive batched requests via `Sequence[FetchRequest]`:

```python
from collections.abc import Sequence

import pandas as pd

from metapyle.sources.base import BaseSource, FetchRequest, make_column_name, register_source


@register_source("<source>")
class <Source>Source(BaseSource):
    def fetch(
        self,
        requests: Sequence[FetchRequest],
        start: str,
        end: str,
    ) -> pd.DataFrame:
        """
        Parameters
        ----------
        requests : Sequence[FetchRequest]
            Each has: symbol, field (optional), path (optional), params (optional)
        start, end : str
            ISO dates (YYYY-MM-DD)

        Returns
        -------
        pd.DataFrame
            DatetimeIndex, columns named via make_column_name(symbol, field)
        """
        if not requests:
            return pd.DataFrame()
        # ... implementation
```
## FetchRequest Fields

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True, slots=True, kw_only=True)
class FetchRequest:
    symbol: str                           # Required - primary identifier
    field: str | None = None              # Optional - e.g., "PX_LAST", "dataset::col"
    path: str | None = None              # Optional - for localfile source
    params: dict[str, Any] | None = None  # Optional - extra filters
```
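
A usage sketch of a batched call (the instance and symbols are hypothetical):

```python
source = <Source>Source()
requests = [
    FetchRequest(symbol="AAPL", field="PX_LAST"),
    FetchRequest(symbol="MSFT", field="PX_LAST"),
    FetchRequest(symbol="CL1", params={"location": "Cushing"}),
]
df = source.fetch(requests, "2024-01-01", "2024-12-31")
# Columns: "AAPL::PX_LAST", "MSFT::PX_LAST", "CL1"
```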
## Column Naming

Always use `make_column_name()` for output columns:

```python
from metapyle.sources.base import make_column_name

# In fetch(), rename columns:
for req in requests:
    col_name = make_column_name(req.symbol, req.field)  # "AAPL::PX_LAST" or "AAPL"
    result[col_name] = data[req.symbol]
```
## Batch Grouping Pattern

When the API requires grouping (e.g., by dataset):

```python
def fetch(self, requests: Sequence[FetchRequest], start: str, end: str) -> pd.DataFrame:
    if not requests:
        return pd.DataFrame()

    # Group by some key (dataset_id, field type, etc.)
    groups: dict[str, list[FetchRequest]] = {}
    for req in requests:
        key = extract_key(req.field)  # Your grouping logic
        groups.setdefault(key, []).append(req)

    # Fetch each group (potentially in parallel)
    result_dfs: list[pd.DataFrame] = []
    for key, group_requests in groups.items():
        symbols = [req.symbol for req in group_requests]
        df = api.batch_fetch(key, symbols, start, end)
        result_dfs.append(df)

    # Merge results
    result = result_dfs[0]
    for df in result_dfs[1:]:
        result = result.join(df, how="outer")
    return result
```
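
One illustrative `extract_key` for `dataset::column`-style fields (a sketch, not a metapyle helper):

```python
def extract_key(field: str | None) -> str:
    """Group requests by the dataset prefix of a "dataset::column" field."""
    if field and "::" in field:
        return field.split("::", 1)[0]
    return ""  # requests without a dataset share one group
```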
## Lazy Import Pattern

```python
from typing import Any

_LIB_AVAILABLE: bool | None = None
_lib_modules: dict[str, Any] = {}


def _get_lib() -> dict[str, Any]:
    """Lazy import of library modules."""
    global _LIB_AVAILABLE, _lib_modules
    if _LIB_AVAILABLE is None:
        try:
            from library import Module1, Module2

            _lib_modules = {"Module1": Module1, "Module2": Module2}
            _LIB_AVAILABLE = True
        except Exception:  # ImportError, or anything the library raises on import
            _lib_modules = {}
            _LIB_AVAILABLE = False
    return _lib_modules
```
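
Inside `fetch()`, one way to consume `_get_lib()` is sketched below; the exception type and message for a missing optional dependency are assumptions, so match metapyle's convention:

```python
modules = _get_lib()
if not modules:
    # Error type/message assumed - mirror whatever other sources raise
    raise ImportError(
        "<library> is not installed; install with `pip install metapyle[<source>]`"
    )
api = modules["Module1"]
# ... proceed with the real fetch logic
```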
## Exception Handling

```python
try:
    data = api.fetch(symbols, start, end)
except (FetchError, NoDataError):
    raise  # Re-raise our exceptions as-is
except Exception as e:
    logger.error("fetch_failed: symbols=%s, error=%s", symbols, str(e))
    raise FetchError(f"API error: {e}") from e

if data.empty:
    raise NoDataError(f"No data returned for {symbols}")
```
## Test Pattern

```python
from unittest.mock import MagicMock, patch

import pandas as pd

from metapyle.sources.base import FetchRequest


class TestSourceFetch:
    def test_single_request(self) -> None:
        mock_data = pd.DataFrame({"SYM": [1.0]}, index=pd.to_datetime(["2024-01-02"]))
        with patch("metapyle.sources.<source>._get_lib") as mock_get:
            mock_lib = {"API": MagicMock()}
            mock_lib["API"].fetch.return_value = mock_data
            mock_get.return_value = mock_lib
            source = <Source>Source()
            requests = [FetchRequest(symbol="SYM", field="FIELD")]
            df = source.fetch(requests, "2024-01-01", "2024-12-31")
            assert "SYM::FIELD" in df.columns
            assert isinstance(df.index, pd.DatetimeIndex)
```
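
TDD step 1 (see TDD Order below) starts with a library-not-installed test. A sketch, assuming a missing library surfaces as `ImportError` (assert whatever your source actually raises):

```python
import pytest


class TestSourceMissingLibrary:
    def test_fetch_raises_when_library_missing(self) -> None:
        # Simulate the optional dependency being absent
        with patch("metapyle.sources.<source>._get_lib", return_value={}):
            source = <Source>Source()
            with pytest.raises(ImportError):  # exception type assumed
                source.fetch([FetchRequest(symbol="SYM")], "2024-01-01", "2024-12-31")
```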
## pyproject.toml

```toml
[project.optional-dependencies]
<source> = ["<library>"]

[[tool.mypy.overrides]]
module = ["<library>", "<library>.*"]
ignore_missing_imports = true
```
## Common Mistakes

| Mistake | Fix |
|---|---|
| Wrong `fetch()` signature | Must be `fetch(requests: Sequence[FetchRequest], start, end)` |
| Import at module level | Use lazy import pattern with `_get_lib()` |
| Manual column naming | Use `make_column_name(symbol, field)` |
| f-strings in logging | Use `logger.debug("msg: %s", var)` |
| Missing empty request check | Return `pd.DataFrame()` if `not requests` |
| Catching exceptions silently | Re-raise `FetchError`/`NoDataError`, wrap others |
## TDD Order

- **RED**: Write test for `_get_lib()` (library not installed)
- **GREEN**: Implement lazy import
- **RED**: Write test for single request fetch
- **GREEN**: Implement basic fetch
- **RED**: Write test for batch fetch
- **GREEN**: Implement batch handling
- **RED**: Write error handling tests
- **GREEN**: Implement error handling
- **VERIFY**: Run full test suite, ruff, mypy