Reproducibility Rules for Final Project (Agentic Coding Guide)
Project: MacroEconomic Uncertainty database replication (Comunale & Nguyen 2025)
Environment: Pixi + pytask + Python 3.14
Critical Rule: Everything must be reproducible from main branch with zero uncommitted changes
1. CARDINAL RULES (Never Break These)
1.1 Always Use Pixi
# CORRECT: Run through pixi
pixi run pytask
pixi run pytest
pixi run prek
# WRONG: Direct Python calls (breaks reproducibility)
python script.py # ❌ Uses wrong Python/environment
pytest # ❌ Uses system pytest, not project version
Why: Pixi ensures exact package versions via pixi.lock. Direct calls use system Python.
1.2 Source vs Output Separation
src/ # Hand-written code ONLY (version controlled)
bld/ # Generated outputs (safe to delete, NOT committed)
_build/ # Document outputs (safe to delete, NOT committed)
Rules:
- ✅ Edit files in `src/`
- ❌ Never edit generated files in `bld/` or `_build/`
- ❌ Never commit `bld/` or `_build/` contents (they're in `.gitignore`)
- ✅ Final results must regenerate from a clean checkout via `pixi run pytask`
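The source/output split above assumes a small config module that defines the project roots. A minimal sketch of what `src/meu_replication/config.py` might contain (the exact directory layout is an assumption):

```python
from pathlib import Path

# Hypothetical config.py: paths derived from this file's location, never hardcoded.
SRC = Path(__file__).parent.resolve()   # hand-written code lives under src/
BLD = SRC.parent.parent / "bld"         # generated outputs, safe to delete
```

Tasks then import `SRC` and `BLD` from here, so every machine resolves the same relative layout.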
1.3 No Hardcoded Paths
# CORRECT: Portable paths
from pathlib import Path
from meu_replication.config import BLD, SRC
output_path = BLD / "data" / "cleaned.csv"
# WRONG: Breaks on other machines
output_path = "C:/Users/nikla/Projects/..." # ❌
2. DEPENDENCY MANAGEMENT (Pixi + pixi.lock)
2.1 How to Add Dependencies
# Conda packages (prefer this)
pixi add pandas numpy
# PyPI-only packages
pixi add --pypi some-package
# Or manually edit pyproject.toml:
[tool.pixi.dependencies]
pandas = ">=2.0"
[tool.pixi.pypi-dependencies]
some-package = ">=1.0"
Then:
pixi install # Updates pixi.lock
2.2 Lockfile Policy (Critical for Reproducibility)
- Always commit `pixi.lock` when dependencies change
- `pixi.lock` pins exact versions (e.g., `pandas==2.2.1`, not `pandas>=2.0`)
- At submission time: verify all versions are pinned and consistent
- Never edit `pixi.lock` manually
3. PYTASK: Building the Computational DAG
3.1 Task Discovery Rules
pytask auto-discovers:
- Files named `task_*.py`
- Functions named `task_*` inside them
3.2 Task Function Signature (STRICT)
import pandas as pd
from pathlib import Path
from meu_replication.config import BLD
# CORRECT: Explicit dependencies and products
def task_clean_data(
depends_on: Path = BLD / "data" / "raw.csv",
produces: Path = BLD / "data" / "cleaned.csv",
) -> None:
"""Clean raw data."""
raw = pd.read_csv(depends_on)
cleaned = clean_data(raw) # Pure function
cleaned.to_csv(produces, index=False)
# WRONG: No produces declared
def task_clean_data(): # ❌
pd.read_csv("raw.csv").to_csv("cleaned.csv") # pytask can't track this
Rules:
- Use `produces` for all outputs
- Use other default arguments for dependencies
- pytask uses these signatures to build the DAG
- Never write files not declared in `produces`
3.3 Multiple Products
# Option 1: Dict of products (for related outputs)
def task_fit_models(
depends_on: Path = BLD / "data" / "cleaned.csv",
produces: dict[str, Path] = {
"model_1": BLD / "models" / "model_1.pkl",
"model_2": BLD / "models" / "model_2.pkl",
},
) -> None:
...
# Option 2: Loop with @pytask.task(id=...) for distinct tasks
import pytask
from meu_replication.config import BLD, COUNTRIES
for country in COUNTRIES:
@pytask.task(id=country)
def task_fetch_data(
produces: Path = BLD / "data" / f"{country}.csv",
) -> None:
...
3.4 Task Hygiene
- Task functions should be short and boring (read → transform → write)
- Real logic goes in pure helper functions under `src/`
- Never mutate raw inputs
- Never write outside `produces`
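A sketch of this shape, with a hypothetical `winsorize_values` helper holding the real logic (paths written as literal `Path`s here for brevity; in the project they would come from `config`):

```python
from pathlib import Path
import pandas as pd

def winsorize_values(df: pd.DataFrame, lower: float = 0.01, upper: float = 0.99) -> pd.DataFrame:
    """Pure helper: clip 'value' to the given quantiles without mutating the input."""
    clipped = df["value"].clip(df["value"].quantile(lower), df["value"].quantile(upper))
    return df.assign(value=clipped)

def task_winsorize(
    depends_on: Path = Path("bld/data/cleaned.csv"),
    produces: Path = Path("bld/data/winsorized.csv"),
) -> None:
    """Short and boring: read, delegate to the pure helper, write."""
    df = pd.read_csv(depends_on)
    result = winsorize_values(df)
    result.to_csv(produces, index=False)
```

The task stays a three-line read/transform/write wrapper, so all testing effort goes into the pure helper.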
4. DATA CLEANING: The Three Functional Rules
4.1 The Rules (Mandatory for All Cleaning Code)
- Start with an empty DataFrame (construct cleaned columns from raw)
- Touch every variable only once (each cleaned column assigned exactly once)
- Touch with a pure function (no side effects, depends only on inputs)
4.2 Example: Good vs Bad
# GOOD: Functional cleaning
def clean_data(raw: pd.DataFrame) -> pd.DataFrame:
"""Clean raw data following the three rules."""
return pd.DataFrame({
"date": clean_dates(raw["date_raw"]),
"value": clean_values(raw["value_raw"]),
"country": clean_country_codes(raw["geo"]),
})
def clean_dates(dates: pd.Series) -> pd.Series:
"""Convert YYYY-MM format to datetime."""
return pd.to_datetime(dates, format="%Y-%m")
# BAD: Mutating transformations
def clean_data(df): # ❌
df["date"] = pd.to_datetime(df["date"]) # Mutates input
df["value"] = df["value"].fillna(0) # Mutates again
df.drop(columns=["old_col"], inplace=True) # More mutation
return df # Unclear what happened
4.3 Why These Rules Matter
- Debuggability: Searching for `"column_name"` finds exactly where it's defined
- No hidden state: Can't have accidental dependencies between transformations
- Testability: Pure functions are trivial to test
- Reproducibility: Same inputs → same outputs, always
5. TESTING (Minimum Standard)
5.1 What to Test
Test behavior, not implementation:
- Typical inputs
- Corner cases (empty data, missing values, edge values)
- Error conditions (invalid inputs should raise informative errors)
- Every bug you've encountered (regression tests)
5.2 Test Structure
import pytest
import pandas as pd
from meu_replication.data_management.clean import clean_dates
def test_clean_dates_typical():
"""Test typical YYYY-MM input."""
raw = pd.Series(["2024-01", "2024-02"])
result = clean_dates(raw)
expected = pd.Series(pd.to_datetime(["2024-01-01", "2024-02-01"]))
pd.testing.assert_series_equal(result, expected)
def test_clean_dates_invalid():
"""Test that invalid dates raise ValueError."""
raw = pd.Series(["not-a-date"])
with pytest.raises(ValueError, match="match format"):
clean_dates(raw)
@pytest.mark.parametrize("input_val,expected", [
("2024-01", "2024-01-01"),
("2024-12", "2024-12-01"),
])
def test_clean_dates_parametrized(input_val, expected):
"""Parametrized tests for multiple cases."""
result = clean_dates(pd.Series([input_val]))
assert result.iloc[0] == pd.Timestamp(expected)
5.3 Testing Checklist
- ✅ One assertion per test (when possible)
- ✅ Test fails when it should (verify counterexample)
- ✅ Use `pytest.raises(...)` for expected errors
- ✅ Use `@pytest.mark.parametrize` to avoid duplication
- ❌ No "or-style" assertions that can pass for wrong reasons
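To make the "test fails when it should" item concrete, here is a sketch: feed the test a deliberately wrong expectation and confirm the assertion trips (`clean_dates` is repeated from section 4.2 so the snippet is self-contained):

```python
import pandas as pd
import pytest

def clean_dates(dates: pd.Series) -> pd.Series:
    """Convert YYYY-MM format to datetime (copy of the section 4.2 helper)."""
    return pd.to_datetime(dates, format="%Y-%m")

def test_fails_on_counterexample():
    """A deliberately wrong expectation must trip the assertion."""
    result = clean_dates(pd.Series(["2024-01"]))
    with pytest.raises(AssertionError):
        assert result.iloc[0] == pd.Timestamp("1999-12-31")  # wrong on purpose
```

In practice you do this check once by hand (swap in a wrong expectation, watch the test fail, restore it); the `pytest.raises(AssertionError)` form just shows the mechanic.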
6. PLOTLY EXPORT (Static Figures for Documents)
6.1 Setup Kaleido (Browser for Static Export)
# One-time setup in pixi environment
pixi run plotly_get_chrome
6.2 Export in pytask Tasks
import pandas as pd
import plotly.express as px
from pathlib import Path
from meu_replication.config import BLD
def task_create_figure(
depends_on: Path = BLD / "data" / "cleaned.csv",
produces: Path = BLD / "figures" / "plot.png",
) -> None:
"""Create publication-ready static figure."""
df = pd.read_csv(depends_on)
fig = px.line(df, x="date", y="value", title="My Plot")
# Update layout for publication
fig.update_layout(
font=dict(size=14),
title_font_size=16,
showlegend=False, # Avoid unnecessary legends
)
# Write static export
fig.write_image(produces, width=800, height=600)
7. DOCUMENTATION REQUIREMENTS
7.1 README Must Cover
- What: Project purpose and pipeline entry point
- How: Exact commands to install and run (`pixi install`, `pixi run pytask`, `pixi run pytest`)
- Where: Directory layout (src, bld, documents, etc.)
- Special requirements: Runtime, memory, data restrictions, credentials
7.2 Replication Package Checklist
For final submission:
- ✅ Data availability statement
- ✅ Variable definitions and metadata
- ✅ Code for all transformations
- ✅ Software dependencies (via `pixi.lock`)
- ✅ Expected runtime estimate
- ✅ License information
- ✅ Document any omissions/deviations
8. DEBUGGING PLAYBOOK
When something breaks:
- State expected behavior: "What should this do?"
- Check environment: Are you in the right pixi env?
- Minimal failing case: Reduce to smallest example
- Isolate: Test individual functions separately
- One change at a time: Form hypothesis, test, repeat
- Write it down: Turn failure into a regression test
- Prefer debugger over print: Use `pdbp` breakpoints
# Add breakpoint for debugging
import pdbp  # noqa: F401 — importing pdbp makes breakpoint() use Pdb+
breakpoint()  # Execution pauses here
9. DEFINITION OF DONE (Checklist Before Commit)
Before considering any change "done":
- `pixi run pytest` passes
- `pixi run pytask` completes successfully
- No raw/source inputs edited in place
- New outputs declared in `produces` and written only there
- Logic in reusable helper functions (pure where feasible)
- Bug fixes include regression tests
- README/docs updated if commands/deps/outputs changed
- `pixi.lock` committed if dependencies changed
- No uncommitted changes remain
- Code follows style (`pixi run prek` passes)
10. QUICK REFERENCE: Common Commands
# Full pipeline
pixi run pytask
# Tests
pixi run pytest # All tests
pixi run pytest tests/test_specific.py # One file
pixi run pytest -k test_function_name # One test
# Pre-commit checks
pixi run prek
# Clean build (when DAG is confused)
rm -rf bld/ _build/
pixi run pytask
# View outputs
pixi run view-paper # Paper with live reload
pixi run view-pres # Presentation with live reload
# Documentation
pixi run -e docs docs # Build docs
pixi run -e docs view-docs
11. ANTI-PATTERNS TO AVOID
❌ Mutating DataFrames in place
df["new_col"] = ... # Creates side effects
✅ Constructing new DataFrames
cleaned = pd.DataFrame({"new_col": ...})
❌ Hardcoded absolute paths
data = pd.read_csv("C:/Users/...")
✅ Relative paths from config
from meu_replication.config import BLD
data = pd.read_csv(BLD / "data" / "file.csv")
❌ Direct Python/pip calls
python script.py
pip install pandas
✅ Always through Pixi
pixi run python script.py
pixi add pandas
❌ Writing undeclared outputs
def task_clean():
df.to_csv("output.csv") # pytask doesn't know about this
✅ Declaring all products
def task_clean(produces: Path = ...):
df.to_csv(produces)
12. PROJECT-SPECIFIC NOTES
Data Fetchers (Current State)
- `eurostat.py`: 87 variables (Categories 1-6) — Hardcoded to DE
- `ecb.py`: 52 variables (Cat 4, 7, 8) — Cat 8 is EA-level (shared)
- `bis.py`: 1 variable (Cat 7 NEER) — Already parameterized
- `oecd.py`: 8 variables (Cat 6, 7) — Already parameterized
Next Phase: Multi-Country Expansion
To parameterize:
- Eurostat: Replace `geo="DE"` and `"DE_"` prefixes with a country parameter
- ECB Cat 4/7: Replace `.DE.` in SDMX keys with a `{country}` placeholder
- ECB Cat 8: Fetch once (EA-level), share across all countries
19 EA members to support: DE, FR, IT, ES, NL, BE, AT, FI, GR, PT, IE, SK, SI, LT, LV, EE, LU, CY, MT
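A hedged sketch of the Cat 4/7 parameterization: swap the hardcoded `.DE.` segment of each SDMX key for a `{country}` placeholder and expand over the member list (the key below is illustrative, not a real series key):

```python
# Illustrative template, not an actual ECB series key.
KEY_TEMPLATE = "M.{country}.S.A20.A.1.U2"

EA_COUNTRIES = ["DE", "FR", "IT", "ES", "NL", "BE", "AT", "FI", "GR", "PT",
                "IE", "SK", "SI", "LT", "LV", "EE", "LU", "CY", "MT"]

def build_keys() -> dict[str, str]:
    """Expand the template into one concrete SDMX key per EA member."""
    return {country: KEY_TEMPLATE.format(country=country) for country in EA_COUNTRIES}
```

Cat 8 keys would stay out of this expansion entirely, since the EA-level series is fetched once and shared.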
13. REVIEW RULES ADDED FROM THE 2026-04-16 AUDIT
13.1 Docstrings Must Describe Behavior First
- Start the first sentence with what the function does in domain terms.
- Mention purity, determinism, or EPP rules only after the behavior is clear.
- Avoid openings like "Pure function", "Short and boring", or "Follows EPP rules" as the main explanation.
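A sketch of the preferred shape (function name and transformation are illustrative): the first sentence states the domain behavior, and the purity note comes second.

```python
import numpy as np
import pandas as pd

def clean_neer(raw: pd.Series) -> pd.Series:
    """Convert a raw NEER index to month-over-month log changes.

    Pure function: no side effects; the result depends only on the input series.
    """
    changes = np.log(raw).diff()
    return changes
```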
13.2 Validation Must Fail Loudly or Be Explicitly Documented
- If code checks an argument, dependency, or intermediate object, either raise an informative exception immediately or return a clearly documented sentinel result.
- Do not silently continue after detecting invalid state.
- If writing an empty output is intentional, say so in the docstring and cover that branch with tests.
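A minimal sketch of the "fail loudly" branch, with a hypothetical `validate_panel` helper (column names are illustrative):

```python
import pandas as pd

def validate_panel(df: pd.DataFrame, required: tuple[str, ...] = ("date", "value")) -> pd.DataFrame:
    """Return df unchanged, or raise an informative error listing missing columns."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Panel is missing required columns: {missing}")
    return df
```

The error message names exactly what is wrong, so a failed pytask run points straight at the broken input rather than at a downstream KeyError.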
13.3 Prefer Named Results Over Complex Return Statements
- Avoid multi-step DataFrame construction, filtering, and sorting chains directly inside `return`.
- Assign the final object to a well-named variable, then return that variable.
- Treat this as especially important in cleaning and standardization code.
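A sketch of the rule (function and column names are illustrative): the intermediate steps get names, and the last line returns a named result rather than a chain.

```python
import pandas as pd

def standardize(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values and sort by date."""
    filtered = raw[raw["value"].notna()]
    standardized = filtered.sort_values("date").reset_index(drop=True)
    return standardized  # named result, not a multi-step chain inside return
```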
13.4 Separate Heavy Computation From Lightweight Reporting
- Keep long-running estimation tasks separate from normalization, README, plot, and report-rendering tasks.
- If a task writes both audit tables and a markdown report, keep rendering thin and consider splitting it once the task becomes hard to scan.
13.5 Treat Registry Expansion As Pipeline-Critical
- The registry is not just an auxiliary file; it defines fetch, cleaning, and transformation behavior.
- Either integrate registry expansion into the DAG or keep a strong parity test plus a clearly documented regeneration command.
- Never allow templates, countries, and committed registry to drift silently.
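A hypothetical parity-test sketch for the second bullet (`expand_registry` and the template/registry shapes here are assumptions, not the project's actual API):

```python
def expand_registry(templates: dict[str, str], countries: list[str]) -> dict[str, str]:
    """Expand per-country registry entries from key templates."""
    return {
        f"{country}_{name}": key.format(country=country)
        for country in countries
        for name, key in templates.items()
    }

def test_registry_parity():
    """The committed registry must equal what the templates would regenerate."""
    templates = {"neer": "M.{country}.NEER"}
    committed = {"DE_neer": "M.DE.NEER", "FR_neer": "M.FR.NEER"}
    assert expand_registry(templates, ["DE", "FR"]) == committed
```

In the real test, `committed` would be loaded from the checked-in registry file, so any silent drift between templates, countries, and the committed registry fails CI.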
13.6 Use Evidence-Based Reproducibility Reviews
- For a reproducibility audit, run:
  - `pixi run pytest`
  - `pixi run pytest --cov=src/meu_replication --cov-report=term-missing`
  - `pixi run pytask collect`
- Only claim that the full pipeline was verified if `pixi run pytask` was actually run to completion.
FINAL REMINDER
Reproducibility = Someone else can get your exact results
This requires:
- Exact environment (`pixi.lock`)
- Complete DAG (pytask with all `produces` declared)
- No hidden mutations (functional data cleaning)
- No manual steps (everything scripted)
- No uncommitted changes (main branch is source of truth)
When in doubt, ask: "Could a collaborator reproduce this from a clean checkout?"