name: runtime-communication description: "Use this skill when working inside the research_mvp tmux runtime with fixed agents (leader, researcher, trainer) and you need to read shared thread messages, inspect per-agent inboxes, delegate tasks between agents, or follow the repository's runtime communication contract. Specifically for the file-backed runtime CLI under research_mvp/runtime_cli.py." user-invocable: true triggers:
- runtime communication
- read the shared thread
- delegate to researcher
- send to trainer
- check agent inbox
- runtime_cli
- leader to researcher message
Runtime Communication
Use this skill only for the local research_mvp runtime.
Files To Read First
AGENTS.mdresearch_mvp/runtime.toml
Commands
Use the configured runtime CLI:
python -m research_mvp.runtime_cli --config research_mvp/runtime.toml status
python -m research_mvp.runtime_cli --config research_mvp/runtime.toml thread tail -n 50
python -m research_mvp.runtime_cli --config research_mvp/runtime.toml inbox list leader
Messaging Rules
- Shared thread is for summaries, planning context, progress reporting, and visibility.
- Direct agent work must go through targeted inbox delegation.
- If you need one specific agent to receive a message, use
delegateorthread sendthroughruntime_cli. Do not hand-editthread.jsonl, because that does not create an inbox delivery. - Formal long-running training should be submitted to an external
train_service, not left running inside thetrainertmux pane. - If the human request is to start a task under
recipe/<name>/, first readrecipe/<name>/data.md,recipe/<name>/overview.md, andrecipe/<name>/start_prompt.md. - Treat
recipe/<name>/tasks as Kaggle-style competition tasks by default unless the recipe explicitly says otherwise. - For a new
recipe/<name>/task, do EDA before baseline iteration. Put EDA scripts, notes, and charts under<workdir>/eda/. - This runtime is a machine-learning baseline team. Default iteration should follow the repository's existing baseline naming, such as
baseline_v10,baseline_v11,baseline_v12. - Each version should usually contain multiple experiments, managed by yaml configs under
baseline/experiments_v*/. - Default layout:
- experiment scripts, yaml configs, and helpers in
<workdir>/baseline/ - code in
<workdir>/src/baseline/ - data in
<workdir>/data/ - outputs, checkpoints, metrics, and logs in
<workdir>/output/ - version design and result docs in
<workdir>/docs/ - versioned experiment configs typically under
<workdir>/baseline/experiments_v11/ - versioned formal runners typically under
<workdir>/baseline/run_experiments_v11.sh - versioned outputs typically under
<workdir>/output/baseline_v11/
- experiment scripts, yaml configs, and helpers in
- For each version, keep one formal version runner script in
baseline/, typicallybaseline/run_experiments_v11.sh. That runner should callpython baseline/run_baseline.py --config baseline/experiments_v11/<config>.yamlmultiple times. runtime_rootis only for runtime state, inboxes, and thread data. Do not use it for checkpoints, artifacts, caches, or reports.- Default training behavior should be observable: unless the human explicitly asks for denser logging, training code should emit at least one meaningful progress log per epoch.
- Good coordination uses both:
- direct inbox delegation for execution
- shared thread updates for human visibility and cross-agent context
- Runtime agents should not use control-plane commands like
up,attach, orsupervise. - Avoid relying on
statusfrom inside a runtime agent. Use thread, inbox, artifacts, and direct delegation instead. - Prefer:
python -m research_mvp.runtime_cli --config research_mvp/runtime.toml delegate --from leader --to researcher "..."
instead of only writing:
leader -> all: researcher should do ...
Leader Workflow
For each new task:
- Read latest shared thread and relevant inboxes.
- Break the task into direct assignments.
- If the human starts a
recipe/<name>/task, first assign recipe reading and EDA. Do not jump straight to model iteration. - If training is needed, first ask
researcherfor a versioned package that follows the repository layout: code insrc/baseline/, configs inbaseline/experiments_v*/, a formal runner such asbaseline/run_experiments_v11.sh, outputs underoutput/baseline_v*/, and a design note indocs/using the existingbaseline_v*_exp*.mdpattern. - Expect exactly one formal version runner script per version under
baseline/; it should fan out to multiple experiments via repeatedpython baseline/run_baseline.py --config ...calls. - Review the training package before queue submission. This includes checking that
src/baseline/train.pyand related scripts provide enough intermediate logs to observe training progress, plus a clear startup log that confirms training has actually begun. - Ask
researcherto complete the minimal dry run; it should validate startup and configuration without drifting into real training. - Only after that hand the package to
trainerfor queue submission. - Delegate to each target agent with
delegate. - Write one concise
leader -> allsummary to the shared thread. - After a version is completed and worth preserving, make a git commit covering the relevant
src/baseline/andbaseline/changes. - Every 5 baseline versions, stop and compare the current team path against reference solutions or newly researched alternatives. Explicitly note the main gaps and the most promising ideas to borrow.
- Convert the most credible borrowable ideas from that review into concrete next experiments or follow-up tasks.
- Re-check inbox/thread before deciding the next action.
- Do not stop at a version summary, result note, or commit. After each version closes, immediately delegate the next version or ask one concrete blocking question to the human.
Do not:
- restart the runtime on your own
- treat tmux/socket repair as part of the normal task
- spend cycles checking infrastructure health unless the human explicitly asks for runtime repair
Worker Workflow
- Treat inbox messages as executable tasks.
- Report progress, completion, and blockers back to
leaderby default. - If the recipient is specifically
leader, preferpython -m research_mvp.runtime_cli --config research_mvp/runtime.toml delegate --from <worker> --to leader "..."so the message lands in both the thread andleaderinbox. - Use shared-thread updates for milestone visibility, artifact announcements, and high-value cross-agent context.
- If blocked, say what you need and who should act next.
Additional trainer-specific rule:
trainershould only do job submission and result triage inside the runtime.- Once a formal training job is submitted, treat the task as waiting on an external service.
- When a callback arrives from the training service, inspect the result and report back to
leader. - After submitting a training job, wait at least 1 second before querying status or queue state again.
- Do not draft or publish the final training report until the result callback has actually arrived.
trainershould not run local dry runs by default; dry runs belong toresearcher.- Once
researcherhas confirmed the minimal dry run passed, submission totrain_serviceshould normally follow immediately rather than waiting for another manual nudge. - When reporting the submission, include
train_task_id,script_path,technical_focus,script_args, andworkdir. - Default dry run should use
python baseline/run_baseline.py --config <yaml> --dry-run --fold 0. - Default formal submission should use a real script file under
<workdir>/baseline/, typicallybaseline/run_experiments_v11.sh, or another queue-ready shell wrapper created for that version. - Before saying
train_serviceis unavailable, first verify it with a concrete request such ascurl -sS http://127.0.0.1:8100/queue. - For localhost endpoints like
127.0.0.1:8100and127.0.0.1:8090, bypass shell proxy settings. Prefercurl --noproxy '*' ...orNO_PROXY=127.0.0.1,localhost curl .... - When calling
POST /jobs, use real existing paths forscript_pathandworkdir; do not reuse placeholder values like/abs/path/to/train.sh. - If an HTTP call fails, report the exact URL, command, status code, and response body to
leaderinstead of a vague statement like “127.0.0.1:8100/8090 returned 502”.
Additional researcher-specific rule:
researcherowns the minimal dry run for newly written training code and scripts.- For
recipe/<name>/tasks,researchershould begin by reading the three recipe docs and producing EDA assets undereda/before proposing iterative model work. - That dry run should validate startup, config parsing, data path resolution, and dependency wiring without drifting into real training.
src/baseline/train.pyshould emit a clear startup log before training begins so humans and agents can tell that the job truly started.- After designing each experiment version,
researchershould write the design note todocs/using the existing pattern, such asdocs/baseline_v11_1_exp.md.
Additional trainer-specific documentation rule:
- After results come back from
train_service,trainershould write a result summary todocs/using the matching result-note pattern, such asdocs/baseline_v11_1_exp_result.md. - That result summary should default to the same general style as
docs/baseline_v11_1_exp_result.md: one-line conclusion first, then Markdown tables for the key comparisons. - If multiple experiments, folds, or configs are involved, prefer a compact table, but choose columns according to the current task rather than a fixed project-specific schema.
- Even for a single run, include at least one compact Markdown table for metrics instead of only prose.
- After each completed
baseline_v*version,trainershould also generate a historical PNG trend chart underdocs/that tracks the best top 3 experiments of each baseline version across versions. - Use a filename like
docs/baseline_v08_to_v11_top3_trend.png. - When multiple baseline versions are included, connect versions with line plots so humans can visually track the trend over time.
- Mention the generated PNG path and the high-level takeaway in the corresponding result summary.