Long-Running & Life Augmentation Agents
Deep research into persistent, cloud-hosted, proactive AI agents that autonomously manage tasks, delegate to sub-agents, and augment daily life.
Table of Contents
- Vision & Definition
- What Has Been Tried
- Architectural Patterns
- Proactive vs. Reactive Agents
- Memory & Personalization for Long-Running Agents
- User Interaction Patterns
- What Works & What Fails
- Creative Use Cases for a Life Augmentation Agent
- Privacy, Security & Trust
- Recommended Architecture
- Key Papers & References
- User Sentiment & Real-World Validation
- Audio Lifelogging & Continuous Transcription
1. Vision & Definition
A life augmentation agent is a persistent, cloud-hosted AI system that runs continuously (or is awakened by triggers), maintains a model of its user's life, and proactively delegates tasks to specialized sub-agents to improve the user's daily experience. Unlike a reactive chatbot that waits for prompts, this agent:
- Runs indefinitely in the background (cloud-native, always-on)
- Proactively initiates actions without explicit user commands
- Delegates work to specialized sub-agents (research, scheduling, monitoring, communication)
- Maintains long-term memory of user preferences, history, and context
- Learns and adapts from user feedback over time
- Interacts minimally — surfacing only what matters, when it matters
This is the "user-centric agent" vision articulated by Zhang et al. (2026): "The future of digital services should shift from a platform-centric to a user-centric agent... prioritizing privacy, aligning with user-defined goals, and granting users control over their preferences and actions." (arXiv:2602.15682)
2. What Has Been Tried
2.1 Academic Research (2023–2026)
The idea of proactive, autonomous agents has exploded in academic research, particularly from late 2024 onward:
Foundational Work:
- Generative Agents (Park et al., 2023) — Simulated 25 agents with day-long autonomy in a Smallville sandbox. Demonstrated memory retrieval, reflection, and planning over extended timeframes. Showed that believable long-running behavior is possible but requires careful memory architecture.
- Voyager (Wang et al., 2023) — Lifelong learning agent in Minecraft. Demonstrated skill accumulation over 100+ hours, with a persistent skill library. Key insight: agents can self-improve if they maintain a growing capability repertoire.
Proactive Agent Research (2024–2026):
- Proactive Agent (Lu et al., 2024) — First systematic approach to shifting LLMs from reactive to proactive. Created ProactiveBench (6,790 events). Found that fine-tuned models achieve 66.47% F1 in proactive assistance — meaning even the best models fail to anticipate user needs ~1/3 of the time.
- Ask-before-Plan (Zhang et al., 2024, EMNLP) — Showed that proactive clarification before action dramatically improves planning quality. The Clarification-Execution-Planning (CEP) framework with 3 specialized agents outperforms monolithic approaches.
- ContextAgent (Yang et al., 2025, NeurIPS) — First context-aware proactive agent using wearable sensory data. 8.5% higher accuracy in proactive predictions vs baselines. Key: multi-dimensional context extraction from video/audio on AR glasses.
- ProPerSim (Kim et al., 2025, ICLR 2026) — Proactive AND personalized assistant simulation. ProPerAssistant uses retrieval-augmented preference-aligned learning that continually adapts via user ratings. Key finding: combining proactivity + personalization is more than additive.
- BAO (Yao et al., 2026) — Behavioral Agentic Optimization via RL. Addresses the critical autonomy-satisfaction tradeoff: overly proactive agents annoy users. BAO's behavior regularization suppresses redundant interactions.
- IntentRL (Luo et al., 2026) — Trains proactive agents to clarify latent user intents before long-running research tasks. Addresses the "autonomy-interaction dilemma": high autonomy on ambiguous queries leads to prolonged execution with unsatisfactory outcomes.
- ProAgentBench (Tang et al., 2026) — 28,000+ events from 500+ hours of real user sessions. Key finding: real-world data shows bursty interaction patterns (B=0.787) completely absent in synthetic data. Models trained on synthetic data fail in production.
2.2 Industry & Open-Source Projects
AutoGPT / BabyAGI (2023): The original "run an LLM in a loop forever" experiments. AutoGPT set a high-level goal ("grow a business") and looped GPT-4 with tools. Key learnings:
- Rapid context degradation: After 10-20 steps, the agent would lose track of its original goal, hallucinate sub-tasks, or loop endlessly between the same actions
- No principled stopping criteria: Agents couldn't determine when they were "done" or when to yield to humans
- Cost explosion: Continuous LLM invocation without intelligent routing consumed enormous API budgets
- Compounding errors: Each LLM call had some error rate; over many steps, errors compounded catastrophically
- As Latent.Space summarized: "AutoGPTs have continuous modes which are fully autonomous but very likely to go wrong and therefore have to be closely monitored"
Lindy.ai (2024–present): A production personal AI assistant that manages inbox, meetings, and calendar. Focuses on narrow, well-defined tasks rather than open-ended autonomy. Their approach: tightly scoped agents for specific workflows (email triage, meeting prep, calendar optimization). Reports 400K+ users. Key insight: narrow scope + high reliability > broad ambitions + frequent failures.
Replit Agent (2024–2025): Long-running coding agent for persistent development sessions. LangGraph-backed with checkpointing. Michele Catasta (VP AI): "It's easy to build the prototype of a coding agent, but deceptively hard to improve its reliability." They invested heavily in fine-grained control flow rather than pure LLM autonomy.
Rabbit R1 / Humane AI Pin (2024): Hardware-based "life augmentation" devices. Both launched to enormous hype and largely failed. Key lessons:
- Latency kills: Users expect sub-second responses for frequent interactions. Cloud-hosted LLM inference is too slow for "always-on" use
- Battery life: Continuous sensing + cloud communication drains batteries within hours
- UI/UX mismatch: Ambient AI assistants need fundamentally different interaction patterns than chat interfaces
- Overpromise, underdeliver: Marketing implied general intelligence; reality was narrow, unreliable capabilities
12-Factor Agents (HumanLayer, 2025): Practical engineering guide from Dex Horthy, who talked to 100+ SaaS builders. Key insights for long-running agents:
- Factor 5: Unify execution state and business state — Agent state must be durable and inspectable
- Factor 6: Launch/Pause/Resume — Agents must be pausable and resumable (critical for long-running tasks)
- Factor 7: Contact humans with tool calls — Human interaction is just another tool, not a special case
- Factor 10: Small, focused agents — Monolithic agents fail; specialized agents composed together succeed
- Factor 11: Trigger from anywhere — Agents should respond to cron jobs, webhooks, user messages alike
- Factor 12: Stateless reducer — Each agent invocation should be a pure function of its context (a minimal sketch follows below)
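To make the stateless-reducer idea concrete, here is a minimal sketch. The `Event`/`AgentState` types and the task-tracking reducer are our illustration under the pattern's assumptions, not code from the 12-Factor guide itself:

```python
from dataclasses import dataclass, field, replace

# Hypothetical minimal types -- illustrative, not from the 12-Factor spec.
@dataclass(frozen=True)
class AgentState:
    goal: str
    pending_tasks: tuple = ()
    completed: tuple = ()

@dataclass(frozen=True)
class Event:
    kind: str                 # e.g. "task_created", "task_done"
    payload: dict = field(default_factory=dict)

def step(state: AgentState, event: Event) -> AgentState:
    """Pure reducer: the next state is a function of (state, event) only.
    No hidden state means any invocation can be replayed or resumed."""
    if event.kind == "task_created":
        return replace(state, pending_tasks=state.pending_tasks + (event.payload["task"],))
    if event.kind == "task_done":
        task = event.payload["task"]
        return replace(
            state,
            pending_tasks=tuple(t for t in state.pending_tasks if t != task),
            completed=state.completed + (task,),
        )
    return state  # unknown events leave state unchanged

def rebuild(events: list[Event]) -> AgentState:
    """Crash recovery = re-folding the durable event log into fresh state."""
    state = AgentState(goal="morning briefing")
    for e in events:
        state = step(state, e)
    return state
```

Because the reducer is pure, pause/resume (Factor 6) and horizontal scaling fall out for free: any replica can rebuild state from the log and continue.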
LangGraph Cloud (LangChain, 2024): Purpose-built infrastructure for deploying long-running agents. Provides:
- Horizontally-scaling task queues with Postgres checkpointer
- Background jobs for long-running tasks (polling or webhook completion)
- Cron jobs for scheduled tasks
- Double-texting handling (managing new user input on running threads)
- Time-travel debugging (inspect, edit, resume past states)
3. Architectural Patterns
3.1 Event Sourcing for Agents (ESAA)
The ESAA architecture (dos Santos Filho, 2026, arXiv:2602.23193) separates cognitive intention from state mutation:
Agent emits structured intentions (JSON)
↓
Deterministic orchestrator validates
↓
Persists events in append-only log (activity.jsonl)
↓
Applies effects (file writes, API calls)
↓
Projects verifiable materialized view (roadmap.json)
Key properties:
- Immutability: Completed task events are immutable and hash-verified
- Replay: Full trajectory can be replayed from the event log
- Forensic traceability: Every decision has a provenance chain
- Multi-agent: Demonstrated with 4 concurrent agents, different LLMs, 50 tasks
- Separation of concerns: LLM does thinking, deterministic code does execution
This maps almost directly to Temporal.io-style durable workflows and is enormously relevant for a life augmentation agent that needs reliability over days/weeks.
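A minimal sketch of the hash-chained, append-only log that ESAA describes. The `activity.jsonl` filename comes from the paper; the helper functions and the exact chaining scheme here are our simplification:

```python
import hashlib
import json
from pathlib import Path

LOG = Path("activity.jsonl")  # append-only event log, as in ESAA

def append_event(event: dict) -> None:
    """Append an event whose hash chains to the previous entry,
    making completed-task records tamper-evident."""
    prev_hash = "0" * 64
    if LOG.exists():
        lines = LOG.read_text().splitlines()
        if lines:
            prev_hash = json.loads(lines[-1])["hash"]
    body = {"event": event, "prev": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    with LOG.open("a") as f:
        f.write(json.dumps(body) + "\n")

def verify_log() -> bool:
    """Replay the chain; any mutation of a past event breaks verification."""
    prev_hash = "0" * 64
    for line in LOG.read_text().splitlines():
        entry = json.loads(line)
        expected = dict(entry)
        stored_hash = expected.pop("hash")
        if expected["prev"] != prev_hash:
            return False
        recomputed = hashlib.sha256(
            json.dumps(expected, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != stored_hash:
            return False
        prev_hash = stored_hash
    return True
```

Replays and forensic traces then reduce to reading the log; materialized views like `roadmap.json` are just projections over it.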
3.2 Orchestrator-Workers with Task Queues
From the Anthropic "Building Effective Agents" guide (2024):
- Central orchestrator LLM breaks down tasks and delegates to worker LLMs
- Workers are specialized for specific domains (research, scheduling, communication)
- Results synthesized by orchestrator
- Key: this pattern is "well-suited for complex tasks where you can't predict the subtasks needed"
For a life agent, the orchestrator maintains a persistent task queue:
User's life context → Orchestrator → Priority queue → Worker agents → Results → User
  ↑                                                                              │
  └──────────────────────────────── feedback loop ───────────────────────────────┘
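A sketch of that delegation loop under stated assumptions: the planning LLM is stubbed out, and the `Task` fields and worker names are illustrative:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                       # lower number = more urgent
    description: str = field(compare=False)
    worker: str = field(compare=False)  # e.g. "email", "research"

queue: list[Task] = []

def plan(event: str) -> list[tuple[int, str, str]]:
    """Stand-in for the planning LLM that decomposes an event into subtasks."""
    return [
        (1, f"triage: {event}", "email"),
        (5, f"background research on: {event}", "research"),
    ]

def orchestrate(event: str) -> None:
    """Orchestrator routes incoming life-context events into the priority queue."""
    for prio, desc, worker in plan(event):
        heapq.heappush(queue, Task(prio, desc, worker))

def run_once(workers: dict) -> None:
    """Pop the most urgent task and delegate it to its specialized worker."""
    if queue:
        task = heapq.heappop(queue)
        result = workers[task.worker](task.description)
        print(f"[{task.worker}] {result}")  # feeds back to orchestrator/user

orchestrate("email from Bob about Q2 planning")
run_once({"email": lambda t: f"done: {t}", "research": lambda t: f"queued: {t}"})
```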
3.3 Declarative Workflow Orchestration
Daunis (2025, arXiv:2512.19769) showed that most agent workflows can be expressed through a unified DSL rather than imperative code. At PayPal scale: 60% reduction in development time, 3x deployment velocity. Complex workflows expressed in <50 lines of DSL vs 500+ lines of imperative code.
For life augmentation: define recurring agent workflows declaratively (morning briefing, email triage, expense tracking) and let the runtime handle scheduling, retry, and monitoring.
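To show the shape of such a declarative definition, here is a hypothetical dict-based spec; this is our illustration of the idea, not the actual DSL syntax from the Daunis paper:

```python
# Hypothetical declarative workflow spec -- the runtime, not this spec,
# owns scheduling, retries, and monitoring.
MORNING_BRIEFING = {
    "trigger": {"cron": "0 7 * * MON-FRI"},
    "steps": [
        {"agent": "CalendarAgent", "action": "todays_events"},
        {"agent": "EmailTriageAgent", "action": "urgent_since", "args": {"hours": 12}},
        {"agent": "WeatherAgent", "action": "commute_forecast"},
        {"agent": "Summarizer", "action": "compose_briefing",
         "inputs": ["todays_events", "urgent_since", "commute_forecast"]},
    ],
    "on_failure": {"retry": 2, "backoff_seconds": 60},
    "deliver": {"channel": "push_notification"},
}
```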
3.4 Internet of Agentic AI
Yang & Zhu (2026, arXiv:2602.03145) propose a framework where autonomous, heterogeneous agents across cloud and edge dynamically form coalitions for task-driven workflows. Key concepts:
- Minimum-effort coalition selection: Find the smallest group of agents to accomplish a task
- Network locality: Prefer agents close to the data/user
- Economic implementability: Incentive-compatible coordination
- MCP as coordination layer: Uses the Model Context Protocol as the coordination layer across agents
3.5 Peak-Aware Long-Horizon Orchestration
APEMO (Shi & DiFranzo, 2026, arXiv:2602.17910) introduces temporal-affective scheduling:
- Detects trajectory instability through behavioral proxies
- Targets computational resources at peak moments (critical decisions) and endings (final user-facing output)
- Under fixed compute budgets, APEMO consistently enhances trajectory-level quality
- Reframes alignment as a temporal control problem — critical for agents running over hours/days
4. Proactive vs. Reactive Agents
This is perhaps the most critical dimension for a life augmentation agent. A reactive agent waits for "Hey, do X." A proactive agent notices you need X before you ask.
4.1 The Proactivity Spectrum
From the research, three levels emerge:
| Level | Description | Example |
|---|---|---|
| Reactive | Responds only to explicit requests | "Schedule a meeting with Bob" |
| Suggestion-based | Notices patterns and suggests actions | "You usually meet Bob on Tuesdays — should I schedule?" |
| Fully proactive | Acts autonomously based on context | Notices Bob emailed about Q2 planning, cross-references your calendar, schedules meeting, drafts agenda |
4.2 The Autonomy-Satisfaction Tradeoff
BAO (2026) identifies a critical Pareto frontier: more proactive ≠ better. Overly proactive agents that constantly interrupt with suggestions are worse than reactive ones. The sweet spot requires:
- Timing prediction: When to intervene (ProAgentBench found bursty patterns matter)
- Relevance filtering: Not every observed intent deserves action
- Cost-of-interruption modeling: ProMemAssist (2025, UIST) models the user's working memory to balance assistance value vs. interruption cost (see the sketch after this list)
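A toy version of that cost-of-interruption decision; the linear cost model and weights are our illustration, not ProMemAssist's actual formulation:

```python
def should_intervene(assistance_value: float,
                     working_memory_load: float,
                     base_interruption_cost: float = 0.3) -> bool:
    """Intervene only when expected assistance value exceeds the
    interruption cost, which grows with the user's cognitive load.
    All scores in [0, 1]; the linear model is an illustrative assumption."""
    interruption_cost = base_interruption_cost + 0.7 * working_memory_load
    return assistance_value > interruption_cost

# In flow (load ~0.9): only near-critical assistance gets through.
assert not should_intervene(0.6, working_memory_load=0.9)
# Idle (load ~0.1): moderately useful suggestions are acceptable.
assert should_intervene(0.6, working_memory_load=0.1)
```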
4.3 Knowledge Gap Navigation
PROPER (Kaur et al., 2026) introduces a two-agent architecture:
- Dimension Generating Agent (DGA): Identifies implicit dimensions the user hasn't considered
- Response Generating Agent (RGA): Balances explicit needs with discovered implicit needs
The insight: proactive agents should address needs the user doesn't know they have, not just automate what they've already expressed. This achieved 84% quality gains in single-turn evaluations.
4.4 The Advisor/Coach/Delegate Paradigm
Zhu et al. (2026, arXiv:2602.12089) ran a 243-person behavioral experiment comparing:
- Advisor: Proactive recommendations
- Coach: Reactive feedback on user's actions
- Delegate: Autonomous execution
Key finding: users preferred the Advisor but achieved the highest gains with the Delegate. This "preference-performance misalignment" is critical: users want to feel in control but benefit from delegation. Moreover, delegation created positive externalities — even non-adopters in delegate groups received higher-quality offers.
Implication for life agents: default to delegation for high-confidence tasks, with transparent reporting. Allow users to override, but don't make them drive every decision.
5. Memory & Personalization for Long-Running Agents
5.1 O-Mem: Active User Profiling
O-Mem (Wang et al., 2025, arXiv:2511.13593) is a memory framework based on active user profiling:
- Dynamically extracts and updates user characteristics from interactions
- Hierarchical retrieval of persona attributes and topic-related context
- Achieves SOTA on LoCoMo (51.67%) and PERSONAMEM (62.99%) benchmarks
- Key: avoids the pitfall of semantically grouping memories before retrieval, which drops critical but semantically distant user information
5.2 AMemGym: Interactive Memory Benchmarking
AMemGym (Cheng et al., 2026, ICLR) reveals that existing memory systems (RAG, long-context LLMs, agentic memory) all have significant gaps in long-horizon conversations. Key insight: on-policy evaluation (interactive) reveals failures invisible in off-policy (static) evaluation.
5.3 Working Memory Modeling for Proactive Timing
ProMemAssist (Pu et al., 2025, UIST) models the user's cognitive working memory in real-time:
- Represents perceived information as memory items and episodes
- Uses encoding mechanisms (displacement, interference) from cognitive psychology
- Timing predictor balances assistance value vs. interruption cost
- In user studies: more selective assistance, higher engagement vs. LLM-only baseline
5.4 The Personalized Agent Survey
The comprehensive survey by Xu et al. (2026, arXiv:2602.22680) identifies four interdependent personalization components:
- Profile modeling: Explicit (stated preferences) + implicit (behavioral signals)
- Memory: Short-term working memory + long-term episodic/semantic memory
- Planning: User-adapted goal decomposition and strategy selection
- Action execution: Personalized tool selection and output formatting
Key trade-off: deeper personalization requires more user data, creating tension with privacy.
6. User Interaction Patterns
6.1 Notification Fatigue
The biggest failure mode for proactive agents. From the research:
- ProAgentBench shows real user sessions have bursty patterns (B=0.787) — users interact in concentrated bursts, then go silent
- During silent periods, proactive interruptions are maximally disruptive
- BAO's behavior regularization specifically penalizes "redundant interactions"
- Solution: preference-aligned timing — learn when the user is receptive
6.2 Trust Calibration
From "Choose Your Agent" (2026):
- Users systematically under-trust delegation despite its superior outcomes
- Trust builds through transparency (showing reasoning) and track record (successful completions)
- "Adoption-compatible interaction rules are a prerequisite to improving human welfare"
- Start with low-stakes delegation (weather reminders, meeting summaries), build to high-stakes (financial decisions, communication on behalf)
6.3 The Right Interface
From Egocentric Co-Pilot (2026, WWW):
- Smart glasses + multimodal input (speech + gaze) enables "always-on assistive layer over daily life"
- But introduces latency requirements: cloud offloading adds ~200ms+ vs. on-device inference
- Hierarchical Context Compression supports long-horizon QA over continuous first-person video
From "The Next Paradigm" (2026):
- Desktop/mobile app with proactive notifications + on-demand deep interaction
- Device-cloud pipeline: lightweight on-device model for urgency triage, cloud model for complex reasoning
- User should control preference dials (proactivity level, domains to manage, notification frequency)
6.4 Human-in-the-Loop as a Tool
From 12-Factor Agents: "Contact humans with tool calls" — the agent should model human interaction exactly like any other tool:
    contact_human(
        channel="push_notification",
        message="I found a cheaper flight for your trip to NYC next week. Should I rebook?",
        urgency="medium",
        timeout="4h",                     # how long to wait for a reply
        fallback="keep_current_booking",  # action taken if the timeout expires
    )
If the human doesn't respond within the timeout, the agent takes the fallback action. This makes human-in-the-loop a first-class, timeout-aware capability rather than a blocking dependency.
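One way to implement those timeout-and-fallback semantics with `asyncio`; the notification and reply adapters below are hypothetical stubs standing in for real push/webhook plumbing:

```python
import asyncio

async def send_notification(channel: str, message: str) -> None:
    print(f"[{channel}] {message}")  # stand-in for a real push adapter

async def wait_for_reply() -> str:
    await asyncio.sleep(3600)        # stand-in: replies arrive via webhook
    return "rebook"

async def contact_human(channel: str, message: str,
                        timeout_s: float, fallback: str) -> str:
    """Human-as-a-tool: send, wait up to timeout_s, then take the fallback
    so a silent user never blocks a time-sensitive task."""
    await send_notification(channel, message)
    try:
        return await asyncio.wait_for(wait_for_reply(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback

# asyncio.run(contact_human("push_notification", "Rebook cheaper flight?",
#                           timeout_s=4 * 3600, fallback="keep_current_booking"))
```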
7. What Works & What Fails
7.1 What Works
| Category | Examples | Why It Works |
|---|---|---|
| Narrow, well-defined tasks | Email triage, meeting prep, expense categorization | Clear success criteria, bounded scope, verifiable output |
| Information synthesis | Morning briefings, research summaries, news digests | LLMs excel at summarization; failures are low-cost |
| Schedule optimization | Calendar management, reminder timing, travel planning | Structured data, clear constraints, measurable outcomes |
| Monitoring & alerting | Price tracking, deadline monitoring, health metric trends | Event-driven triggers, simple decision logic |
| Routine automation | Form filling, template generation, data entry | Repetitive tasks with clear patterns |
7.2 What Fails
| Category | Why It Fails | Reference |
|---|---|---|
| Open-ended goal pursuit | Compounding errors over long horizons; context degradation | AutoGPT postmortems |
| High-stakes autonomy | Users don't trust AI for financial/legal/medical decisions | Choose Your Agent (2026) |
| Social communication | Nuance, tone, relationship context are hard to model | EchoGuard (2026) |
| Real-time continuous sensing | Battery, latency, cost constraints for always-on processing | Rabbit R1, Humane AI Pin failures |
| Unbounded task decomposition | LLMs hallucinate sub-tasks, struggle with novel problem structures | ProAgentBench real-world data |
| Cross-modal understanding | Combining calendar + email + location + health data coherently | ContextAgent (2025) improvements still limited |
7.3 Critical Failure Patterns
- The Infinite Loop: Agent creates task → fails → creates retry task → fails → ... Without proper stopping conditions, agents burn through budgets. Solution: circuit breakers, max iteration limits, exponential backoff (see the sketch after this list).
- Context Window Exhaustion: Long-running agents accumulate history. Eventually the context window fills and the agent "forgets" its goals. Solution: hierarchical memory with compression (Active Context Compression, 2026), event sourcing with materialized views (ESAA).
- Preference Drift: User preferences change over time. An agent trained on January behavior may be wrong by March. Solution: continuous adaptation with recency weighting (O-Mem, ProPerSim).
- The 80% Problem: From 12-Factor Agents — most agent frameworks get you to 80% quality, but the last 20% requires abandoning the framework and building custom. Solution: own your prompts, own your control flow, build on primitives not frameworks.
- Notification Fatigue Spiral: Proactive agent sends too many suggestions → user ignores them → agent interprets silence as neutral → continues sending → user disables agent entirely. Solution: explicit feedback channels, decreasing proactivity in response to non-engagement.
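A minimal circuit breaker guarding against the infinite-loop pattern above; the thresholds and the `run_agent_step` stub are illustrative defaults, not values from the cited sources:

```python
import time

def run_agent_step():
    """Stand-in for one LLM/tool invocation; returns (ok, cost_usd)."""
    return True, 0.01

class CircuitBreaker:
    """Caps iterations, caps spend, and backs off exponentially on errors."""

    def __init__(self, max_iterations=25, max_cost_usd=2.00, base_delay_s=1.0):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.base_delay_s = base_delay_s
        self.iterations = 0
        self.cost_usd = 0.0
        self.consecutive_errors = 0

    def allow(self) -> bool:
        return (self.iterations < self.max_iterations
                and self.cost_usd < self.max_cost_usd)

    def record(self, step_cost_usd: float, ok: bool) -> None:
        self.iterations += 1
        self.cost_usd += step_cost_usd
        if ok:
            self.consecutive_errors = 0
        else:
            self.consecutive_errors += 1
            # Exponential backoff before the next attempt.
            time.sleep(self.base_delay_s * 2 ** (self.consecutive_errors - 1))

breaker = CircuitBreaker()
while breaker.allow():
    ok, cost = run_agent_step()
    breaker.record(cost, ok)
```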
8. Creative Use Cases for a Life Augmentation Agent
8.1 The "Morning Protocol" Agent
Every morning, the agent:
- Scans today's calendar, identifies preparation needs
- Checks email for anything requiring pre-meeting context
- Reviews weather + commute for schedule impacts
- Surfaces any overnight deadline changes or urgent messages
- Generates a personalized briefing pushed to phone/watch
Delegate agents: CalendarAgent, EmailTriageAgent, WeatherAgent, CommutePlanner
8.2 The "Background Research" Agent
User mentions interest in a topic (via conversation, browsing, or explicit request):
- Agent queues a deep research task
- ResearchAgent searches academic papers, blogs, news
- Runs over hours/days, building a knowledge graph
- Synthesizes findings into a briefing document
- Proactively surfaces when user next has free time
This is exactly the IntentRL (2026) pattern — clarify intent first, then execute autonomously.
8.3 The "Financial Health" Agent
Continuous monitoring:
- Tracks spending against budget categories
- Monitors bills for price increases or better alternatives
- Watches subscriptions for unused services
- Alerts on unusual charges
- Monthly financial health summary
Requires: bank API integration, category classification, anomaly detection
8.4 The "Social Maintenance" Agent
The hardest and most nuanced use case:
- Tracks last contact date with important people
- Suggests reaching out when it's been too long
- Notices birthdays, life events from social media
- Drafts (but never sends without approval) personal messages
- Manages RSVP tracking
Key constraint: never send on behalf without explicit approval. This is the "advisor" pattern from Choose Your Agent — suggest, don't act.
8.5 The "Health & Wellness" Agent
- Monitors sleep patterns (via wearable data)
- Tracks medication schedules
- Suggests exercise based on calendar gaps
- Monitors mood patterns from interaction data (very carefully, with explicit consent)
- Connects health data to productivity patterns
ProMemAssist (2025) provides the cognitive model: intervene when the user's working memory is overloaded, not when they're in flow state.
8.6 The "Travel Optimization" Agent
For frequent travelers:
- Monitors prices for upcoming trips
- Manages loyalty programs and point optimization
- Handles rebooking during disruptions
- Creates location-aware travel guides
- Manages expense reports post-trip
8.7 The "Knowledge Worker Copilot" Agent
Always-on background support:
- Monitors project deadlines across tools (Jira, GitHub, email)
- Identifies tasks that are blocked or at risk
- Pre-fetches context before meetings (related documents, recent changes)
- Suggests task prioritization based on deadlines, impact, and dependencies
- Creates end-of-week summaries
8.8 The "Home Automation Intelligence Layer"
Layer on top of smart home:
- Learns household patterns (wake times, preferences, routines)
- Proactively adjusts based on calendar (early meeting → earlier alarm)
- Coordinates grocery lists with meal planning
- Manages maintenance reminders (HVAC filters, car service)
- Integrates weather forecasts with heating/cooling optimization
9. Privacy, Security & Trust
9.1 The Privacy Paradox
AgentScope (2026, arXiv:2603.04902) evaluates contextual privacy across agentic workflows accessing calendars, email, and personal files. Life augmentation agents need deep access to personal data, but:
- Users want personalization (requires data) but fear surveillance
- Cross-domain data aggregation (calendar + health + finance) creates sensitive profiles
- Current LLM providers process data on external servers
9.2 User-Centric Architecture
From "The Next Paradigm" (2026):
- On-device processing for sensitive data (triage, classification)
- Cloud processing only for complex reasoning, with minimal data transmission
- User-controlled data policies: what data the agent can access, retain, and share
- Audit logs: every data access and decision is traceable (cf. ESAA event sourcing)
9.3 Trust Hierarchy
For a life agent, different actions require different trust levels:
| Trust Level | Actions | Control |
|---|---|---|
| Full autonomy | Reading calendar, checking weather, monitoring prices | No approval needed |
| Notify & act | Sending standard replies, rescheduling non-critical meetings | Notification + undo window |
| Ask & act | Sending personal messages, making purchases, changing plans | Explicit approval required |
| Never autonomous | Financial transactions above threshold, medical decisions, legal actions | Human must initiate |
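One way to encode the hierarchy above is as a policy table the orchestrator consults before every action; the action names and the ask-first default are our illustration:

```python
from enum import Enum

class Trust(Enum):
    FULL_AUTONOMY = 1     # read calendar, check weather, monitor prices
    NOTIFY_AND_ACT = 2    # act, notify, keep an undo window open
    ASK_AND_ACT = 3       # explicit approval required before acting
    NEVER_AUTONOMOUS = 4  # human must initiate

# Illustrative policy table mirroring the trust hierarchy.
ACTION_TRUST = {
    "read_calendar": Trust.FULL_AUTONOMY,
    "reschedule_noncritical_meeting": Trust.NOTIFY_AND_ACT,
    "send_personal_message": Trust.ASK_AND_ACT,
    "large_financial_transaction": Trust.NEVER_AUTONOMOUS,
}

def gate(action: str) -> str:
    """Map an intended action to the interaction pattern it requires.
    Unknown actions default to asking, the conservative choice."""
    level = ACTION_TRUST.get(action, Trust.ASK_AND_ACT)
    if level is Trust.FULL_AUTONOMY:
        return "execute"
    if level is Trust.NOTIFY_AND_ACT:
        return "execute_then_notify_with_undo"
    if level is Trust.ASK_AND_ACT:
        return "request_approval"
    return "refuse_unless_user_initiated"
```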
10. Recommended Architecture
Based on all the research, here's a synthesized architecture for a life augmentation agent:
10.1 Core Components
┌──────────────────────────────────────────────────────────┐
│                     Life Agent Core                      │
│                                                          │
│  ┌───────────┐    ┌────────────┐    ┌───────────────┐    │
│  │ Event Bus │───→│Orchestrator│───→│  Task Queue   │    │
│  │ (triggers)│    │ (planner)  │    │(priority heap)│    │
│  └───────────┘    └────────────┘    └───────────────┘    │
│        ↑                │                   │            │
│        │                ↓                   ↓            │
│  ┌───────────┐    ┌────────────┐    ┌───────────────┐    │
│  │   User    │    │   Memory   │    │  Worker Pool  │    │
│  │ Interface │←───│  (O-Mem +  │    │  (sub-agents) │    │
│  │ (multi-ch)│    │  profiles) │    │               │    │
│  └───────────┘    └────────────┘    └───────────────┘    │
│        │                │                                │
│        ↓                ↓                                │
│     ┌────────────────────────────┐                      │
│     │ Event Store (append-only)  │                      │
│     │     + State Snapshots      │                      │
│     └────────────────────────────┘                      │
└──────────────────────────────────────────────────────────┘
10.2 Key Design Decisions
- Event-sourced state (ESAA pattern): All agent actions recorded in append-only log. State reconstructable from events. Enables debugging, audit, and replay.
- Stateless orchestrator (12-Factor #12): Each orchestrator invocation is a pure function of current state + new event. No hidden state. Enables horizontal scaling and crash recovery.
- Priority-based task queue: Tasks have urgency, importance, and deadline. Orchestrator continuously re-prioritizes. Inspired by APEMO peak-aware scheduling.
- Multi-channel input/output: Triggers from cron (morning briefing), webhooks (email arrival), user messages (chat), sensors (location change). Output via push notifications, email, chat, or ambient display.
- Specialized worker agents: CalendarAgent, ResearchAgent, EmailAgent, FinanceAgent, HealthAgent. Each has narrow scope, specific tools, and measurable outcomes.
- Adaptive proactivity: Use ProPerSim-style user modeling to learn intervention timing. Start conservative (suggestions only), build trust, increase autonomy over time.
- O-Mem-style memory: Active user profiling with hierarchical retrieval. Short-term working memory (today's context), medium-term episodic memory (recent interactions), long-term semantic memory (stable preferences). A layered-retrieval sketch follows this list.
- Human-as-a-tool: Human interaction modeled as a tool call with timeout and fallback (12-Factor #7). Never block on human response for time-sensitive tasks.
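A toy version of the three-tier memory layout from the list above; keyword matching stands in for embedding retrieval, and the structure is our simplification, not O-Mem's actual design:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Three memory tiers in the spirit of O-Mem (our simplification)."""
    working: list = field(default_factory=list)   # today's context
    episodic: list = field(default_factory=list)  # recent interactions
    semantic: dict = field(default_factory=dict)  # stable preferences

    def recall(self, query: str, k: int = 3) -> list:
        """Search all tiers; naive substring match stands in for
        vector similarity over embeddings."""
        q = query.lower()
        hits = [m for m in self.working + self.episodic if q in m.lower()]
        prefs = [f"{key}: {val}" for key, val in self.semantic.items()
                 if q in key.lower()]
        return (hits + prefs)[:k]

mem = MemoryStore()
mem.semantic["coffee preference"] = "oat-milk flat white"
mem.episodic.append("2026-03-02: user said Tuesday standups drain them")
print(mem.recall("coffee"))  # ['coffee preference: oat-milk flat white']
```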
10.3 Technology Stack Recommendations
- Orchestration: Durable workflow engine (Temporal.io, Azure Durable Functions, or custom event-sourced loop)
- Agent Framework: Microsoft Agent Framework (for .NET) or LangGraph (for Python) — both support checkpointing and persistence
- Memory Store: Vector DB (Qdrant/Pinecone) for semantic retrieval + PostgreSQL for structured user profiles
- Event Store: Append-only log (PostgreSQL with immutable rows, or EventStoreDB)
- Task Queue: Redis-backed priority queue or cloud-native queue (Azure Service Bus, SQS)
- Notifications: Multi-channel push (FCM, APNs, email, Slack/Teams webhook)
- LLM: Tiered — fast/cheap model for triage/classification, powerful model for reasoning/planning (see the routing sketch after this list)
- Monitoring: OpenTelemetry for distributed tracing across agent invocations
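A sketch of the tiered-LLM routing mentioned above; the model names and the triage contract are invented for illustration, and `call_llm` stands in for a real provider SDK:

```python
# Tiered routing: a cheap model triages every event; only events it
# flags as complex escalate to the expensive reasoning model.
CHEAP_MODEL = "small-fast-model"
STRONG_MODEL = "large-reasoning-model"

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for the provider SDK call."""
    return "simple" if model == CHEAP_MODEL and "Label" in prompt else "handled"

def route(event_text: str) -> str:
    """Cheap triage first; escalate to the strong model only when needed."""
    label = call_llm(CHEAP_MODEL,
                     f"Label this event as 'simple' or 'complex':\n{event_text}")
    model = STRONG_MODEL if "complex" in label.lower() else CHEAP_MODEL
    return call_llm(model, f"Handle this event:\n{event_text}")

print(route("new email: weekly newsletter"))  # handled by the cheap tier
```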
10.4 Critical Engineering Principles
- Circuit breakers: Max iterations per task, budget limits per day, error thresholds
- Graceful degradation: If a worker agent fails, others continue operating
- Idempotent operations: Every action can be safely retried
- Transparent reasoning: Every decision should have a human-readable explanation
- Progressive trust: Start with low-autonomy defaults, increase based on track record
- Cost accounting: Track per-task LLM costs, optimize routing to smaller models where possible
- Offline-first: Agent should handle network interruptions gracefully with local queuing
11. Key Papers & References
11.1 Papers Downloaded & Converted (Batch 6)
| Paper | arXiv | Key Contribution |
|---|---|---|
| Proactive Agent | 2410.12361 | First systematic reactive→proactive shift; ProactiveBench |
| Ask-before-Plan | 2406.12639 | Proactive clarification + CEP framework (EMNLP 2024) |
| ContextAgent | 2505.14668 | Wearable sensory context for proactive agents (NeurIPS 2025) |
| ProAgent Sensory | 2512.06721 | End-to-end proactive system on AR glasses |
| ProPerSim | 2509.21730 | Proactive + personalized assistants (ICLR 2026) |
| BAO | 2602.11351 | Behavioral agentic optimization; autonomy-satisfaction Pareto |
| IntentRL | 2602.03468 | RL for proactive intent clarification in deep research |
| ProAgentBench | 2602.04482 | Real-world proactive assistance benchmark (28K+ events) |
| PROPER | 2601.09926 | Knowledge gap navigation for proactivity |
| O-Mem | 2511.13593 | Omni memory for personalized long-horizon agents |
| Personalized LLM Agents Survey | 2602.22680 | Comprehensive survey: profile, memory, planning, action |
| AMemGym | 2603.01966 | Interactive memory benchmarking for long-horizon (ICLR 2026) |
| ProMemAssist | 2507.21378 | Working memory modeling for proactive timing (UIST'25) |
| Egocentric Co-Pilot | 2603.01104 | Always-on smart glasses agent (WWW 2026) |
| Next Paradigm: User-Centric | 2602.15682 | User-centric agent vs platform-centric service |
| Choose Your Agent | 2602.12089 | Advisor vs Coach vs Delegate tradeoffs (N=243) |
| ESAA | 2602.23193 | Event sourcing architecture for autonomous agents |
| Alignment in Time | 2602.17910 | Peak-aware orchestration for long-horizon systems |
| Internet of Agentic AI | 2602.03145 | Distributed agent coalition formation |
| Declarative Agent Workflows | 2512.19769 | DSL for agent workflow orchestration (PayPal scale) |
11.2 Blog Posts & Practical Resources
| Resource | Key Takeaway |
|---|---|
| Anthropic: Building Effective Agents | Start simple, use orchestrator-workers for complex tasks, agents for open-ended problems |
| 12-Factor Agents | Stateless reducer pattern, pause/resume, human-as-tool, small focused agents |
| LangGraph Cloud | Task queues, cron, background jobs, time-travel debugging for long-running agents |
| Latent.Space: Anatomy of Autonomy | AutoGPT/BabyAGI dissection, 5 capability levels, self-driving car analogy |
| Lindy.ai | Production personal assistant — narrow scope + high reliability wins |
11.3 Papers Already in Collection (Relevant)
- generative-agents-stanford-2023 — Long-running agent simulation patterns
- voyager-lifelong-learning-2023 — Lifelong skill accumulation
- memgpt-llm-operating-system-2023 — OS-like memory management for agents
- caveagent-stateful-runtime-2026 — Stateful agent runtime with persistent containers
- ariadnemem-lifelong-memory-2026 — Lifelong episodic-semantic memory
- anatomy-agentic-memory-2026 — Comprehensive memory architecture taxonomy
- active-context-compression-2026 — Autonomous memory management
- agentic-ai-frameworks-protocols-2025 — Architecture & protocol landscape
12. User Sentiment & Real-World Validation
Added March 2026. Synthesized from 11 papers with real user studies plus web sources (Reddit, product reviews). Full analysis: blueprints/life-agent/docs/user-sentiment-research.md.
12.1 The Preference-Performance Paradox
The single most important finding across all research: users systematically prefer the interaction mode that gives them worse outcomes.
Choose Your Agent (2026, N=243, 6,561 trading decisions):
- 44% preferred Advisor ("show me options, I'll decide")
- 19.3% preferred Delegate ("just do it for me")
- 21.4% preferred no AI at all
- Yet Delegate produced statistically better economic outcomes (β=0.084, p=.034)
- Users who preferred AI reported 20% higher mental effort than autonomy-seekers (p<.01) — they willingly accepted cognitive load for perceived control
This means life agent designers face a fundamental tension: the optimal interaction pattern (delegation) is the one users resist most. The solution is progressive trust — start with advisory, earn the right to delegate through demonstrated reliability.
12.2 The "Less Is More" Principle (Empirically Proven)
ProMemAssist (UIST 2025, N=12): delivered 60% fewer messages while achieving 2.6× higher positive engagement and lower frustration (NASA-TLX: 2.32 vs 3.14, p<.05). The mechanism: modeling the user's working memory load in real-time and deferring messages when cognitive capacity is low.
ProPerSim (ICLR 2026, 32 personas): the system started at 24 recommendations/hour and naturally converged to ~6/hour through preference learning. This emergent "learning to be quiet" behavior improved satisfaction from 2.2/4 to 3.3/4 over 14 days.
BAO (CMU/Salesforce/MIT): without behavior regularization, agent User Involvement Rate shoots from 0.21 to 0.91 — agents pester users 91% of the time. With regularization (information-seeking + over-thinking penalties), UR drops to 0.2148.
Design principle: Max 6 proactive notifications per hour. Decrease on non-engagement. The cost of a false positive (unnecessary interruption) exceeds the cost of a false negative (missed opportunity to assist).
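A sketch of a governor enforcing that principle; the hourly cap matches the design principle above, while the engagement-based decay rule is our illustration:

```python
import time
from collections import deque

class ProactivityGovernor:
    """Caps proactive messages per hour and shrinks the cap when the
    user stops engaging, mirroring the emergent 'learning to be quiet'
    behavior ProPerSim observed (decay rule is our simplification)."""

    def __init__(self, max_per_hour: int = 6):
        self.cap = max_per_hour
        self.sent = deque()  # timestamps of recent sends

    def may_send(self) -> bool:
        now = time.time()
        while self.sent and now - self.sent[0] > 3600:
            self.sent.popleft()          # drop sends older than an hour
        return len(self.sent) < max(1, self.cap)

    def record_send(self) -> None:
        self.sent.append(time.time())

    def record_feedback(self, engaged: bool) -> None:
        if engaged:
            self.cap = min(6, self.cap + 1)  # slowly earn back budget
        else:
            self.cap = max(1, self.cap - 1)  # ignored -> get quieter
```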
12.3 Trust Fragility
Trust is asymmetric — one failure outweighs many successes:
- In the ProMemAssist study, participant P5 indicated that a single unhelpful suggestion could break trust in the entire system
- Trust was influenced more by quality of assistance than timing of delivery
- Algorithm aversion — tendency to abandon AI after observing even small errors — significantly reduces adoption
- "Poorly calibrated proactivity can undermine user trust, agency, and interaction quality" (PROPER, 2026)
Design principle: Optimize all proactive features for precision over recall. Don't ship a feature that fails >20% of the time.
12.4 Real Product Outcomes
| Product | Result | Lesson |
|---|---|---|
| Lindy.ai | 400K+ users | Narrow scope + high reliability beats broad ambitions |
| Rabbit R1 | Failed basic tasks (wrong location for weather, CAPTCHA loops for restaurant booking) | Hardware + breadth of capability doesn't compensate for unreliable execution |
| AutoGPT/BabyAGI | Rapid context degradation after 10-20 steps | Unbounded autonomy without grounding fails catastrophically |
| Replit Agent | "Easy to build the prototype, deceptively hard to improve reliability" | The demo-to-production gap is enormous |
| Smart glasses (various) | RayNeo X2 Pro: ~20 min battery in continuous mode | Always-on hardware is bottlenecked by power, not by AI capability |
12.5 What Real Users Want (Direct Evidence)
Reddit (r/artificial, 2025) — a neurodivergent parent posted:
"I struggle with staying on top of daily tasks. I'd love to find an AI assistant that can: Talk to me, not just respond when I ask. Give me reminders and nudges, even when I'm distracted. Help manage tasks, routines, and my health needs (I'm autistic/ADHD). Adapt to my life as a parent."
No commenter had a real solution. Other Reddit threads (r/singularity) reported teams being cut from 50→30 people using AI agents, editorial staff of 17 laid off, software engineers shifting to "supervising AI" — but all in workplace automation, not personal life augmentation.
The gap: workplace AI agent adoption is accelerating; personal life agent adoption has no viable product yet.
12.6 Surprising Empirical Findings
- Delegation benefits non-users: non-users in Delegate groups had 21.6% higher surplus than baseline groups — AI improved outcomes for people who didn't even use it (market spillover)
- Introverts benefit most from personalized AI (larger improvement than extraverts in ProPerSim)
- Chain-of-thought hurts proactive timing: CoT prompting degrades performance on proactive prediction — recall dropped from 94.4% to 17.1% in one model (ProAgentBench)
- Humans are bursty, LLMs aren't: real interaction burstiness B=0.787 vs LLM-simulated B=0.166 (ProAgentBench) — any LLM-generated training data misses this entirely
- Knowing when > knowing what: timing prediction 64.4% accurate, content prediction maxes at 30.5% semantic similarity — solve timing first, content second
- Explicit feedback required: ProPerSim showed implicit signals alone offer "limited benefit" — the agent must ask for ratings, not just observe behavior
13. Audio Lifelogging & Continuous Transcription
Full research document: knowledge-base/audio-lifelogging-research.md (14 sections, 17 academic papers, extensive product/community analysis)
Audio lifelogging — always-on recording with live transcription, speaker attribution, and knowledge extraction — is the richest possible input for a life augmentation agent. It captures commitments, decisions, context, and social interactions at the moment they happen, with zero user effort.
13.1 State of the Art (2024–2025)
The field has matured from academic concept to consumer product:
| Product/Tool | Form Factor | Status | Key Capability |
|---|---|---|---|
| say (u/8ta4) | macOS app + Electron | Open source, 2+ years daily use | Deepgram streaming → LLM structuring → queryable archive |
| Omi | BLE pendant (nRF/ESP32, ~$24) | Open source (MIT), 7.8k GitHub stars | Hardware + Flutter app + cloud pipeline |
| Limitless | Pendant | Acquired by Meta (2025) | Meeting transcription + speaker ID |
| Plaud.ai | Credit-card form factor | Shipping product | Business meeting transcription |
| Bee (Shopify) | Mobile app | Shipping product | Personal AI diary from conversations |
Key technical components: Deepgram nova-3 streaming ASR (~$0.0043/min), ECAPA-TDNN speaker embeddings (0.8% EER), Pyannote diarization (open source), Silero VAD (on-device filtering eliminates ~60% silence).
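A sketch of the VAD-gated pipeline implied by those numbers: on-device voice-activity detection drops silent frames before anything reaches the paid cloud ASR, so cost scales with speech, not elapsed time. Both helpers below are hypothetical stand-ins, not the Silero or Deepgram APIs:

```python
def is_speech(frame: bytes) -> bool:
    """Stand-in for an on-device VAD verdict (e.g. Silero) per frame."""
    return len(frame) > 0 and frame[0] != 0  # placeholder heuristic

def lifelog_pipeline(frames, send_to_asr) -> tuple[int, int]:
    """Forward only speech frames to the streaming ASR callback."""
    sent = total = 0
    for frame in frames:
        total += 1
        if is_speech(frame):          # only speech leaves the device
            send_to_asr(frame)
            sent += 1
    return sent, total

uploaded = []
sent, total = lifelog_pipeline([b"\x01voice", b"\x00sil", b"\x02voice"],
                               uploaded.append)
print(f"uploaded {sent}/{total} frames")  # ASR cost tracks sent, not total
```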
13.2 Critical Constraints
- iOS is not viable for always-on recording: No persistent background audio entitlement, orange indicator dot, unpredictable app killing. Apple Watch battery cannot sustain 24/7 mic. A dedicated BLE pendant is the only reliable architecture.
- Noisy environments: Transcription accuracy degrades significantly in crowds, showers, wind. No viable waterproof mic solution exists (IP67 housings kill mic sensitivity).
- Speaker attribution in the wild: ECAPA-TDNN's 0.8% EER is measured in controlled conditions. Ambient audio with overlapping speakers, distance, and background noise reduces accuracy. Enrollment UX (10-second samples per contact) adds onboarding friction.
- Privacy/legal: Recording laws vary by jurisdiction (one-party vs two-party consent). GDPR right to erasure conflicts with append-only architectures. Illinois BIPA applies to voiceprint biometrics. Privacy-by-design (transcribe → discard audio) is the minimum viable approach.
- Cost at scale: ~$1/day for Deepgram with heavy use. VAD pre-filtering is essential. Deepgram offers $200 free credit (~4 months runway).
13.3 Implications for Life Agents
Audio lifelogging transforms a life agent from reactive (user must type/speak commands) to truly ambient (agent passively absorbs context from all conversations):
- Episodic memory becomes conversational memory: The agent doesn't just remember tasks — it remembers what was said, by whom, in what context. "What did Alex say about the deadline?" becomes answerable.
- Commitment capture: Spoken "I need to..." or "Let's plan to..." automatically become tracked tasks (see scenario S60; a naive detector is sketched after this list).
- Social context: The agent builds a knowledge graph of what different people care about, what they told you, and when. This enables social scenarios (gift suggestions, relationship maintenance) that were previously impossible without manual data entry.
- Cognitive offloading: The most consistent finding from daily users is that recording reduces cognitive load — people speak more freely when they know everything is captured and searchable.
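A naive keyword-based commitment detector to make the capture idea concrete; production systems need an LLM classifier on top, since patterns like these generate exactly the false positives flagged in §13.4:

```python
import re

# Surface-pattern matcher over transcript text; patterns are illustrative.
COMMITMENT_PATTERNS = re.compile(
    r"\b(i need to|i have to|let's plan to|remind me to|i'll)\s+(.{3,80})",
    re.IGNORECASE,
)

def extract_commitments(transcript: str) -> list[str]:
    """Return the clause following each commitment phrase."""
    return [m.group(2).strip(" .") for m in COMMITMENT_PATTERNS.finditer(transcript)]

print(extract_commitments("Okay, I need to send Alex the draft by Friday."))
# ['send Alex the draft by Friday']
```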
13.4 Open Research Problems
- Waterproof/harsh-environment recording — no solution exists
- Real-time commitment/intent detection from ambient speech (high false-positive rate)
- Long-term conversational memory retrieval at scale (15M+ words/year)
- Multi-language speaker diarization (enrollment per language needed?)
- Emotional tone tracking over time (Pierce & Mann, 2021) — feasible but not yet integrated into any lifelogging product
- Consent UX — how to gracefully handle two-party consent jurisdictions in always-on recording
13.5 Key Academic References
- Vemuri et al. (2006) — iRemember: proactive information retrieval from personal audio
- Harvey et al. (2016) — Audio lifelogging for event summarization
- Hodges et al. (2006) — SenseCam visual lifelogs for memory augmentation
- Lee & Dey (2008) — Lifelogging memory appliance for automatic recording
- Pierce & Mann (2021) — Affective lifelogging with multimodal sensing
- Desplanques et al. (2020) — ECAPA-TDNN speaker verification
See knowledge-base/audio-lifelogging-research.md §4 for full paper list (17 papers).
Document created: March 2026
Papers downloaded: 20 new papers (Batch 6)
Total papers in collection: 126
User sentiment research added: March 2026 (11 papers + web sources)
Audio lifelogging research added: March 2026 (17 papers + product/community analysis)