Long-Running & Life Augmentation Agents
Deep research into persistent, cloud-hosted, proactive AI agents that autonomously manage tasks, delegate to sub-agents, and augment daily life.
Table of Contents
- Vision & Definition
- What Has Been Tried
- Architectural Patterns
- Proactive vs. Reactive Agents
- Memory & Personalization for Long-Running Agents
- User Interaction Patterns
- What Works & What Fails
- Creative Use Cases for a Life Augmentation Agent
- Privacy, Security & Trust
- Recommended Architecture
- Key Papers & References
- User Sentiment & Real-World Validation
- Audio Lifelogging & Continuous Transcription
1. Vision & Definition
A life augmentation agent is a persistent, cloud-hosted AI system that runs continuously (or is awakened by triggers), maintains a model of its user's life, and proactively delegates tasks to specialized sub-agents to improve the user's daily experience. Unlike a reactive chatbot that waits for prompts, this agent:
- Runs indefinitely in the background (cloud-native, always-on)
- Proactively initiates actions without explicit user commands
- Delegates work to specialized sub-agents (research, scheduling, monitoring, communication)
- Maintains long-term memory of user preferences, history, and context
- Learns and adapts from user feedback over time
- Interacts minimally — surfacing only what matters, when it matters
This is the "user-centric agent" vision articulated by Zhang et al. (2026): "The future of digital services should shift from a platform-centric to a user-centric agent... prioritizing privacy, aligning with user-defined goals, and granting users control over their preferences and actions." (arXiv:2602.15682)
2. What Has Been Tried
2.1 Academic Research (2023–2026)
The idea of proactive, autonomous agents has exploded in academic research, particularly from late 2024 onward:
Foundational Work:
- Generative Agents (Park et al., 2023) — Simulated 25 agents with day-long autonomy in a Smallville sandbox. Demonstrated memory retrieval, reflection, and planning over extended timeframes. Showed that believable long-running behavior is possible but requires careful memory architecture.
- Voyager (Wang et al., 2023) — Lifelong learning agent in Minecraft. Demonstrated skill accumulation over 100+ hours, with a persistent skill library. Key insight: agents can self-improve if they maintain a growing capability repertoire.
Proactive Agent Research (2024–2026):
- Proactive Agent (Lu et al., 2024) — First systematic approach to shifting LLMs from reactive to proactive. Created ProactiveBench (6,790 events). Found that fine-tuned models achieve 66.47% F1 in proactive assistance — meaning even the best models fail to anticipate user needs ~1/3 of the time.
- Ask-before-Plan (Zhang et al., 2024, EMNLP) — Showed that proactive clarification before action dramatically improves planning quality. The Clarification-Execution-Planning (CEP) framework with 3 specialized agents outperforms monolithic approaches.
- ContextAgent (Yang et al., 2025, NeurIPS) — First context-aware proactive agent using wearable sensory data. 8.5% higher accuracy in proactive predictions vs baselines. Key: multi-dimensional context extraction from video/audio on AR glasses.
- ProPerSim (Kim et al., 2025, ICLR 2026) — Proactive AND personalized assistant simulation. ProPerAssistant uses retrieval-augmented preference-aligned learning that continually adapts via user ratings. Key finding: combining proactivity + personalization is more than additive.
- BAO (Yao et al., 2026) — Behavioral Agentic Optimization via RL. Addresses the critical autonomy-satisfaction tradeoff: overly proactive agents annoy users. BAO's behavior regularization suppresses redundant interactions.
- IntentRL (Luo et al., 2026) — Trains proactive agents to clarify latent user intents before long-running research tasks. Addresses the "autonomy-interaction dilemma": high autonomy on ambiguous queries leads to prolonged execution with unsatisfactory outcomes.
- ProAgentBench (Tang et al., 2026) — 28,000+ events from 500+ hours of real user sessions. Key finding: real-world data shows bursty interaction patterns (B=0.787) completely absent in synthetic data. Models trained on synthetic data fail in production.
2.2 Industry & Open-Source Projects
AutoGPT / BabyAGI (2023): The original "run an LLM in a loop forever" experiments. AutoGPT set a high-level goal ("grow a business") and looped GPT-4 with tools. Key learnings:
- Rapid context degradation: After 10-20 steps, the agent would lose track of its original goal, hallucinate sub-tasks, or loop endlessly between the same actions
- No principled stopping criteria: Agents couldn't determine when they were "done" or when to yield to humans
- Cost explosion: Continuous LLM invocation without intelligent routing consumed enormous API budgets
- Compounding errors: Each LLM call had some error rate; over many steps, errors compounded catastrophically
- As Latent.Space summarized: "AutoGPTs have continuous modes which are fully autonomous but very likely to go wrong and therefore have to be closely monitored"
Lindy.ai (2024–present): A production personal AI assistant that manages inbox, meetings, and calendar. Focuses on narrow, well-defined tasks rather than open-ended autonomy. Their approach: tightly scoped agents for specific workflows (email triage, meeting prep, calendar optimization). Reports 400K+ users. Key insight: narrow scope + high reliability > broad ambitions + frequent failures.
Replit Agent (2024–2025): Long-running coding agent for persistent development sessions. LangGraph-backed with checkpointing. Michele Catasta (VP AI): "It's easy to build the prototype of a coding agent, but deceptively hard to improve its reliability." They invested heavily in fine-grained control flow rather than pure LLM autonomy.
Rabbit R1 / Humane AI Pin (2024): Hardware-based "life augmentation" devices. Both launched to enormous hype and largely failed. Key lessons:
- Latency kills: Users expect sub-second responses for frequent interactions. Cloud-hosted LLM inference is too slow for "always-on" use
- Battery life: Continuous sensing + cloud communication drains batteries within hours
- UI/UX mismatch: Ambient AI assistants need fundamentally different interaction patterns than chat interfaces
- Overpromise, underdeliver: Marketing implied general intelligence; reality was narrow, unreliable capabilities
12-Factor Agents (HumanLayer, 2025): Practical engineering guide from Dex Horthy, who talked to 100+ SaaS builders. Key insights for long-running agents:
- Factor 5: Unify execution state and business state — Agent state must be durable and inspectable
- Factor 6: Launch/Pause/Resume — Agents must be pausable and resumable (critical for long-running tasks)
- Factor 7: Contact humans with tool calls — Human interaction is just another tool, not a special case
- Factor 10: Small, focused agents — Monolithic agents fail; specialized agents composed together succeed
- Factor 11: Trigger from anywhere — Agents should respond to cron jobs, webhooks, user messages alike
- Factor 12: Stateless reducer — Each agent invocation should be a pure function of its context (a minimal sketch follows below)
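To make the stateless-reducer idea concrete, here is a minimal sketch. The `Event`/`AgentState` types and the task-tracking reducer are our illustration under the pattern's assumptions, not code from the 12-Factor guide itself:

```python
from dataclasses import dataclass, field, replace

# Hypothetical minimal types -- illustrative, not from the 12-Factor spec.
@dataclass(frozen=True)
class AgentState:
    goal: str
    pending_tasks: tuple = ()
    completed: tuple = ()

@dataclass(frozen=True)
class Event:
    kind: str                 # e.g. "task_created", "task_done"
    payload: dict = field(default_factory=dict)

def step(state: AgentState, event: Event) -> AgentState:
    """Pure reducer: the next state is a function of (state, event) only.
    No hidden state means any invocation can be replayed or resumed."""
    if event.kind == "task_created":
        return replace(state, pending_tasks=state.pending_tasks + (event.payload["task"],))
    if event.kind == "task_done":
        task = event.payload["task"]
        return replace(
            state,
            pending_tasks=tuple(t for t in state.pending_tasks if t != task),
            completed=state.completed + (task,),
        )
    return state  # unknown events leave state unchanged

def rebuild(events: list[Event]) -> AgentState:
    """Crash recovery = re-folding the durable event log into fresh state."""
    state = AgentState(goal="morning briefing")
    for e in events:
        state = step(state, e)
    return state
```

Because the reducer is pure, pause/resume (Factor 6) and horizontal scaling fall out for free: any replica can rebuild state from the log and continue.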
LangGraph Cloud (LangChain, 2024): Purpose-built infrastructure for deploying long-running agents. Provides:
- Horizontally-scaling task queues with Postgres checkpointer
- Background jobs for long-running tasks (polling or webhook completion)
- Cron jobs for scheduled tasks
- Double-texting handling (managing new user input on running threads)
- Time-travel debugging (inspect, edit, resume past states)
3. Architectural Patterns
3.1 Event Sourcing for Agents (ESAA)
The ESAA architecture (dos Santos Filho, 2026, arXiv:2602.23193) separates cognitive intention from state mutation:
Agent emits structured intentions (JSON)
↓
Deterministic orchestrator validates
↓
Persists events in append-only log (activity.jsonl)
↓
Applies effects (file writes, API calls)
↓
Projects verifiable materialized view (roadmap.json)
Key properties:
- Immutability: Completed task events are immutable and hash-verified
- Replay: Full trajectory can be replayed from the event log
- Forensic traceability: Every decision has a provenance chain
- Multi-agent: Demonstrated with 4 concurrent agents, different LLMs, 50 tasks
- Separation of concerns: LLM does thinking, deterministic code does execution
This maps almost directly to Temporal.io-style durable workflows and is enormously relevant for a life augmentation agent that needs reliability over days/weeks.
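A minimal sketch of the hash-chained, append-only log that ESAA describes. The `activity.jsonl` filename comes from the paper; the helper functions and the exact chaining scheme here are our simplification:

```python
import hashlib
import json
from pathlib import Path

LOG = Path("activity.jsonl")  # append-only event log, as in ESAA

def append_event(event: dict) -> None:
    """Append an event whose hash chains to the previous entry,
    making completed-task records tamper-evident."""
    prev_hash = "0" * 64
    if LOG.exists():
        lines = LOG.read_text().splitlines()
        if lines:
            prev_hash = json.loads(lines[-1])["hash"]
    body = {"event": event, "prev": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    with LOG.open("a") as f:
        f.write(json.dumps(body) + "\n")

def verify_log() -> bool:
    """Replay the chain; any mutation of a past event breaks verification."""
    prev_hash = "0" * 64
    for line in LOG.read_text().splitlines():
        entry = json.loads(line)
        expected = dict(entry)
        stored_hash = expected.pop("hash")
        if expected["prev"] != prev_hash:
            return False
        recomputed = hashlib.sha256(
            json.dumps(expected, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != stored_hash:
            return False
        prev_hash = stored_hash
    return True
```

Replays and forensic traces then reduce to reading the log; materialized views like `roadmap.json` are just projections over it.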
3.2 Orchestrator-Workers with Task Queues
From the Anthropic "Building Effective Agents" guide (2024):
- Central orchestrator LLM breaks down tasks and delegates to worker LLMs
- Workers are specialized for specific domains (research, scheduling, communication)
- Results synthesized by orchestrator
- Key: this pattern is "well-suited for complex tasks where you can't predict the subtasks needed"
For a life agent, the orchestrator maintains a persistent task queue:
User's life context → Orchestrator → Priority queue → Worker agents → Results → User
  ↑                                                                              │
  └──────────────────────────────── feedback loop ───────────────────────────────┘
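A sketch of that delegation loop under stated assumptions: the planning LLM is stubbed out, and the `Task` fields and worker names are illustrative:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                       # lower number = more urgent
    description: str = field(compare=False)
    worker: str = field(compare=False)  # e.g. "email", "research"

queue: list[Task] = []

def plan(event: str) -> list[tuple[int, str, str]]:
    """Stand-in for the planning LLM that decomposes an event into subtasks."""
    return [
        (1, f"triage: {event}", "email"),
        (5, f"background research on: {event}", "research"),
    ]

def orchestrate(event: str) -> None:
    """Orchestrator routes incoming life-context events into the priority queue."""
    for prio, desc, worker in plan(event):
        heapq.heappush(queue, Task(prio, desc, worker))

def run_once(workers: dict) -> None:
    """Pop the most urgent task and delegate it to its specialized worker."""
    if queue:
        task = heapq.heappop(queue)
        result = workers[task.worker](task.description)
        print(f"[{task.worker}] {result}")  # feeds back to orchestrator/user

orchestrate("email from Bob about Q2 planning")
run_once({"email": lambda t: f"done: {t}", "research": lambda t: f"queued: {t}"})
```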
3.3 Declarative Workflow Orchestration
Daunis (2025, arXiv:2512.19769) showed that most agent workflows can be expressed through a unified DSL rather than imperative code. At PayPal scale: 60% reduction in development time, 3x deployment velocity. Complex workflows expressed in <50 lines of DSL vs 500+ lines of imperative code.
For life augmentation: define recurring agent workflows declaratively (morning briefing, email triage, expense tracking) and let the runtime handle scheduling, retry, and monitoring.
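To show the shape of such a declarative definition, here is a hypothetical dict-based spec; this is our illustration of the idea, not the actual DSL syntax from the Daunis paper:

```python
# Hypothetical declarative workflow spec -- the runtime, not this spec,
# owns scheduling, retries, and monitoring.
MORNING_BRIEFING = {
    "trigger": {"cron": "0 7 * * MON-FRI"},
    "steps": [
        {"agent": "CalendarAgent", "action": "todays_events"},
        {"agent": "EmailTriageAgent", "action": "urgent_since", "args": {"hours": 12}},
        {"agent": "WeatherAgent", "action": "commute_forecast"},
        {"agent": "Summarizer", "action": "compose_briefing",
         "inputs": ["todays_events", "urgent_since", "commute_forecast"]},
    ],
    "on_failure": {"retry": 2, "backoff_seconds": 60},
    "deliver": {"channel": "push_notification"},
}
```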
3.4 Internet of Agentic AI
Yang & Zhu (2026, arXiv:2602.03145) propose a framework where autonomous, heterogeneous agents across cloud and edge dynamically form coalitions for task-driven workflows. Key concepts:
- Minimum-effort coalition selection: Find the smallest group of agents to accomplish a task
- Network locality: Prefer agents close to the data/user
- Economic implementability: Incentive-compatible coordination
- MCP as coordination layer: Uses the Model Context Protocol as the coordination layer across agents
3.5 Peak-Aware Long-Horizon Orchestration
APEMO (Shi & DiFranzo, 2026, arXiv:2602.17910) introduces temporal-affective scheduling:
- Detects trajectory instability through behavioral proxies
- Targets computational resources at peak moments (critical decisions) and endings (final user-facing output)
- Under fixed compute budgets, APEMO consistently enhances trajectory-level quality
- Reframes alignment as a temporal control problem — critical for agents running over hours/days
4. Proactive vs. Reactive Agents
This is perhaps the most critical dimension for a life augmentation agent. A reactive agent waits for "Hey, do X." A proactive agent notices you need X before you ask.
4.1 The Proactivity Spectrum
From the research, three levels emerge:
| Level | Description | Example |
|---|---|---|
| Reactive | Responds only to explicit requests | "Schedule a meeting with Bob" |
| Suggestion-based | Notices patterns and suggests actions | "You usually meet Bob on Tuesdays — should I schedule?" |
| Fully proactive | Acts autonomously based on context | Notices Bob emailed about Q2 planning, cross-references your calendar, schedules meeting, drafts agenda |
4.2 The Autonomy-Satisfaction Tradeoff
BAO (2026) identifies a critical Pareto frontier: more proactive ≠ better. Overly proactive agents that constantly interrupt with suggestions are worse than reactive ones. The sweet spot requires:
- Timing prediction: When to intervene (ProAgentBench found bursty patterns matter)
- Relevance filtering: Not every observed intent deserves action
- Cost-of-interruption modeling: ProMemAssist (2025, UIST) models the user's working memory to balance assistance value vs. interruption cost (see the sketch after this list)
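A toy version of that cost-of-interruption decision; the linear cost model and weights are our illustration, not ProMemAssist's actual formulation:

```python
def should_intervene(assistance_value: float,
                     working_memory_load: float,
                     base_interruption_cost: float = 0.3) -> bool:
    """Intervene only when expected assistance value exceeds the
    interruption cost, which grows with the user's cognitive load.
    All scores in [0, 1]; the linear model is an illustrative assumption."""
    interruption_cost = base_interruption_cost + 0.7 * working_memory_load
    return assistance_value > interruption_cost

# In flow (load ~0.9): only near-critical assistance gets through.
assert not should_intervene(0.6, working_memory_load=0.9)
# Idle (load ~0.1): moderately useful suggestions are acceptable.
assert should_intervene(0.6, working_memory_load=0.1)
```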
4.3 Knowledge Gap Navigation
PROPER (Kaur et al., 2026) introduces a two-agent architecture:
- Dimension Generating Agent (DGA): Identifies implicit dimensions the user hasn't considered
- Response Generating Agent (RGA): Balances explicit needs with discovered implicit needs
The insight: proactive agents should address needs the user doesn't know they have, not just automate what they've already expressed. This achieved 84% quality gains in single-turn evaluations.
4.4 The Advisor/Coach/Delegate Paradigm
Zhu et al. (2026, arXiv:2602.12089) ran a 243-person behavioral experiment comparing:
- Advisor: Proactive recommendations
- Coach: Reactive feedback on user's actions
- Delegate: Autonomous execution
Key finding: users preferred the Advisor but achieved the highest gains with the Delegate. This "preference-performance misalignment" is critical: users want to feel in control but benefit from delegation. Moreover, delegation created positive externalities — even non-adopters in delegate groups received higher-quality offers.
Implication for life agents: default to delegation for high-confidence tasks, with transparent reporting. Allow users to override, but don't make them drive every decision.
5. Memory & Personalization for Long-Running Agents
5.1 O-Mem: Active User Profiling
O-Mem (Wang et al., 2025, arXiv:2511.13593) is a memory framework based on active user profiling:
- Dynamically extracts and updates user characteristics from interactions
- Hierarchical retrieval of persona attributes and topic-related context
- Achieves SOTA on LoCoMo (51.67%) and PERSONAMEM (62.99%) benchmarks
- Key: avoids the pitfall of semantically grouping memories before retrieval, which drops critical but semantically distant user information
5.2 AMemGym: Interactive Memory Benchmarking
AMemGym (Cheng et al., 2026, ICLR) reveals that existing memory systems (RAG, long-context LLMs, agentic memory) all have significant gaps in long-horizon conversations. Key insight: on-policy evaluation (interactive) reveals failures invisible in off-policy (static) evaluation.
5.3 Working Memory Modeling for Proactive Timing
ProMemAssist (Pu et al., 2025, UIST) models the user's cognitive working memory in real-time:
- Represents perceived information as memory items and episodes
- Uses encoding mechanisms (displacement, interference) from cognitive psychology
- Timing predictor balances assistance value vs. interruption cost
- In user studies: more selective assistance, higher engagement vs. LLM-only baseline
5.4 The Personalized Agent Survey
The comprehensive survey by Xu et al. (2026, arXiv:2602.22680) identifies four interdependent personalization components:
- Profile modeling: Explicit (stated preferences) + implicit (behavioral signals)
- Memory: Short-term working memory + long-term episodic/semantic memory
- Planning: User-adapted goal decomposition and strategy selection
- Action execution: Personalized tool selection and output formatting
Key trade-off: deeper personalization requires more user data, creating tension with privacy.
6. User Interaction Patterns
6.1 Notification Fatigue
The biggest failure mode for proactive agents. From the research:
- ProAgentBench shows real user sessions have bursty patterns (B=0.787) — users interact in concentrated bursts, then go silent
- During silent periods, proactive interruptions are maximally disruptive
- BAO's behavior regularization specifically penalizes "redundant interactions"
- Solution: preference-aligned timing — learn when the user is receptive
6.2 Trust Calibration
From "Choose Your Agent" (2026):
- Users systematically under-trust delegation despite its superior outcomes
- Trust builds through transparency (showing reasoning) and track record (successful completions)
- "Adoption-compatible interaction rules are a prerequisite to improving human welfare"
- Start with low-stakes delegation (weather reminders, meeting summaries), build to high-stakes (financial decisions, communication on behalf)
6.3 The Right Interface
From Egocentric Co-Pilot (2026, WWW):
- Smart glasses + multimodal input (speech + gaze) enables "always-on assistive layer over daily life"
- But introduces latency requirements: cloud offloading adds ~200ms+ vs. on-device inference
- Hierarchical Context Compression supports long-horizon QA over continuous first-person video
From "The Next Paradigm" (2026):
- Desktop/mobile app with proactive notifications + on-demand deep interaction
- Device-cloud pipeline: lightweight on-device model for urgency triage, cloud model for complex reasoning
- User should control preference dials (proactivity level, domains to manage, notification frequency)
6.4 Human-in-the-Loop as a Tool
From 12-Factor Agents: "Contact humans with tool calls" — the agent should model human interaction exactly like any other tool:
    contact_human(
        channel="push_notification",
        message="I found a cheaper flight for your trip to NYC next week. Should I rebook?",
        urgency="medium",
        timeout="4h",                     # how long to wait for a reply
        fallback="keep_current_booking",  # action taken if the timeout expires
    )
If the human doesn't respond within the timeout, the agent takes the fallback action. This makes human-in-the-loop a first-class, timeout-aware capability rather than a blocking dependency.
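One way to implement those timeout-and-fallback semantics with `asyncio`; the notification and reply adapters below are hypothetical stubs standing in for real push/webhook plumbing:

```python
import asyncio

async def send_notification(channel: str, message: str) -> None:
    print(f"[{channel}] {message}")  # stand-in for a real push adapter

async def wait_for_reply() -> str:
    await asyncio.sleep(3600)        # stand-in: replies arrive via webhook
    return "rebook"

async def contact_human(channel: str, message: str,
                        timeout_s: float, fallback: str) -> str:
    """Human-as-a-tool: send, wait up to timeout_s, then take the fallback
    so a silent user never blocks a time-sensitive task."""
    await send_notification(channel, message)
    try:
        return await asyncio.wait_for(wait_for_reply(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback

# asyncio.run(contact_human("push_notification", "Rebook cheaper flight?",
#                           timeout_s=4 * 3600, fallback="keep_current_booking"))
```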
7. What Works & What Fails
7.1 What Works
| Category | Examples | Why It Works |
|---|---|---|
| Narrow, well-defined tasks | Email triage, meeting prep, expense categorization | Clear success criteria, bounded scope, verifiable output |
| Information synthesis | Morning briefings, research summaries, news digests | LLMs excel at summarization; failures are low-cost |
| Schedule optimization | Calendar management, reminder timing, travel planning | Structured data, clear constraints, measurable outcomes |
| Monitoring & alerting | Price tracking, deadline monitoring, health metric trends | Event-driven triggers, simple decision logic |
| Routine automation | Form filling, template generation, data entry | Repetitive tasks with clear patterns |
7.2 What Fails
| Category | Why It Fails | Reference |
|---|---|---|
| Open-ended goal pursuit | Compounding errors over long horizons; context degradation | AutoGPT postmortems |
| High-stakes autonomy | Users don't trust AI for financial/legal/medical decisions | Choose Your Agent (2026) |
| Social communication | Nuance, tone, relationship context are hard to model | EchoGuard (2026) |
| Real-time continuous sensing | Battery, latency, cost constraints for always-on processing | Rabbit R1, Humane AI Pin failures |
| Unbounded task decomposition | LLMs hallucinate sub-tasks, struggle with novel problem structures | ProAgentBench real-world data |
| Cross-modal understanding | Combining calendar + email + location + health data coherently | ContextAgent (2025) improvements still limited |
7.3 Critical Failure Patterns
- The Infinite Loop: Agent creates task → fails → creates retry task → fails → ... Without proper stopping conditions, agents burn through budgets. Solution: circuit breakers, max iteration limits, exponential backoff (see the sketch after this list).
- Context Window Exhaustion: Long-running agents accumulate history. Eventually the context window fills and the agent "forgets" its goals. Solution: hierarchical memory with compression (Active Context Compression, 2026), event sourcing with materialized views (ESAA).
- Preference Drift: User preferences change over time. An agent trained on January behavior may be wrong by March. Solution: continuous adaptation with recency weighting (O-Mem, ProPerSim).
- The 80% Problem: From 12-Factor Agents — most agent frameworks get you to 80% quality, but the last 20% requires abandoning the framework and building custom. Solution: own your prompts, own your control flow, build on primitives not frameworks.
- Notification Fatigue Spiral: Proactive agent sends too many suggestions → user ignores them → agent interprets silence as neutral → continues sending → user disables agent entirely. Solution: explicit feedback channels, decreasing proactivity in response to non-engagement.
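A minimal circuit breaker guarding against the infinite-loop pattern above; the thresholds and the `run_agent_step` stub are illustrative defaults, not values from the cited sources:

```python
import time

def run_agent_step():
    """Stand-in for one LLM/tool invocation; returns (ok, cost_usd)."""
    return True, 0.01

class CircuitBreaker:
    """Caps iterations, caps spend, and backs off exponentially on errors."""

    def __init__(self, max_iterations=25, max_cost_usd=2.00, base_delay_s=1.0):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.base_delay_s = base_delay_s
        self.iterations = 0
        self.cost_usd = 0.0
        self.consecutive_errors = 0

    def allow(self) -> bool:
        return (self.iterations < self.max_iterations
                and self.cost_usd < self.max_cost_usd)

    def record(self, step_cost_usd: float, ok: bool) -> None:
        self.iterations += 1
        self.cost_usd += step_cost_usd
        if ok:
            self.consecutive_errors = 0
        else:
            self.consecutive_errors += 1
            # Exponential backoff before the next attempt.
            time.sleep(self.base_delay_s * 2 ** (self.consecutive_errors - 1))

breaker = CircuitBreaker()
while breaker.allow():
    ok, cost = run_agent_step()
    breaker.record(cost, ok)
```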
8. Creative Use Cases for a Life Augmentation Agent
8.1 The "Morning Protocol" Agent
Every morning, the agent:
- Scans today's calendar, identifies preparation needs
- Checks email for anything requiring pre-meeting context
- Reviews weather + commute for schedule impacts
- Surfaces any overnight deadline changes or urgent messages
- Generates a personalized briefing pushed to phone/watch
Delegate agents: CalendarAgent, EmailTriageAgent, WeatherAgent, CommutePlanner
8.2 The "Background Research" Agent
User mentions interest in a topic (via conversation, browsing, or explicit request):
- Agent queues a deep research task
- ResearchAgent searches academic papers, blogs, news
- Runs over hours/days, building a knowledge graph
- Synthesizes findings into a briefing document
- Proactively surfaces when user next has free time
This is exactly the IntentRL (2026) pattern — clarify intent first, then execute autonomously.
8.3 The "Financial Health" Agent
Continuous monitoring:
- Tracks spending against budget categories
- Monitors bills for price increases or better alternatives
- Watches subscriptions for unused services
- Alerts on unusual charges
- Monthly financial health summary
Requires: bank API integration, category classification, anomaly detection
8.4 The "Social Maintenance" Agent
The hardest and most nuanced use case:
- Tracks last contact date with important people
- Suggests reaching out when it's been too long
- Notices birthdays, life events from social media
- Drafts (but never sends without approval) personal messages
- Manages RSVP tracking
Key constraint: never send on behalf without explicit approval. This is the "advisor" pattern from Choose Your Agent — suggest, don't act.
8.5 The "Health & Wellness" Agent
- Monitors sleep patterns (via wearable data)
- Tracks medication schedules
- Suggests exercise based on calendar gaps
- Monitors mood patterns from interaction data (very carefully, with explicit consent)
- Connects health data to productivity patterns
ProMemAssist (2025) provides the cognitive model: intervene when the user's working memory is overloaded, not when they're in flow state.
8.6 The "Travel Optimization" Agent
For frequent travelers:
- Monitors prices for upcoming trips
- Manages loyalty programs and point optimization
- Handles rebooking during disruptions
- Creates location-aware travel guides
- Manages expense reports post-trip
8.7 The "Knowledge Worker Copilot" Agent
Always-on background support:
- Monitors project deadlines across tools (Jira, GitHub, email)
- Identifies tasks that are blocked or at risk
- Pre-fetches context before meetings (related documents, recent changes)
- Suggests task prioritization based on deadlines, impact, and dependencies
- Creates end-of-week summaries
8.8 The "Home Automation Intelligence Layer"
Layer on top of smart home:
- Learns household patterns (wake times, preferences, routines)
- Proactively adjusts based on calendar (early meeting → earlier alarm)
- Coordinates grocery lists with meal planning
- Manages maintenance reminders (HVAC filters, car service)
- Integrates weather forecasts with heating/cooling optimization
9. Privacy, Security & Trust
9.1 The Privacy Paradox
AgentScope (2026, arXiv:2603.04902) evaluates contextual privacy across agentic workflows accessing calendars, email, and personal files. Life augmentation agents need deep access to personal data, but:
- Users want personalization (requires data) but fear surveillance
- Cross-domain data aggregation (calendar + health + finance) creates sensitive profiles
- Current LLM providers process data on external servers
9.2 User-Centric Architecture
From "The Next Paradigm" (2026):
- On-device processing for sensitive data (triage, classification)
- Cloud processing only for complex reasoning, with minimal data transmission
- User-controlled data policies: what data the agent can access, retain, and share
- Audit logs: every data access and decision is traceable (cf. ESAA event sourcing)
9.3 Trust Hierarchy
For a life agent, different actions require different trust levels:
| Trust Level | Actions | Control |
|---|---|---|
| Full autonomy | Reading calendar, checking weather, monitoring prices | No approval needed |
| Notify & act | Sending standard replies, rescheduling non-critical meetings | Notification + undo window |
| Ask & act | Sending personal messages, making purchases, changing plans | Explicit approval required |
| Never autonomous | Financial transactions above threshold, medical decisions, legal actions | Human must initiate |
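One way to encode the hierarchy above is as a policy table the orchestrator consults before every action; the action names and the ask-first default are our illustration:

```python
from enum import Enum

class Trust(Enum):
    FULL_AUTONOMY = 1     # read calendar, check weather, monitor prices
    NOTIFY_AND_ACT = 2    # act, notify, keep an undo window open
    ASK_AND_ACT = 3       # explicit approval required before acting
    NEVER_AUTONOMOUS = 4  # human must initiate

# Illustrative policy table mirroring the trust hierarchy.
ACTION_TRUST = {
    "read_calendar": Trust.FULL_AUTONOMY,
    "reschedule_noncritical_meeting": Trust.NOTIFY_AND_ACT,
    "send_personal_message": Trust.ASK_AND_ACT,
    "large_financial_transaction": Trust.NEVER_AUTONOMOUS,
}

def gate(action: str) -> str:
    """Map an intended action to the interaction pattern it requires.
    Unknown actions default to asking, the conservative choice."""
    level = ACTION_TRUST.get(action, Trust.ASK_AND_ACT)
    if level is Trust.FULL_AUTONOMY:
        return "execute"
    if level is Trust.NOTIFY_AND_ACT:
        return "execute_then_notify_with_undo"
    if level is Trust.ASK_AND_ACT:
        return "request_approval"
    return "refuse_unless_user_initiated"
```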
10. Recommended Architecture
Based on all the research, here's a synthesized architecture for a life augmentation agent:
10.1 Core Components
┌──────────────────────────────────────────────────────────┐
│                     Life Agent Core                      │
│                                                          │
│  ┌───────────┐    ┌────────────┐    ┌───────────────┐    │
│  │ Event Bus │───→│Orchestrator│───→│  Task Queue   │    │
│  │ (triggers)│    │ (planner)  │    │(priority heap)│    │
│  └───────────┘    └────────────┘    └───────────────┘    │
│        ↑                │                   │            │
│        │                ↓                   ↓            │
│  ┌───────────┐    ┌────────────┐    ┌───────────────┐    │
│  │   User    │    │   Memory   │    │  Worker Pool  │    │
│  │ Interface │←───│  (O-Mem +  │    │  (sub-agents) │    │
│  │ (multi-ch)│    │  profiles) │    │               │    │
│  └───────────┘    └────────────┘    └───────────────┘    │
│        │                │                                │
│        ↓                ↓                                │
│     ┌────────────────────────────┐                      │
│     │ Event Store (append-only)  │                      │
│     │     + State Snapshots      │                      │
│     └────────────────────────────┘                      │
└──────────────────────────────────────────────────────────┘
10.2 Key Design Decisions
- Event-sourced state (ESAA pattern): All agent actions recorded in append-only log. State reconstructable from events. Enables debugging, audit, and replay.
- Stateless orchestrator (12-Factor #12): Each orchestrator invocation is a pure function of current state + new event. No hidden state. Enables horizontal scaling and crash recovery.
- Priority-based task queue: Tasks have urgency, importance, and deadline. Orchestrator continuously re-prioritizes. Inspired by APEMO peak-aware scheduling.
- Multi-channel input/output: Triggers from cron (morning briefing), webhooks (email arrival), user messages (chat), sensors (location change). Output via push notifications, email, chat, or ambient display.
- Specialized worker agents: CalendarAgent, ResearchAgent, EmailAgent, FinanceAgent, HealthAgent. Each has narrow scope, specific tools, and measurable outcomes.
- Adaptive proactivity: Use ProPerSim-style user modeling to learn intervention timing. Start conservative (suggestions only), build trust, increase autonomy over time.
- O-Mem-style memory: Active user profiling with hierarchical retrieval. Short-term working memory (today's context), medium-term episodic memory (recent interactions), long-term semantic memory (stable preferences). A layered-retrieval sketch follows this list.
- Human-as-a-tool: Human interaction modeled as a tool call with timeout and fallback (12-Factor #7). Never block on human response for time-sensitive tasks.
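A toy version of the three-tier memory layout from the list above; keyword matching stands in for embedding retrieval, and the structure is our simplification, not O-Mem's actual design:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Three memory tiers in the spirit of O-Mem (our simplification)."""
    working: list = field(default_factory=list)   # today's context
    episodic: list = field(default_factory=list)  # recent interactions
    semantic: dict = field(default_factory=dict)  # stable preferences

    def recall(self, query: str, k: int = 3) -> list:
        """Search all tiers; naive substring match stands in for
        vector similarity over embeddings."""
        q = query.lower()
        hits = [m for m in self.working + self.episodic if q in m.lower()]
        prefs = [f"{key}: {val}" for key, val in self.semantic.items()
                 if q in key.lower()]
        return (hits + prefs)[:k]

mem = MemoryStore()
mem.semantic["coffee preference"] = "oat-milk flat white"
mem.episodic.append("2026-03-02: user said Tuesday standups drain them")
print(mem.recall("coffee"))  # ['coffee preference: oat-milk flat white']
```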
10.3 Technology Stack Recommendations
- Orchestration: Durable workflow engine (Temporal.io, Azure Durable Functions, or custom event-sourced loop)
- Agent Framework: Microsoft Agent Framework (for .NET) or LangGraph (for Python) — both support checkpointing and persistence
- Memory Store: Vector DB (Qdrant/Pinecone) for semantic retrieval + PostgreSQL for structured user profiles
- Event Store: Append-only log (PostgreSQL with immutable rows, or EventStoreDB)
- Task Queue: Redis-backed priority queue or cloud-native queue (Azure Service Bus, SQS)
- Notifications: Multi-channel push (FCM, APNs, email, Slack/Teams webhook)
- LLM: Tiered — fast/cheap model for triage/classification, powerful model for reasoning/planning (see the routing sketch after this list)
- Monitoring: OpenTelemetry for distributed tracing across agent invocations
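A sketch of the tiered-LLM routing mentioned above; the model names and the triage contract are invented for illustration, and `call_llm` stands in for a real provider SDK:

```python
# Tiered routing: a cheap model triages every event; only events it
# flags as complex escalate to the expensive reasoning model.
CHEAP_MODEL = "small-fast-model"
STRONG_MODEL = "large-reasoning-model"

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for the provider SDK call."""
    return "simple" if model == CHEAP_MODEL and "Label" in prompt else "handled"

def route(event_text: str) -> str:
    """Cheap triage first; escalate to the strong model only when needed."""
    label = call_llm(CHEAP_MODEL,
                     f"Label this event as 'simple' or 'complex':\n{event_text}")
    model = STRONG_MODEL if "complex" in label.lower() else CHEAP_MODEL
    return call_llm(model, f"Handle this event:\n{event_text}")

print(route("new email: weekly newsletter"))  # handled by the cheap tier
```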
10.4 Critical Engineering Principles
- Circuit breakers: Max iterations per task, budget limits per day, error thresholds
- Graceful degradation: If a worker agent fails, others continue operating
- Idempotent operations: Every action can be safely retried
- Transparent reasoning: Every decision should have a human-readable explanation
- Progressive trust: Start with low-autonomy defaults, increase based on track record
- Cost accounting: Track per-task LLM costs, optimize routing to smaller models where possible
- Offline-first: Agent should handle network interruptions gracefully with local queuing
11. Key Papers & References
11.1 Papers Downloaded & Converted (Batch 6)
| Paper | arXiv | Key Contribution |
|---|---|---|
| Proactive Agent | 2410.12361 | First systematic reactive→proactive shift; ProactiveBench |
| Ask-before-Plan | 2406.12639 | Proactive clarification + CEP framework (EMNLP 2024) |
| ContextAgent | 2505.14668 | Wearable sensory context for proactive agents (NeurIPS 2025) |
| ProAgent Sensory | 2512.06721 | End-to-end proactive system on AR glasses |
| ProPerSim | 2509.21730 | Proactive + personalized assistants (ICLR 2026) |
| BAO | 2602.11351 | Behavioral agentic optimization; autonomy-satisfaction Pareto |
| IntentRL | 2602.03468 | RL for proactive intent clarification in deep research |
| ProAgentBench | 2602.04482 | Real-world proactive assistance benchmark (28K+ events) |
| PROPER | 2601.09926 | Knowledge gap navigation for proactivity |
| O-Mem | 2511.13593 | Omni memory for personalized long-horizon agents |
| Personalized LLM Agents Survey | 2602.22680 | Comprehensive survey: profile, memory, planning, action |
| AMemGym | 2603.01966 | Interactive memory benchmarking for long-horizon (ICLR 2026) |
| ProMemAssist | 2507.21378 | Working memory modeling for proactive timing (UIST'25) |
| Egocentric Co-Pilot | 2603.01104 | Always-on smart glasses agent (WWW 2026) |
| Next Paradigm: User-Centric | 2602.15682 | User-centric agent vs platform-centric service |
| Choose Your Agent | 2602.12089 | Advisor vs Coach vs Delegate tradeoffs (N=243) |
| ESAA | 2602.23193 | Event sourcing architecture for autonomous agents |
| Alignment in Time | 2602.17910 | Peak-aware orchestration for long-horizon systems |
| Internet of Agentic AI | 2602.03145 | Distributed agent coalition formation |
| Declarative Agent Workflows | 2512.19769 | DSL for agent workflow orchestration (PayPal scale) |
11.2 Blog Posts & Practical Resources
| Resource | Key Takeaway |
|---|---|
| Anthropic: Building Effective Agents | Start simple, use orchestrator-workers for complex tasks, agents for open-ended problems |
| 12-Factor Agents | Stateless reducer pattern, pause/resume, human-as-tool, small focused agents |
| LangGraph Cloud | Task queues, cron, background jobs, time-travel debugging for long-running agents |
| Latent.Space: Anatomy of Autonomy | AutoGPT/BabyAGI dissection, 5 capability levels, self-driving car analogy |
| Lindy.ai | Production personal assistant — narrow scope + high reliability wins |
11.3 Papers Already in Collection (Relevant)
- generative-agents-stanford-2023 — Long-running agent simulation patterns
- voyager-lifelong-learning-2023 — Lifelong skill accumulation
- memgpt-llm-operating-system-2023 — OS-like memory management for agents
- caveagent-stateful-runtime-2026 — Stateful agent runtime with persistent containers
- ariadnemem-lifelong-memory-2026 — Lifelong episodic-semantic memory
- anatomy-agentic-memory-2026 — Comprehensive memory architecture taxonomy
- active-context-compression-2026 — Autonomous memory management
- agentic-ai-frameworks-protocols-2025 — Architecture & protocol landscape
12. User Sentiment & Real-World Validation
Added March 2026. Synthesized from 11 papers with real user studies plus web sources (Reddit, product reviews). Full analysis: blueprints/life-agent/docs/user-sentiment-research.md.
12.1 The Preference-Performance Paradox
The single most important finding across all research: users systematically prefer the interaction mode that gives them worse outcomes.
Choose Your Agent (2026, N=243, 6,561 trading decisions):
- 44% preferred Advisor ("show me options, I'll decide")
- 19.3% preferred Delegate ("just do it for me")
- 21.4% preferred no AI at all
- Yet Delegate produced statistically better economic outcomes (β=0.084, p=.034)
- Users who preferred AI reported 20% higher mental effort than autonomy-seekers (p<.01) — they willingly accepted cognitive load for perceived control
This means life agent designers face a fundamental tension: the optimal interaction pattern (delegation) is the one users resist most. The solution is progressive trust — start with advisory, earn the right to delegate through demonstrated reliability.
12.2 The "Less Is More" Principle (Empirically Proven)
ProMemAssist (UIST 2025, N=12): delivered 60% fewer messages while achieving 2.6× higher positive engagement and lower frustration (NASA-TLX: 2.32 vs 3.14, p<.05). The mechanism: modeling the user's working memory load in real-time and deferring messages when cognitive capacity is low.
ProPerSim (ICLR 2026, 32 personas): the system started at 24 recommendations/hour and naturally converged to ~6/hour through preference learning. This emergent "learning to be quiet" behavior improved satisfaction from 2.2/4 to 3.3/4 over 14 days.
BAO (CMU/Salesforce/MIT): without behavior regularization, agent User Involvement Rate shoots from 0.21 to 0.91 — agents pester users 91% of the time. With regularization (information-seeking + over-thinking penalties), UR drops to 0.2148.
Design principle: Max 6 proactive notifications per hour. Decrease on non-engagement. The cost of a false positive (unnecessary interruption) exceeds the cost of a false negative (missed opportunity to assist).
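A sketch of a governor enforcing that principle; the hourly cap matches the design principle above, while the engagement-based decay rule is our illustration:

```python
import time
from collections import deque

class ProactivityGovernor:
    """Caps proactive messages per hour and shrinks the cap when the
    user stops engaging, mirroring the emergent 'learning to be quiet'
    behavior ProPerSim observed (decay rule is our simplification)."""

    def __init__(self, max_per_hour: int = 6):
        self.cap = max_per_hour
        self.sent = deque()  # timestamps of recent sends

    def may_send(self) -> bool:
        now = time.time()
        while self.sent and now - self.sent[0] > 3600:
            self.sent.popleft()          # drop sends older than an hour
        return len(self.sent) < max(1, self.cap)

    def record_send(self) -> None:
        self.sent.append(time.time())

    def record_feedback(self, engaged: bool) -> None:
        if engaged:
            self.cap = min(6, self.cap + 1)  # slowly earn back budget
        else:
            self.cap = max(1, self.cap - 1)  # ignored -> get quieter
```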
12.3 Trust Fragility
Trust is asymmetric — one failure outweighs many successes:
- In the ProMemAssist study, participant P5 indicated that a single unhelpful suggestion could break trust in the entire system
- Trust was influenced more by quality of assistance than timing of delivery
- Algorithm aversion — tendency to abandon AI after observing even small errors — significantly reduces adoption
- "Poorly calibrated proactivity can undermine user trust, agency, and interaction quality" (PROPER, 2026)
Design principle: Optimize all proactive features for precision over recall. Don't ship a feature that fails >20% of the time.
12.4 Real Product Outcomes
| Product | Result | Lesson |
|---|---|---|
| Lindy.ai | 400K+ users | Narrow scope + high reliability beats broad ambitions |
| Rabbit R1 | Failed basic tasks (wrong location for weather, CAPTCHA loops for restaurant booking) | Hardware + breadth of capability doesn't compensate for unreliable execution |
| AutoGPT/BabyAGI | Rapid context degradation after 10-20 steps | Unbounded autonomy without grounding fails catastrophically |
| Replit Agent | "Easy to build the prototype, deceptively hard to improve reliability" | The demo-to-production gap is enormous |
| Smart glasses (various) | RayNeo X2 Pro: ~20 min battery in continuous mode | Always-on hardware is bottlenecked by power, not by AI capability |
12.5 What Real Users Want (Direct Evidence)
Reddit (r/artificial, 2025) — a neurodivergent parent posted:
"I struggle with staying on top of daily tasks. I'd love to find an AI assistant that can: Talk to me, not just respond when I ask. Give me reminders and nudges, even when I'm distracted. Help manage tasks, routines, and my health needs (I'm autistic/ADHD). Adapt to my life as a parent."
No commenter had a real solution. Other Reddit threads (r/singularity) reported teams being cut from 50→30 people using AI agents, editorial staff of 17 laid off, software engineers shifting to "supervising AI" — but all in workplace automation, not personal life augmentation.
The gap: workplace AI agent adoption is accelerating; personal life agent adoption has no viable product yet.
12.6 Surprising Empirical Findings
- Delegation benefits non-users: non-users in Delegate groups had 21.6% higher surplus than baseline groups — AI improved outcomes for people who didn't even use it (market spillover)
- Introverts benefit most from personalized AI (larger improvement than extraverts in ProPerSim)
- Chain-of-thought hurts proactive timing: CoT prompting degrades performance on proactive prediction — recall dropped from 94.4% to 17.1% in one model (ProAgentBench)
- Humans are bursty, LLMs aren't: real interaction burstiness B=0.787 vs LLM-simulated B=0.166 (ProAgentBench) — any LLM-generated training data misses this entirely
- Knowing when > knowing what: timing prediction 64.4% accurate, content prediction maxes at 30.5% semantic similarity — solve timing first, content second
- Explicit feedback required: ProPerSim showed implicit signals alone offer "limited benefit" — the agent must ask for ratings, not just observe behavior
13. Audio Lifelogging & Continuous Transcription
Full research document: knowledge-base/audio-lifelogging-research.md (14 sections, 17 academic papers, extensive product/community analysis)
Audio lifelogging — always-on recording with live transcription, speaker attribution, and knowledge extraction — is the richest possible input for a life augmentation agent. It captures commitments, decisions, context, and social interactions at the moment they happen, with zero user effort.
13.1 State of the Art (2024–2025)
The field has matured from academic concept to consumer product:
| Product/Tool | Form Factor | Status | Key Capability |
|---|---|---|---|
| say (u/8ta4) | macOS app + Electron | Open source, 2+ years daily use | Deepgram streaming → LLM structuring → queryable archive |
| Omi | BLE pendant (nRF/ESP32, ~$24) | Open source (MIT), 7.8k GitHub stars | Hardware + Flutter app + cloud pipeline |
| Limitless | Pendant | Acquired by Meta (2025) | Meeting transcription + speaker ID |
| Plaud.ai | Credit-card form factor | Shipping product | Business meeting transcription |
| Bee (Shopify) | Mobile app | Shipping product | Personal AI diary from conversations |
Key technical components: Deepgram nova-3 streaming ASR (~$0.0043/min), ECAPA-TDNN speaker embeddings (0.8% EER), Pyannote diarization (open source), Silero VAD (on-device filtering eliminates ~60% silence).
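A sketch of the VAD-gated pipeline implied by those numbers: on-device voice-activity detection drops silent frames before anything reaches the paid cloud ASR, so cost scales with speech, not elapsed time. Both helpers below are hypothetical stand-ins, not the Silero or Deepgram APIs:

```python
def is_speech(frame: bytes) -> bool:
    """Stand-in for an on-device VAD verdict (e.g. Silero) per frame."""
    return len(frame) > 0 and frame[0] != 0  # placeholder heuristic

def lifelog_pipeline(frames, send_to_asr) -> tuple[int, int]:
    """Forward only speech frames to the streaming ASR callback."""
    sent = total = 0
    for frame in frames:
        total += 1
        if is_speech(frame):          # only speech leaves the device
            send_to_asr(frame)
            sent += 1
    return sent, total

uploaded = []
sent, total = lifelog_pipeline([b"\x01voice", b"\x00sil", b"\x02voice"],
                               uploaded.append)
print(f"uploaded {sent}/{total} frames")  # ASR cost tracks sent, not total
```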
13.2 Critical Constraints
- iOS is not viable for always-on recording: No persistent background audio entitlement, orange indicator dot, unpredictable app killing. Apple Watch battery cannot sustain 24/7 mic. A dedicated BLE pendant is the only reliable architecture.
- Noisy environments: Transcription accuracy degrades significantly in crowds, showers, wind. No viable waterproof mic solution exists (IP67 housings kill mic sensitivity).
- Speaker attribution in the wild: ECAPA-TDNN's 0.8% EER is measured in controlled conditions. Ambient audio with overlapping speakers, distance, and background noise reduces accuracy. Enrollment UX (10-second samples per contact) adds onboarding friction.
- Privacy/legal: Recording laws vary by jurisdiction (one-party vs two-party consent). GDPR right to erasure conflicts with append-only architectures. Illinois BIPA applies to voiceprint biometrics. Privacy-by-design (transcribe → discard audio) is the minimum viable approach.
- Cost at scale: ~$1/day for Deepgram with heavy use. VAD pre-filtering is essential. Deepgram offers $200 free credit (~4 months runway).
13.3 Implications for Life Agents
Audio lifelogging transforms a life agent from reactive (user must type/speak commands) to truly ambient (agent passively absorbs context from all conversations):
- Episodic memory becomes conversational memory: The agent doesn't just remember tasks — it remembers what was said, by whom, in what context. "What did Alex say about the deadline?" becomes answerable.
- Commitment capture: Spoken "I need to..." or "Let's plan to..." automatically become tracked tasks (see scenario S60; a naive detector is sketched after this list).
- Social context: The agent builds a knowledge graph of what different people care about, what they told you, and when. This enables social scenarios (gift suggestions, relationship maintenance) that were previously impossible without manual data entry.
- Cognitive offloading: The most consistent finding from daily users is that recording reduces cognitive load — people speak more freely when they know everything is captured and searchable.
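A naive keyword-based commitment detector to make the capture idea concrete; production systems need an LLM classifier on top, since patterns like these generate exactly the false positives flagged in §13.4:

```python
import re

# Surface-pattern matcher over transcript text; patterns are illustrative.
COMMITMENT_PATTERNS = re.compile(
    r"\b(i need to|i have to|let's plan to|remind me to|i'll)\s+(.{3,80})",
    re.IGNORECASE,
)

def extract_commitments(transcript: str) -> list[str]:
    """Return the clause following each commitment phrase."""
    return [m.group(2).strip(" .") for m in COMMITMENT_PATTERNS.finditer(transcript)]

print(extract_commitments("Okay, I need to send Alex the draft by Friday."))
# ['send Alex the draft by Friday']
```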
13.4 Open Research Problems
- Waterproof/harsh-environment recording — no solution exists
- Real-time commitment/intent detection from ambient speech (high false-positive rate)
- Long-term conversational memory retrieval at scale (15M+ words/year)
- Multi-language speaker diarization (enrollment per language needed?)
- Emotional tone tracking over time (Pierce & Mann, 2021) — feasible but not yet integrated into any lifelogging product
- Consent UX — how to gracefully handle two-party consent jurisdictions in always-on recording
13.5 Key Academic References
- Vemuri et al. (2006) — iRemember: proactive information retrieval from personal audio
- Harvey et al. (2016) — Audio lifelogging for event summarization
- Hodges et al. (2006) — SenseCam visual lifelogs for memory augmentation
- Lee & Dey (2008) — Lifelogging memory appliance for automatic recording
- Pierce & Mann (2021) — Affective lifelogging with multimodal sensing
- Desplanques et al. (2020) — ECAPA-TDNN speaker verification
See knowledge-base/audio-lifelogging-research.md §4 for full paper list (17 papers).
Document created: March 2026
Papers downloaded: 20 new papers (Batch 6)
Total papers in collection: 126
User sentiment research added: March 2026 (11 papers + web sources)
Audio lifelogging research added: March 2026 (17 papers + product/community analysis)