Building an AI Assistant That Actually Reaches Out: The Technical Story

August 25, 2025 · 3 min read
[Figure: Avery technical architecture diagram]

The Fundamental Problem

Most AI assistants wait. You ask, they respond. The conversation ends until you remember to start it again. But human productivity doesn't work that way. We forget, we get distracted, we need someone to tap us on the shoulder at exactly the right moment.

Building a proactive AI assistant meant solving a problem that existing tools barely attempted: how do you make software that reaches out intelligently without becoming annoying?

The technical challenge wasn't just making AI smarter - it was orchestrating when and how it should speak first.


Rethinking the Conversation Architecture

Traditional chatbots follow a simple pattern: receive message, process, respond. But proactive assistance requires a fundamentally different architecture. Instead of single exchanges, we needed to model ongoing relationships where the AI maintains context, tracks commitments, and understands the rhythm of each user's life.

The breakthrough came with OpenAI's tool calling capabilities. Rather than trying to cram everything into prompts, we could give the AI discrete functions: create tasks, search memories, check calendars, update goals. This modular approach meant the AI could take specific actions rather than just generating text that we'd have to parse and interpret.

The Tool System Philosophy:

Each tool represents a capability boundary. The AI can query tasks with complex filters, but it can't directly access the database. It can schedule calendar events, but only through the Google Calendar API. This constraint-based design prevents the unpredictable failures that plague pure language model integrations.

Tools also solve the context problem elegantly. Instead of maintaining massive conversation histories, we give the AI tools to retrieve exactly the information it needs: recent tasks, upcoming events, relevant memories. The context stays fresh and relevant without hitting token limits.
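
To make the capability-boundary idea concrete, here is a minimal sketch. It assumes a hypothetical in-memory task list and two illustrative tool names (`query_tasks`, `create_task`); none of these come from Avery's actual code. The point is that the model only ever supplies a tool name and JSON arguments, and the dispatcher decides what is allowed to run.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store standing in for the real database.
_TASKS: List[Dict[str, Any]] = [
    {"id": 1, "title": "Water the plants", "done": False},
    {"id": 2, "title": "Send invoice", "done": True},
]


def query_tasks(done: bool) -> List[Dict[str, Any]]:
    """Return tasks matching a completion filter."""
    return [t for t in _TASKS if t["done"] == done]


def create_task(title: str) -> Dict[str, Any]:
    """Append a new task and return it."""
    task = {"id": len(_TASKS) + 1, "title": title, "done": False}
    _TASKS.append(task)
    return task


# The registry is the capability boundary: anything not listed cannot be called.
TOOLS: Dict[str, Callable[..., Any]] = {
    "query_tasks": query_tasks,
    "create_task": create_task,
}


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run one model-requested tool call and return a JSON string result."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps({"result": handler(**args)})
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed arguments are reported back instead of crashing the loop.
        return json.dumps({"error": str(exc)})
```

A request for an unregistered tool, or one with the wrong arguments, comes back as an error payload the model can read and correct, rather than as an exception in the conversation loop.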


WhatsApp as the Interface Layer

We chose WhatsApp not for convenience, but because it solved the notification problem that kills most productivity apps. People actually read WhatsApp messages. They don't ignore them the way they do app notifications or emails.

But the WhatsApp Business API introduced immediate technical constraints that shaped the entire system. The platform has two distinct message types: regular messages, which can only be sent within 24 hours of a user's last message, and template messages, which require pre-approval from Meta but can be sent at any time.

This constraint forced us to think carefully about message timing and content. Proactive reminders often happen outside the 24-hour window, so we had to design template messages that felt natural while meeting Meta's approval requirements. Every proactive message template became a carefully crafted balance between personalization and compliance.
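
The 24-hour rule reduces to a small decision at send time. The sketch below is illustrative only: the function name and the "regular"/"template" labels are assumptions, and the actual sending would go through the WhatsApp Business API.

```python
from datetime import datetime, timedelta, timezone

# Window during which free-form replies are allowed, per the rule described above.
SESSION_WINDOW = timedelta(hours=24)


def choose_message_kind(last_user_message_at: datetime, now: datetime) -> str:
    """Return "regular" inside the 24-hour window, otherwise "template"."""
    if now - last_user_message_at < SESSION_WINDOW:
        return "regular"
    return "template"


now = datetime(2025, 8, 25, 12, 0, tzinfo=timezone.utc)
print(choose_message_kind(now - timedelta(hours=3), now))   # inside the window
print(choose_message_kind(now - timedelta(hours=30), now))  # outside the window
```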

The Webhook Speed Problem:

WhatsApp requires webhook responses within five seconds. This sounds simple until you realize that processing an AI conversation with tool calls can take 30+ seconds. Our solution split the process: respond immediately to WhatsApp while processing the actual conversation asynchronously. Users see typing indicators while the real work happens in the background.

This architectural decision rippled through the entire system. Every component had to be built with eventual consistency in mind. Message ordering, tool execution, and response delivery all needed to handle asynchronous processing gracefully.
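
The acknowledge-first, process-later split can be sketched with a queue and a background worker. This is a simplified illustration using `asyncio`; `run_conversation` is a hypothetical stand-in for the slow AI pipeline, and a production system would more likely use a durable job queue than an in-process one.

```python
import asyncio
from typing import List


async def run_conversation(message: str) -> str:
    """Placeholder for the slow AI + tool-call pipeline (hypothetical)."""
    await asyncio.sleep(0.05)  # stands in for a 30+ second model round trip
    return f"reply to: {message}"


async def worker(queue: "asyncio.Queue[str]", sent: List[str]) -> None:
    """Drain queued messages one at a time, preserving arrival order."""
    while True:
        message = await queue.get()
        try:
            sent.append(await run_conversation(message))
        finally:
            queue.task_done()


async def handle_webhook(queue: "asyncio.Queue[str]", message: str) -> int:
    """Acknowledge immediately; the real work happens in the worker."""
    queue.put_nowait(message)
    return 200


async def demo() -> List[str]:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    sent: List[str] = []
    task = asyncio.create_task(worker(queue, sent))
    statuses = [await handle_webhook(queue, m) for m in ("hi", "add a task")]
    # Both webhooks were acknowledged before any reply was produced.
    assert statuses == [200, 200] and sent == []
    await queue.join()  # wait for background processing to finish
    task.cancel()
    return sent
```

Using a single worker per queue keeps replies in arrival order, which is one simple way to handle the message-ordering concern mentioned above.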


The Mathematics of Not Being Annoying

The core algorithmic challenge was frequency. Too many messages and users disable notifications. Too few and the system loses effectiveness. We needed a mathematical approach that adapted to each user's engagement patterns.

The solution was a decay algorithm based on exponential backoff. Every time Avery sends a message without getting a response, the next message is delayed longer. The formula multiplies the base waiting period by 1.6 raised to the power of the number of consecutive unanswered follow-ups. One unanswered message means waiting 3 hours for the next. Two unanswered messages mean nearly 5 hours (3 × 1.6 = 4.8). Three push it to nearly 8 hours (3 × 1.6² ≈ 7.7).

Why 1.6? We tested various multipliers and found that 1.6 strikes the right balance. Lower values feel pushy. Higher values create gaps so long that the system feels broken. The exponential curve means engaged users get frequent helpful check-ins, while users who need space get progressively more breathing room.
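
The backoff formula is short enough to write out. In this sketch the base period is an assumption: the text gives only the resulting delays, so the base is chosen as 3 / 1.6 = 1.875 hours to reproduce the 3-hour figure for one unanswered message. The function name is illustrative.

```python
MULTIPLIER = 1.6
# Assumption: base chosen so that one unanswered message yields a 3-hour wait.
BASE_HOURS = 3.0 / MULTIPLIER  # 1.875 hours


def next_delay_hours(unanswered: int) -> float:
    """Delay before the next proactive message: base * 1.6 ** unanswered."""
    if unanswered < 0:
        raise ValueError("unanswered must be non-negative")
    return BASE_HOURS * MULTIPLIER ** unanswered


for n in (1, 2, 3):
    print(n, round(next_delay_hours(n), 2))
```

With these constants the delays for one, two, and three unanswered messages are 3.0, 4.8, and 7.68 hours, matching the figures above. A reply from the user would reset the counter to zero.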

Quiet Hours Enforcement:

The decay algorithm alone wasn't enough. We needed to respect when people don't want to be interrupted. Default quiet hours run from 9 PM to 8 AM in each user's timezone, but users can customize this window. The system automatically adjusts all proactive messaging to respect these boundaries, shifting messages to appropriate times rather than skipping them entirely.
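
Shifting a message out of quiet hours, rather than dropping it, might look like the following sketch. The 9 PM and 8 AM defaults come from the text; the function names are illustrative, and the timezone is passed in as any `tzinfo` (in practice probably a `zoneinfo.ZoneInfo` for the user's region).

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

QUIET_START = time(21, 0)  # 9 PM, the default from the text
QUIET_END = time(8, 0)     # 8 AM


def in_quiet_hours(local: datetime, start: time = QUIET_START,
                   end: time = QUIET_END) -> bool:
    """True if the local wall-clock time falls inside the quiet window."""
    t = local.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # window wraps past midnight


def next_allowed_send(when_utc: datetime, user_tz: tzinfo,
                      start: time = QUIET_START,
                      end: time = QUIET_END) -> datetime:
    """Return the send time unchanged, or moved to the end of quiet hours."""
    local = when_utc.astimezone(user_tz)
    if not in_quiet_hours(local, start, end):
        return when_utc
    resume = local.replace(hour=end.hour, minute=end.minute,
                           second=0, microsecond=0)
    if resume <= local:
        # Quiet hours began this evening, so they end tomorrow morning.
        resume += timedelta(days=1)
    return resume.astimezone(timezone.utc)
```

A message proposed for 11:30 PM local time comes back scheduled for 8:00 AM the next morning; a message proposed for noon is returned unchanged.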


Context Without Memory Explosion

AI assistants need context to be helpful, but conversation histories grow quickly. Storing every message exchange creates token limit problems and slows down response times. Our solution was layered context: different types of information with different retention and retrieval strategies.

Recent Context: The last 20 tasks, 10 upcoming calendar events, and 15 recent memories get included in every conversation. This gives the AI immediate context for current discussions.

Consolidated Memories: Instead of storing raw conversation fragments, we extract and consolidate facts into persistent memories. "Taylor prefers morning meetings" becomes a structured preference rather than scattered conversation references.

Dynamic Context Assembly: Each conversation gets a fresh system prompt built from the user's current state. Recent tasks, weather, calendar events, and personal preferences get combined into a coherent picture of what the AI needs to know right now.

This approach means the AI always has relevant context without drowning in historical data. It can reference your preferences from months ago while staying current with today's schedule.
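
As a rough illustration of dynamic context assembly, the sketch below builds a system prompt from capped lists. The limits (20 tasks, 10 events, 15 memories) come from the text; the function name, the prompt wording, and the assumption that each list arrives already sorted by relevance are all hypothetical.

```python
from typing import Dict, List, Sequence

# Per-section caps taken from the text above.
LIMITS: Dict[str, int] = {"tasks": 20, "events": 10, "memories": 15}


def build_system_prompt(name: str, tasks: Sequence[str],
                        events: Sequence[str],
                        memories: Sequence[str]) -> str:
    """Assemble a fresh prompt from the user's current state.

    Each sequence is assumed to be sorted most-relevant first.
    """
    sections = [
        ("Recent tasks", list(tasks[:LIMITS["tasks"]])),
        ("Upcoming events", list(events[:LIMITS["events"]])),
        ("Memories", list(memories[:LIMITS["memories"]])),
    ]
    lines: List[str] = [f"You are assisting {name}."]
    for title, items in sections:
        if items:  # skip empty sections so the prompt stays compact
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


print(build_system_prompt(
    "Taylor",
    tasks=["Water the plants"],
    events=[],
    memories=["Prefers morning meetings"],
))
```

Because the caps are fixed, the prompt size stays bounded no matter how much history a user accumulates.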


Recurring Tasks and Time Complexity

"Remind me to water the plants every Tuesday at 3 PM" sounds simple until you realize the computational complexity. Tuesday changes every week. The user might be traveling. They might have changed time zones. The reminder needs to be intelligent about edge cases.

We implemented RRULE (Recurrence Rule) parsing, the recurrence format defined by the iCalendar standard (RFC 5545), to handle the full spectrum of recurring patterns. The system understands "every weekday," "the first Monday of each month," and "every three weeks on Tuesday and Thursday." Each pattern gets converted into a formal rule that can generate future occurrences reliably.
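
To show what expanding a rule into occurrences involves, here is a deliberately small sketch that handles only `FREQ=WEEKLY` with optional `INTERVAL` and `BYDAY`. It is not a full RRULE implementation, and a real system would more likely rely on an existing recurrence library; the function names are illustrative.

```python
from datetime import datetime, timedelta
from typing import Dict, List

WEEKDAYS = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def parse_rrule(rule: str) -> Dict[str, str]:
    """Split 'KEY=VALUE;KEY=VALUE' into a dict."""
    return dict(part.split("=", 1) for part in rule.split(";") if part)


def weekly_occurrences(rule: str, start: datetime, count: int) -> List[datetime]:
    """Expand a FREQ=WEEKLY rule into its first `count` occurrences from `start`."""
    parts = parse_rrule(rule)
    if parts.get("FREQ") != "WEEKLY":
        raise ValueError("this sketch only supports FREQ=WEEKLY")
    interval = int(parts.get("INTERVAL", "1"))
    if interval < 1:
        raise ValueError("INTERVAL must be at least 1")
    if "BYDAY" in parts:
        days = sorted(WEEKDAYS[d] for d in parts["BYDAY"].split(","))
    else:
        days = [start.weekday()]
    # Monday of the starting week, keeping the original time of day.
    week_start = start - timedelta(days=start.weekday())
    results: List[datetime] = []
    while len(results) < count:
        for day in days:
            candidate = week_start + timedelta(days=day)
            if candidate >= start and len(results) < count:
                results.append(candidate)
        week_start += timedelta(weeks=interval)
    return results


# "Every three weeks on Tuesday and Thursday", starting Tuesday 26 Aug 2025 at 3 PM.
for when in weekly_occurrences("FREQ=WEEKLY;INTERVAL=3;BYDAY=TU,TH",
                               datetime(2025, 8, 26, 15, 0), 4):
    print(when.isoformat())
```

The example uses naive datetimes to stay short. Handling travel and daylight saving correctly means expanding the rule in the user's local timezone and converting to UTC only at delivery time.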

The Scheduling Engine:

A background job runs every minute to check which recurring tasks are due. But rather than just firing reminders mechanically, it respects quiet hours, checks for calendar conflicts, and adapts to user availability. The system maintains the original schedule time to prevent drift while being intelligent about actual delivery.
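
The drift-prevention point deserves a small illustration. In this sketch, which uses a hypothetical `RecurringTask` type and a fixed interval in place of a full recurrence rule, the next occurrence is always computed from the scheduled time, never from the moment the job happened to run. Quiet hours and calendar checks are left out to keep it short.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RecurringTask:
    title: str
    next_due: datetime  # the scheduled time, never the delivery time
    every: timedelta    # simplified stand-in for a full recurrence rule


def tick(tasks: List[RecurringTask], now: datetime) -> List[str]:
    """One pass of the per-minute job: fire due tasks and advance their schedule."""
    fired: List[str] = []
    for task in tasks:
        if task.every <= timedelta(0):
            raise ValueError("recurrence interval must be positive")
        if task.next_due <= now:
            fired.append(task.title)
            # Advance from the scheduled time, not from `now`, so a late run
            # does not push every later occurrence back.
            while task.next_due <= now:
                task.next_due += task.every
    return fired
```

If a reminder scheduled for 3:00 PM is processed at 3:07 PM, the following week's reminder is still due at 3:00 PM rather than 3:07 PM.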


Google Calendar Integration Philosophy

Calendar integration wasn't just about reading events - it was about understanding availability and context. When should Avery suggest tackling that big project? When is the user likely to have mental energy for creative work versus administrative tasks?

The system syncs calendar events hourly and then analyzes them for patterns. Meetings clustered in the morning suggest afternoons might be better for deep work. Regular weekly calls indicate routine availability windows. Travel events trigger automatic task rescheduling.

Token Refresh Complexity:

Google OAuth access tokens expire after about an hour, creating a reliability problem. The system needed automatic token refresh that works seamlessly across all calendar operations. This meant building robust error handling and retry logic into every calendar interaction.
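
One common shape for this is a small token manager plus a retry wrapper. The sketch below is generic and makes no calls to Google's libraries: `TokenExpired`, `TokenManager`, and `call_with_refresh` are hypothetical names, and the `refresh` callable stands in for whatever actually exchanges a refresh token for a new access token.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TokenExpired(Exception):
    """Raised by a calendar call when the access token is rejected (hypothetical)."""


@dataclass
class Token:
    value: str
    expires_at: float  # Unix timestamp


class TokenManager:
    def __init__(self, refresh: Callable[[], Token], skew_seconds: float = 60.0):
        self._refresh = refresh
        self._skew = skew_seconds
        self._token: Optional[Token] = None

    def get(self) -> str:
        """Return a usable token, refreshing shortly before expiry."""
        if self._token is None or time.time() >= self._token.expires_at - self._skew:
            self._token = self._refresh()
        return self._token.value

    def invalidate(self) -> None:
        """Forget the cached token so the next get() refreshes it."""
        self._token = None


def call_with_refresh(manager: TokenManager, operation: Callable[[str], T],
                      max_attempts: int = 2) -> T:
    """Run a calendar operation, refreshing the token once if it is rejected."""
    for attempt in range(max_attempts):
        try:
            return operation(manager.get())
        except TokenExpired:
            manager.invalidate()
            if attempt == max_attempts - 1:
                raise
    raise RuntimeError("max_attempts must be at least 1")
```

Routing every calendar call through one wrapper like this keeps the refresh and retry logic in a single place instead of scattering it across each operation.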


The Subscription Model Technical Requirements

Building a 15-day trial system sounds straightforward until you realize the edge cases. What happens when a trial expires mid-conversation? How do you handle promo codes that extend trials? What about failed payment processing during subscription creation?

We chose Stripe for billing because it handles the compliance complexity of subscription management. But integrating subscription state with AI conversation flow required careful coordination. The AI needs to know subscription status for feature availability, but it shouldn't constantly mention billing issues in conversation.

Promo Code Architecture:

Five-character alphanumeric codes that extend trial periods became a simple but effective growth tool. The technical implementation required tracking code usage, preventing multiple applications of the same code, and gracefully extending trial periods without disrupting the user experience.
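
A minimal sketch of the promo code flow follows. It assumes an in-memory store, uppercase letters and digits for the code alphabet, and a per-user reading of "preventing multiple applications of the same code"; all names here are illustrative.

```python
import secrets
import string
from datetime import datetime, timedelta
from typing import Dict, Set, Tuple

ALPHABET = string.ascii_uppercase + string.digits


def generate_code(length: int = 5) -> str:
    """Create a random alphanumeric promo code."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class TrialStore:
    """In-memory stand-in for the real persistence layer (hypothetical)."""

    def __init__(self) -> None:
        self.codes: Dict[str, int] = {}             # code -> extra trial days
        self.trial_ends: Dict[str, datetime] = {}   # user_id -> trial end
        self.redeemed: Set[Tuple[str, str]] = set() # (user_id, code)

    def redeem(self, user_id: str, code: str, now: datetime) -> datetime:
        """Apply a code once per user and return the new trial end."""
        code = code.strip().upper()
        if code not in self.codes:
            raise ValueError("unknown promo code")
        if (user_id, code) in self.redeemed:
            raise ValueError("code already applied")
        # Extend from whichever is later, so an expired trial restarts from now.
        base = max(self.trial_ends.get(user_id, now), now)
        self.trial_ends[user_id] = base + timedelta(days=self.codes[code])
        self.redeemed.add((user_id, code))
        return self.trial_ends[user_id]
```

Extending from the later of "now" and the current trial end means a user with days remaining keeps them, while a user whose trial has lapsed gets the full extension.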


Production Lessons and Reliability Patterns

Webhooks Fail Silently: WhatsApp webhooks occasionally drop messages with no error indication. We built reconciliation systems that detect and recover from these failures.

AI Models Have Bad Days: Sometimes OpenAI returns malformed responses or times out unexpectedly. Graceful degradation means falling back to simpler acknowledgments while queuing proper processing for retry.

Users Change Timezones: Frequent travelers create edge cases in scheduling logic. The system needed to detect timezone changes and automatically adjust all future scheduled tasks and quiet hours.

Template Message Approval Takes Time: Meta's template approval process can take days. We learned to design flexible templates that handle multiple use cases rather than creating highly specific ones that might get rejected.


The Technical Philosophy That Emerged

Building Avery taught us that conversational AI systems succeed or fail based on their orchestration layer, not just their language model capabilities. The most sophisticated AI is useless if it interrupts you at the wrong time or forgets context between conversations.

Tool-First Design: Rather than prompt engineering everything, give AI discrete capabilities with clear boundaries. This creates more reliable behavior and easier debugging.

Respect-First Architecture: Build quiet hours, decay algorithms, and user preferences into the core system, not as afterthoughts. Respectful software gets used; pushy software gets deleted.

Context Over History: Selective, relevant context works better than comprehensive message storage. What matters is what the AI needs to know now, not everything it's ever learned.

Reliability Over Intelligence: A feature that works predictably beats one that's occasionally brilliant but often confusing. Users need to trust the system's behavior patterns.

The technical challenges of proactive AI aren't just about making models smarter. They're about building systems that understand timing, context, and respect - the same qualities that make human assistants effective.

Ready to experience these technical decisions in daily practice? Try Avery today.

