How can I make my research assistant textable?
Author’s note: https://www.bryanwhiting.com/ideas/parallel-ai-deep-researcher/
Executive Summary
Creating a textable AI research assistant involves making an LLM-powered agent accessible through common messaging channels such as SMS, WhatsApp, RCS, Telegram, and Slack. This approach allows users to initiate research requests, receive progress updates, and get synthesized results directly on their mobile devices without needing a dedicated application or browser. The primary benefit of this model is its ubiquity, as users already have these messaging apps installed. Furthermore, channels like WhatsApp and RCS offer richer user experiences with features like formatted text, images, and interactive cards, while WhatsApp also provides end-to-end encryption for enhanced privacy.

The core architecture for such a system relies on several key components. First, provider webhooks (from services like Twilio for SMS, Meta for WhatsApp, or Google for RCS) are used to receive incoming user messages at a designated endpoint. These messages are then normalized into a common internal format and placed into an asynchronous job queue. This decouples the immediate response from the longer research process, allowing the system to send an initial acknowledgment, followed by periodic progress updates, and finally the complete research findings.

The research itself is performed by an LLM agent equipped with tools for tasks like web browsing, document retrieval, and data synthesis, often using Retrieval-Augmented Generation (RAG) to access private knowledge bases and provide citations. Finally, the results are delivered back to the user, formatted according to the specific capabilities and constraints of the original messaging channel.
Core Architectural Blueprint
The recommended technical architecture for a textable research assistant is a decoupled, asynchronous system designed to handle long-running tasks initiated via messaging platforms. The blueprint begins with an ingress layer where messaging gateways (e.g., Twilio for SMS, Meta for WhatsApp, Google for RCS) receive user messages and forward them via webhooks to a centralized application endpoint. Upon receipt, the application normalizes the message into a common internal format and places it into a message queue. This immediately decouples message ingestion from processing, allowing the application to send a quick acknowledgment back to the user (e.g., ‘Got it—starting the research’).

Asynchronous job workers continuously pull tasks from this queue. Each worker is responsible for executing the core research logic, which is managed by an LLM orchestration framework like OpenAI Assistants, LlamaIndex, or CrewAI. This framework equips an LLM agent with tools for tasks like live web browsing, database lookups, and Retrieval-Augmented Generation (RAG) from private data sources. For complex queries, the agent can decompose the task into parallel sub-tasks. Throughout this process, the worker can send progress updates back to the user via the messaging gateway’s API.

Once the research is complete and the final answer is synthesized, the worker formats the result according to the specific constraints and capabilities of the original messaging channel (e.g., splitting long text for SMS, using rich cards for RCS) and delivers it to the user. Conversation state and user memory are persisted in a database to ensure context is maintained across interactions.
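The ingress-and-queue handoff described above can be sketched with a small, provider-agnostic message type. This is a minimal illustration; the class and field names are assumptions, not part of the source design.

```python
import queue
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InboundMessage:
    """Provider-agnostic representation of an incoming user message."""
    channel: str    # e.g. "sms", "whatsapp", "telegram"
    sender_id: str  # phone number, chat ID, or Slack user ID
    body: str       # raw message text
    received_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# A single shared queue decouples ingestion from the research workers.
job_queue: "queue.Queue[InboundMessage]" = queue.Queue()


def ingest(channel: str, sender_id: str, body: str) -> None:
    """Called by each channel's webhook handler after parsing its payload."""
    job_queue.put(InboundMessage(channel, sender_id, body))
```

In production the in-process `queue.Queue` would typically be replaced by a durable broker such as RabbitMQ or Redis so jobs survive restarts.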
Implementation Roadmap
This roadmap provides a step-by-step guide to building and deploying a textable AI research assistant.
Step 1: Select and Onboard a Messaging Channel
Choose an initial channel based on your target audience’s preferences, security needs, and desired user experience.
- SMS/MMS: Use a provider like Twilio Programmable Messaging. This requires purchasing a phone number, configuring a webhook to receive messages, and complying with regulations like A2P 10DLC in the U.S., which involves a registration process and associated fees. SMS has high reach but limited content richness.
- WhatsApp: Utilize the WhatsApp Business Cloud API provided by Meta. This involves creating a Meta developer app, setting up a phone number, and configuring a webhook to receive message events. You must adhere to Meta’s policies, particularly regarding business-initiated ‘template messages’.
- RCS Business Messaging (RBM): Register as a partner with Google, create a branded agent, and configure a webhook. RBM supports rich content like cards and carousels but requires a carrier/Google-managed launch and brand verification process.
- Telegram: Create a bot using the ‘@BotFather’ tool within Telegram. You can then receive updates via a webhook or long polling and send messages using the Bot API, which is free to use.
- Slack: Create a Slack app and a bot user. Use the Events API to subscribe to messages and slash commands, and the Web API to send responses. Note that legacy bot users have some API limitations.
Step 2: Build the Core Ingress and Orchestration Logic
This is the backbone of the assistant, handling incoming requests and managing tasks.
- Webhook Endpoint: Create a web server (e.g., using Flask in Python) with an endpoint that your chosen channel provider will POST messages to.
- Message Normalization: In your webhook handler, parse the incoming payload (which varies by provider) and extract key information like the sender’s ID and the message body into a standardized internal format.
- Asynchronous Job Queue: Instead of processing the research request directly in the webhook, place the normalized request into a job queue (e.g., Python’s queue.Queue, RabbitMQ, or Redis). This prevents the webhook from timing out and allows for long-running tasks.
- Immediate Acknowledgment: Immediately after enqueuing the job, send a reply to the user acknowledging their request (e.g., “Got it—starting the research. I’ll text you updates.”).
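Twilio delivers inbound SMS as a form-encoded POST whose fields include From and Body, and an XML (TwiML) reply can serve as the immediate acknowledgment. Below is a sketch of that handler, written as a plain function (hypothetical names) that a Flask route would delegate to, with the real request wiring omitted:

```python
import queue

job_queue: "queue.Queue[dict]" = queue.Queue()


def handle_twilio_webhook(form: dict) -> str:
    """Parse a Twilio-style form payload, enqueue the job, and return a
    TwiML reply that acts as the immediate acknowledgment.

    In a Flask app this would be the body of a route such as
    @app.route("/sms", methods=["POST"]), with `form` being request.form.
    """
    normalized = {
        "channel": "sms",
        "sender_id": form.get("From", ""),    # Twilio sends the sender as "From"
        "body": form.get("Body", "").strip(), # and the text as "Body"
    }
    job_queue.put(normalized)
    return (
        "<?xml version='1.0' encoding='UTF-8'?>"
        "<Response><Message>Got it—starting the research. "
        "I'll text you updates.</Message></Response>"
    )
```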
Step 3: Develop the AI Research Agent
This component performs the actual research and synthesis.
- Choose an Orchestration Framework: Select a framework to manage the LLM, tools, and workflow. Options include OpenAI’s Assistants API (for built-in tool use and retrieval), Anthropic’s Messages API with tool use, or more complex multi-agent frameworks like LlamaIndex, CrewAI, or AutoGen for parallel tasks and advanced workflows.
- Implement Tools: Equip your agent with necessary tools, such as a web search API, a document retriever for RAG, or other custom functions.
- Define Agent Logic: A background worker process will pull jobs from the queue. For each job, it will invoke the research agent with the user’s query. The agent will use its tools to gather information and then use the LLM to synthesize a response.
- Manage State and Memory: Use a database to store conversation history and user state, allowing the agent to have context for follow-up questions.
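The worker loop described above can be sketched as follows. Note that run_research_agent and send_message are stand-ins: the first would invoke your chosen framework (Assistants API, LlamaIndex, CrewAI, etc.) and the second your provider’s send API, but they are stubbed here so the loop is self-contained.

```python
import queue
import threading

job_queue: "queue.Queue[dict]" = queue.Queue()
outbox: list = []  # stands in for the provider's outbound API


def run_research_agent(query: str) -> str:
    """Placeholder for the real agent call (e.g. an Assistants API run or a
    LlamaIndex/CrewAI workflow)."""
    return f"Findings for: {query}"


def send_message(recipient: str, text: str) -> None:
    """Placeholder for the channel provider's send API (e.g. Twilio REST);
    here it just records the outbound message."""
    outbox.append((recipient, text))


def worker_loop(stop: threading.Event) -> None:
    """Pull jobs off the queue, run the agent, and deliver the result,
    sending a progress update before the long-running research step."""
    while not stop.is_set():
        try:
            job = job_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        send_message(job["sender_id"], "Searching sources…")
        answer = run_research_agent(job["body"])
        send_message(job["sender_id"], answer)
        job_queue.task_done()
```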
Step 4: Design the Conversational User Experience
Since research tasks are not instantaneous, the UX must manage user expectations.
- Define a Conversation Contract: Establish a clear interaction pattern: 1) User sends query, 2) Assistant sends immediate acknowledgment, 3) Assistant sends periodic status updates (“Searching sources…”, “Synthesizing findings…”), 4) Assistant delivers the final, chunked results with citations.
- Chunk Large Answers: Respect channel limitations. For SMS, which has a 160-character segment limit, split long answers into multiple messages. For richer channels like WhatsApp or RCS, use formatted messages or link to a hosted web page with the full report.
- Handle Opt-Outs: Implement logic to handle commands like “STOP” to comply with regulations like the TCPA, immediately ceasing messages to that user.
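The chunking and opt-out rules above can be sketched as two small helpers. The keyword list follows the standard carrier opt-out set; the exact chunk width is an assumption that leaves room for a "(n/m) " ordering prefix within one 160-character segment.

```python
import textwrap

SEGMENT_LIMIT = 160  # single GSM-7 SMS segment
STOP_KEYWORDS = {"stop", "stopall", "unsubscribe", "cancel", "end", "quit"}


def is_opt_out(body: str) -> bool:
    """Standard carrier opt-out keywords; a match must halt all messaging."""
    return body.strip().lower() in STOP_KEYWORDS


def chunk_for_sms(answer: str, limit: int = SEGMENT_LIMIT) -> list:
    """Split a long answer into ordered, numbered SMS-sized messages.
    Numbering like '(2/5) ' eats into the limit, so wrap slightly short."""
    parts = textwrap.wrap(answer, width=limit - 8)  # reserve room for '(n/m) '
    total = len(parts)
    if total <= 1:
        return [answer]
    return [f"({i}/{total}) {p}" for i, p in enumerate(parts, start=1)]
```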
Step 5: Deploy, Observe, and Scale
- Start with a Minimal Build: Begin with a simple SMS bot that follows the webhook -> queue -> agent -> response loop. The provided Python Flask and worker pseudo-code is a good starting point.
- Integrate Observability: Use a platform like Langfuse to trace every step of the process, including LLM calls, tool usage, retrieval steps, latency, and costs. This is crucial for debugging, evaluation, and optimization.
- Scale and Expand: Once the core loop is stable, add more features like RAG for memory, parallel workflows for deeper research, and expand to other messaging channels.
Data Retrieval and RAG Strategy
The recommended strategy for data retrieval is centered on Retrieval-Augmented Generation (RAG), which is identified as a core technique for building data-backed LLM applications. This approach grounds the assistant’s responses in factual data, enabling it to perform deep research and provide citations. The implementation involves augmenting the LLM agent with a set of tools, including functions for live web browsing (web search) and retrieving information from a knowledge base. This knowledge base is typically built by ingesting and indexing documents (e.g., private corpora, previous research) into a vector database. When a user asks a question, the system first retrieves relevant chunks of text from this database and the web. These retrieved documents are then passed to the LLM along with the original query as context, allowing the model to synthesize an accurate, detailed, and cited answer. Frameworks like LlamaIndex are highlighted for their powerful capabilities in building agentic RAG applications, including routing, reflection, and parallelism. Similarly, the OpenAI Assistants API offers built-in retrieval tools. The strategy also includes cost management techniques such as optimizing RAG chunking, using reranking models to improve the relevance of retrieved documents, and summarizing information before feeding it to the final generation model.
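The retrieve-then-generate step can be illustrated with a toy in-memory retriever. This deliberately substitutes bag-of-words cosine similarity for a real embedding model and vector database (LlamaIndex, the Assistants retrieval tool, etc.), purely to show the shape of the pipeline:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding
    model and store vectors in a vector database."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query; these would then be
    passed to the LLM as context alongside the original question."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In a production RAG pipeline this is where chunking, reranking, and pre-generation summarization (the cost levers mentioned above) would plug in.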
NLP and UX Design Considerations
Building a text-based research assistant requires careful consideration of both Natural Language Processing (NLP) and User Experience (UX) to handle the unique constraints of messaging channels. A key practice is to manage user intent and long-running tasks asynchronously. This involves using a webhook to receive a user’s query, placing it into a job queue, and immediately sending an acknowledgment message. A background worker then processes the research task using an LLM agent capable of tool use and reasoning. To manage conversational state over stateless protocols like SMS, it’s essential to maintain state in a database and use frameworks with memory capabilities. For the user experience, a clear ‘conversation contract’ should be established: an initial acknowledgment, followed by progress updates (e.g., ‘searching,’ ‘synthesizing’), and finally, the delivery of the results. A critical constraint, especially for SMS, is the character limit. Large answers must be chunked into multiple, ordered message segments or delivered as a link to a hosted report. In contrast, richer channels like RCS or WhatsApp can support more complex message types. Error handling, including retries for failed operations, should be built into the orchestration logic. Furthermore, compliance and user control are paramount; the system must provide clear instructions for opting out (e.g., ‘Reply STOP to opt out’) and honor these requests immediately to comply with regulations like the TCPA.
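Maintaining state over a stateless protocol like SMS reduces to persisting each turn keyed by sender. A minimal sketch using SQLite (schema and function names are illustrative assumptions):

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """One row per conversational turn, keyed by the sender's channel ID."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               sender_id TEXT, role TEXT, body TEXT,
               ts DATETIME DEFAULT CURRENT_TIMESTAMP)"""
    )


def remember(conn, sender_id: str, role: str, body: str) -> None:
    """Persist a turn ('user' or 'assistant') for later context."""
    conn.execute(
        "INSERT INTO messages (sender_id, role, body) VALUES (?, ?, ?)",
        (sender_id, role, body),
    )


def history(conn, sender_id: str, limit: int = 10) -> list:
    """Most recent turns, oldest first, for inclusion in the agent prompt."""
    rows = conn.execute(
        "SELECT role, body FROM messages WHERE sender_id = ? "
        "ORDER BY ts DESC, rowid DESC LIMIT ?",
        (sender_id, limit),
    ).fetchall()
    return rows[::-1]
```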
Security And Privacy Best Practices
Implementing a textable research assistant requires a robust approach to security and privacy, acknowledging the inherent limitations of certain messaging channels. A primary concern is the lack of end-to-end encryption (E2EE) on standard SMS: messages travel over carrier networks and are not encrypted at the application layer the way messages on platforms like WhatsApp are. This means sensitive information should never be sent over SMS. For interactions requiring higher privacy, it is best to use channels that explicitly support E2EE. Best practices include: 1) Data Minimization: Collect and process only the personal data that is absolutely necessary for the service to function. 2) Secure Handling: Protect personal data by applying retention controls to automatically delete data after a certain period and redacting sensitive fields from logs to prevent accidental exposure. 3) Encryption and Safeguards: Align all data processing with a lawful basis, such as user consent under GDPR, and implement appropriate safeguards like encryption for data at rest and pseudonymization where possible. 4) Channel-Specific Design: Design the user experience based on the security characteristics of the channel. For SMS, this might mean sending links to a secure web portal for sensitive results rather than sending the results directly in the text message.
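Redacting sensitive fields before they reach logs can be as simple as a pair of regex substitutions. The patterns below are deliberately broad illustrations (they will over-match in edge cases) and would need tuning to your actual data:

```python
import re

# Phone numbers: optional '+', then digits possibly broken up by
# spaces, dots, dashes, or parentheses.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
# Email addresses: a conventional, non-exhaustive pattern.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Mask phone numbers and email addresses before a line is logged."""
    text = PHONE_RE.sub("[PHONE]", text)
    return EMAIL_RE.sub("[EMAIL]", text)
```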
Cost Management Strategies
Effectively managing operational costs is crucial for a scalable textable research assistant, with expenses arising from both the messaging provider and the underlying Large Language Model (LLM) APIs. Actionable strategies include: 1) LLM Cost Reduction: Implement techniques to minimize token usage. This can be achieved through Retrieval-Augmented Generation (RAG) by using efficient chunking and reranking methods to provide the LLM with only the most relevant context. Summarization can condense large documents before they are processed by the main LLM. Implementing a caching layer for frequent or identical queries can also significantly reduce redundant API calls. 2) Messaging Cost Control: SMS pricing is often calculated per message segment. Long responses can become expensive as they are split into multiple segments. To manage this, large answers should be broken down into a series of smaller, coherent messages. Alternatively, for very long reports, the assistant can send a summary via SMS with a link to a securely hosted webpage containing the full results. 3) Observability and Monitoring: Utilize observability platforms to trace all LLM and tool calls, tracking the cost and latency of each interaction. This detailed monitoring allows for the identification of expensive operations and provides the data needed to optimize prompts, workflows, and tool usage for better cost-efficiency.
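The caching layer mentioned above can be sketched as an exact-match cache keyed on a hash of the model and prompt. A production deployment might instead use Redis with a TTL, or semantic caching over embeddings; this minimal version just shows the accounting:

```python
import hashlib
import json


class QueryCache:
    """Exact-match cache for LLM responses, keyed on (model, prompt)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, prompt: str) -> str:
        # Hash a canonical encoding so equal (model, prompt) pairs collide.
        return hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()

    def get_or_compute(self, model: str, prompt: str, compute) -> str:
        """Return the cached answer, or call `compute` (the expensive LLM
        request) exactly once and cache its result."""
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute(prompt)
        return self._store[key]
```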
Key Challenges And Mitigations
Building and operating a textable AI assistant involves several key challenges across compliance, security, cost, and technical implementation. Understanding and mitigating these is crucial for a successful deployment.
1. Compliance, Consent, and Policy
- Challenge: Adhering to telecommunication regulations, particularly the Telephone Consumer Protection Act (TCPA) in the U.S., which governs automated text messages.
- Mitigation: Obtain prior express written consent from users before sending them automated or telemarketing messages. Implement a clear and easy way for users to opt out (e.g., by replying “STOP”), and ensure your system honors these revocations of consent immediately and in any reasonable manner they are communicated.
- Challenge: Navigating the A2P 10DLC (Application-to-Person 10-Digit Long Code) registration requirements for sending SMS messages in the U.S.
- Mitigation: Use a provider like Twilio that facilitates the A2P 10DLC registration process. This involves registering your brand and campaign, which is necessary for message deliverability and avoiding filtering. Be prepared for associated registration and onboarding fees.
- Challenge: Complying with platform-specific policies, such as WhatsApp’s rules on business-initiated template messages or RCS’s brand verification and agent launch approval process.
- Mitigation: Thoroughly review and follow the documentation for each channel. For WhatsApp, design approved message templates for initiating conversations. For RCS, factor the brand verification and approval timeline into your project plan.
- Challenge: Ensuring a lawful basis for processing personal data under regulations like GDPR.
- Mitigation: Establish and document a valid lawful basis for data processing, such as user consent or legitimate interests. Implement appropriate safeguards like encryption and pseudonymization, especially when handling sensitive data.
2. Security and Privacy
- Challenge: Messaging channels have different security characteristics. Standard SMS is transmitted over carrier networks and is not typically end-to-end encrypted at the application layer, whereas WhatsApp provides this feature.
- Mitigation: Choose channels based on the sensitivity of the data your assistant will handle. For interactions requiring higher privacy, guide users towards WhatsApp. Avoid transmitting highly sensitive personal or financial information over standard SMS. Minimize the personal data you collect and process, and redact sensitive fields before logging for analytics or debugging.
3. Cost Management
- Challenge: SMS pricing can be complex and accumulate quickly. Costs are typically calculated per message segment (usually 160 characters for standard SMS), with additional carrier fees that vary by destination.
- Mitigation: Monitor your messaging costs closely. Use your provider’s pricing calculators and be aware of volume discounts. Design your assistant to send concise messages or link to external pages for longer content to minimize the number of segments per response.
- Challenge: LLM API calls can be a significant operational expense, especially for complex research tasks.
- Mitigation: Employ cost-reduction strategies such as implementing a caching layer for frequent queries. Optimize Retrieval-Augmented Generation (RAG) by using efficient document chunking and reranking to minimize the context sent to the LLM. Use smaller, cheaper models for simpler tasks like classification or summarization where possible.
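Segment-based SMS pricing can be budgeted with a rough estimator. GSM-7 messages fit 160 characters in a single segment but only 153 per segment once concatenation headers are added; non-GSM text falls back to UCS-2 at 70/67. Treating plain ASCII as GSM-encodable is an approximation (a few ASCII characters count double in GSM-7), but it is close enough for cost planning:

```python
import math


def estimate_segments(text: str) -> int:
    """Approximate the number of billable SMS segments for a message."""
    if text.isascii():
        single, multi = 160, 153  # GSM-7: one segment, or 153 per part
    else:
        single, multi = 70, 67    # UCS-2 fallback: 70, or 67 per part
    if len(text) <= single:
        return 1
    return math.ceil(len(text) / multi)
```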
4. Technical and UX Challenges
- Challenge: Managing state and long-running tasks over stateless webhook interactions.
- Mitigation: Implement an asynchronous architecture using a job queue. When a webhook is received, the task is added to a queue and a separate worker process handles the long-running job. State is maintained in a database, not in the webhook process itself.
- Challenge: Debugging and evaluating the performance of a complex, multi-step AI system.
- Mitigation: Integrate an observability platform (e.g., Langfuse) from the start. This allows you to trace the entire lifecycle of a request—from webhook to tool calls to the final LLM response—and monitor key metrics like cost, latency, and output quality. Use evaluations like LLM-as-a-judge or user feedback to continuously improve prompt and agent performance.
Other Ideas