Tech Stack

🌐 Front-End

Framework: React 18
- Plain-vanilla implementation to prioritize stability, simplicity, and maintainability.
Styling: Tailwind CSS
- Rapid styling and consistent theming across the app.
State Management: Redux Toolkit
- Centralizes session state (open threads, token balances, LLM preferences) in one store, which is kept in sync across browser tabs so the user's session continues uninterrupted.
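
A minimal sketch of the session state described above, written as a plain reducer so it runs without dependencies. In the app this would more likely be a Redux Toolkit slice. The field and action names (`openThreadIds`, `tokenBalance`, `llmPreference`, `thread/opened`, and so on) are illustrative assumptions, not the actual store shape. Redux Toolkit does not relay state between tabs on its own, so the sketch assumes actions are serialized, passed through some channel (for example the browser's BroadcastChannel API), and replayed through the same reducer in the receiving tab.

```typescript
// Hypothetical session state; field names are assumptions for illustration.
interface SessionState {
  openThreadIds: string[];
  tokenBalance: number;
  llmPreference: string;
}

type SessionAction =
  | { type: "thread/opened"; threadId: string }
  | { type: "thread/closed"; threadId: string }
  | { type: "tokens/spent"; amount: number }
  | { type: "llm/selected"; model: string };

const initialSession: SessionState = {
  openThreadIds: [],
  tokenBalance: 0,
  llmPreference: "",
};

// Pure reducer: returns a new state object and never mutates its input.
function sessionReducer(state: SessionState, action: SessionAction): SessionState {
  switch (action.type) {
    case "thread/opened":
      // Opening an already-open thread is a no-op.
      return state.openThreadIds.indexOf(action.threadId) >= 0
        ? state
        : { ...state, openThreadIds: state.openThreadIds.concat(action.threadId) };
    case "thread/closed":
      return {
        ...state,
        openThreadIds: state.openThreadIds.filter((id) => id !== action.threadId),
      };
    case "tokens/spent":
      // Clamp at zero so the balance never goes negative.
      return { ...state, tokenBalance: Math.max(0, state.tokenBalance - action.amount) };
    case "llm/selected":
      return { ...state, llmPreference: action.model };
  }
}

// A receiving tab parses the relayed action and runs it through the same reducer.
function applyRelayedAction(state: SessionState, serialized: string): SessionState {
  return sessionReducer(state, JSON.parse(serialized) as SessionAction);
}
```

Because every tab applies the same actions through the same pure reducer, tabs that receive the same action sequence end up in the same state.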

🛠️ API & Runtime Layer

Runtime: Node.js (v20) on AWS Fargate
- Serverless container hosting that scales without server management, keeps workloads isolated, and simplifies deployment.
Framework: NestJS
- Robust, modular API architecture with clean separation of domain concerns (authentication, chat handling, memory management, billing).
- OpenAPI decorators for automatic endpoint documentation, simplifying integration and maintenance.
Authentication: AWS Cognito + Google OAuth
- Managed, secure user authentication, with Google Single Sign-On (SSO) federated through OAuth.

📦 Data, Search & Memory Management

Primary Data Storage:

AWS Aurora PostgreSQL
- Secure, relational database holding structured user profiles, subscription data, and persistent Personal Profile JSON objects (user’s tone, goals, communication preferences).
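
As an illustration, the Personal Profile JSON could be typed and checked before it is written to the database. The three fields come from the description above, but their exact names and types (`tone`, `goals`, `communicationPreferences`) and the `parseProfile` helper are assumptions.

```typescript
// Hypothetical shape of the Personal Profile JSON object.
interface PersonalProfile {
  tone: string;
  goals: string[];
  communicationPreferences: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parses raw JSON text and returns a profile, or null if it is malformed.
function parseProfile(json: string): PersonalProfile | null {
  let data: unknown;
  try {
    data = JSON.parse(json);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { tone, goals, communicationPreferences } = data as Record<string, unknown>;
  if (typeof tone !== "string") return null;
  if (!isStringArray(goals) || !isStringArray(communicationPreferences)) return null;

  // Return only the known fields so unexpected keys are not persisted.
  return { tone, goals, communicationPreferences };
}
```

Returning `null` rather than throwing lets the caller decide whether to reject the request or fall back to an empty profile.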

Message Storage & Summarization:

AWS DynamoDB
- Low-latency NoSQL storage for raw chat messages and rolling thread summaries.

Search Layer:

AWS OpenSearch
- Near-real-time indexing for fast full-text search across extensive chat history.

Advanced Memory Layers:

🧠 Thread Memory
- Each thread carries editable metadata (name, personality, context). Edits apply immediately to the active conversation, giving the user direct control over its context.
💧 Hydrate Thread
- Summarization runs automatically when a thread approaches the model's context limit, or manually on request. The resulting summary seeds a new session, so the conversation continues without losing context.
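
The Hydrate Thread trigger can be sketched as a threshold check on token usage followed by a hand-off to a new session. This is a sketch under assumptions: the 80% default threshold, the `Session` fields, and the `summarize` callback (which would call an LLM in practice) are hypothetical, not HiiBo's actual implementation.

```typescript
// Editable per-thread metadata, as described under Thread Memory.
interface ThreadMemory {
  name: string;
  personality: string;
  context: string;
}

interface Session {
  memory: ThreadMemory;
  seedSummary: string | null; // summary carried over from the previous session
  messages: string[];
  tokensUsed: number;
}

// True once usage reaches the given fraction of the model's context limit.
function shouldHydrate(tokensUsed: number, contextLimit: number, threshold = 0.8): boolean {
  return tokensUsed >= contextLimit * threshold;
}

// Starts a fresh session seeded with a summary of the old one.
// Thread memory is carried over unchanged so user edits persist.
function hydrate(old: Session, summarize: (messages: string[]) => string): Session {
  // Include the previous summary so earlier context is not lost on repeated hydration.
  const history =
    old.seedSummary === null ? old.messages : [old.seedSummary].concat(old.messages);
  return {
    memory: old.memory,
    seedSummary: summarize(history),
    messages: [],
    tokensUsed: 0, // the summary's own token cost is ignored here for simplicity
  };
}
```

A manual "hydrate now" action would call `hydrate` directly, while the automatic path would call it only when `shouldHydrate` returns true.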

🤖 LLM Integration & Vendor Flexibility

Supported LLMs: OpenAI GPT-4o Mini, Anthropic Claude 3 Haiku, DeepSeek R1
Image Generation: OpenAI DALL·E 3
Unified Client Layer:
- Centralized LLM interaction module with built-in retry logic, exponential back-off, and token usage tracking, so every vendor is called through the same interface.
Per-Thread LLM Selection:
- Thread-specific model selection to maintain continuous context and tone, even when switching LLM vendors mid-project.
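
The retry behavior of the unified client layer could look roughly like the sketch below. All names here (`callWithRetry`, `UsageTracker`, `LlmResult`) and the default delays are assumptions for illustration. The `send` callback stands in for whichever vendor SDK call the thread's selected model requires.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical vendor-neutral result; real SDK responses differ per vendor.
interface LlmResult {
  text: string;
  tokensUsed: number;
}

// Running token totals, keyed by model name.
class UsageTracker {
  private totals: { [model: string]: number } = {};

  record(model: string, tokens: number): void {
    this.totals[model] = (this.totals[model] || 0) + tokens;
  }

  total(model: string): number {
    return this.totals[model] || 0;
  }
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `send` until it succeeds or `maxAttempts` is reached, waiting an
// exponentially growing delay between attempts. Rethrows the last error.
async function callWithRetry(
  model: string,
  send: () => Promise<LlmResult>,
  tracker: UsageTracker,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<LlmResult> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const result = await send();
      tracker.record(model, result.tokensUsed); // only successful calls are counted
      return result;
    } catch (error) {
      lastError = error;
      // No delay after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A production client would typically retry only transient failures (rate limits, timeouts) and add random jitter to each delay. Both are omitted here to keep the sketch short.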

🔄 Integrations & Automation

Payments: Stripe
- Subscription handling and real-time token top-ups through webhook integration with the NestJS backend.
CRM & Marketing Automation: HubSpot
- Marketing outreach sequences and automated nightly synchronization of usage data through a lightweight AWS Lambda function.
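
A sketch of how a Stripe top-up webhook could be applied to a user's token balance. Stripe may deliver the same event more than once, so the handler below ignores event IDs it has already processed. The `TopUpEvent` shape, the in-memory `Ledger`, and `applyTopUp` are hypothetical simplifications: a real handler would first verify the webhook signature with Stripe's SDK and read these values from the actual event payload.

```typescript
// Simplified, hypothetical view of an already-verified top-up event.
interface TopUpEvent {
  eventId: string;
  userId: string;
  tokens: number;
}

// In-memory stand-in for persistent storage.
interface Ledger {
  balances: { [userId: string]: number };
  processedEventIds: { [eventId: string]: true };
}

// Returns true if the event changed the ledger, false if it was a duplicate delivery.
function applyTopUp(ledger: Ledger, event: TopUpEvent): boolean {
  if (ledger.processedEventIds[event.eventId]) return false;
  if (event.tokens <= 0) {
    throw new Error("top-up must add a positive number of tokens");
  }
  ledger.balances[event.userId] = (ledger.balances[event.userId] || 0) + event.tokens;
  ledger.processedEventIds[event.eventId] = true;
  return true;
}
```

In a deployed service the duplicate check and the balance update would need to happen in one database transaction, so that two concurrent deliveries of the same event cannot both be applied.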

⚙️ Core Alpha Features

🔐 Login & Authentication: Email/password & Google OAuth sign-in
💬 Dynamic Chat: Interactive conversations with real-time token usage and cost estimates
🧵 Personal Thread: Dedicated workspace pre-seeded from user profile
📚 Prompt Library: Save, organize, and quickly reuse custom prompts via keyword triggers
🗂️ Thread Memory: Editable per-thread metadata (name, tone, context) providing persistent, personalized context
🌊 Hydrate Thread: Intelligent one-click summarization and context refresh, preserving thread continuity without interruption
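
The cost estimate shown in Dynamic Chat could be computed from token counts and a per-model price table. The prices below are placeholder values chosen for illustration only, not actual vendor rates. `estimateCostUsd` and the `PriceTable` shape are likewise assumptions.

```typescript
// USD per one million tokens, split by direction.
interface ModelPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

type PriceTable = { [model: string]: ModelPrice };

// Placeholder prices for a made-up model name; replace with real vendor rates.
const examplePrices: PriceTable = {
  "example-model": { inputPerMillion: 1.0, outputPerMillion: 4.0 },
};

// Estimated cost in USD for one exchange with the given model.
function estimateCostUsd(
  prices: PriceTable,
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const price = prices[model];
  if (!price) throw new Error("no price configured for model: " + model);
  return (inputTokens * price.inputPerMillion + outputTokens * price.outputPerMillion) / 1e6;
}
```

Keeping prices in a table keyed by model name means that switching a thread to a different LLM changes only the lookup key, not the calculation.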

🚦 Environments & DevOps

CI/CD Pipeline: AWS CodePipeline + AWS CodeBuild
- Automated linting, unit testing, and deployment to distinct development and production Fargate clusters.
Network Isolation: AWS Virtual Private Cloud (VPC)
- Separate, secure environments for development and production instances.
DNS & Security: AWS Route 53, AWS Web Application Firewall (WAF)
- Domain management and security against common web exploits.
Observability & Monitoring: AWS CloudWatch, AWS X-Ray
- Comprehensive logging, metrics monitoring, request tracing, and alerting with automatic escalation on performance degradation or integration errors.

🤝 Future Roadmap: Task-Specific Agents

Email Agent (Upcoming Feature)
- Drafts email replies and summarizes documents, integrated with the existing memory and billing frameworks.
Scalable Agent Framework
- Modular agent architecture leveraging existing memory infrastructure, allowing rapid deployment of future task-specific agents without re-architecture.

🛡️ Why This Architecture?

Our architectural decisions prioritize simplicity, stability, and scalability:

Minimally Complex Infrastructure: Familiar and mature technologies (React, NestJS, AWS Managed Services) allow rapid feature delivery.
Cost-Effective Scalability: Serverless, pay-per-use infrastructure keeps costs proportional to usage, which makes affordable monthly pricing viable for all users.
User-Focused Innovation: Managed services free engineering time for the features that set HiiBo apart: robust memory, vendor flexibility, transparent billing, and sustainable computing practices.

This streamlined approach leaves room to keep innovating, deliver compelling user experiences, and scale into future AI enhancements and agent capabilities.