System Overview

This document provides a comprehensive overview of AutoMem’s internal architecture, design decisions, and component interactions. It covers the Flask application structure, service initialization, request processing flow, and the coordination between storage systems and background workers.

AutoMem implements three fundamental architectural patterns:

FalkorDB serves as the canonical data store with authoritative memory records, while Qdrant provides optional semantic search capabilities. All memory writes commit to FalkorDB first; Qdrant failures do not block operations.

This design ensures:

  • Graph operations do not depend on vector store availability
  • Built-in redundancy for disaster recovery
  • Graceful degradation to graph-only mode when Qdrant is unavailable
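
The write ordering described above can be sketched as follows. This is a minimal illustration, not AutoMem's actual code: `InMemoryGraph`, `FlakyVectorStore`, and `store_memory` are hypothetical stand-ins, and the only behavior taken from this document is that the graph write comes first and a vector store failure is logged rather than raised.

```python
import logging
from typing import Optional

logger = logging.getLogger("automem.sketch")


class InMemoryGraph:
    """Hypothetical stand-in for the FalkorDB client."""

    def __init__(self) -> None:
        self.nodes = {}

    def upsert(self, memory_id: str, record: dict) -> None:
        self.nodes[memory_id] = record


class FlakyVectorStore:
    """Hypothetical stand-in for Qdrant that always fails."""

    def upsert(self, memory_id: str, vector: list) -> None:
        raise ConnectionError("vector store unreachable")


def store_memory(graph, vectors: Optional[object], memory_id: str,
                 record: dict, vector: list) -> dict:
    # 1. Canonical write: an exception here propagates to the caller.
    graph.upsert(memory_id, record)

    # 2. Best-effort write: failures are logged and swallowed.
    vector_ok = False
    if vectors is not None:
        try:
            vectors.upsert(memory_id, vector)
            vector_ok = True
        except Exception:
            logger.exception("vector write failed for %s", memory_id)

    return {"memory_id": memory_id, "graph": True, "vector": vector_ok}
```

Passing `None` as the vector store models graph-only mode: the request still succeeds and the response simply reports that no vector was stored.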

Memory storage returns immediately to clients while enrichment processes asynchronously. This prevents blocking on:

  • Entity extraction via spaCy NLP
  • Embedding generation via external APIs
  • Pattern detection across the memory graph
  • Relationship creation between memories
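
A minimal sketch of this hand-off, assuming a standard-library `queue.Queue` and a daemon thread as the diagrams in this document suggest. The function names `handle_post_memory` and `enrich`, and the response fields, are illustrative and not taken from the codebase.

```python
import queue
import threading

enrichment_queue = queue.Queue()
enriched = []  # records processed IDs so the effect is observable


def enrich(memory_id: str) -> None:
    # Placeholder for entity extraction, pattern detection, and linking.
    enriched.append(memory_id)


def enrichment_worker() -> None:
    while True:
        memory_id = enrichment_queue.get()  # blocks until work arrives
        try:
            enrich(memory_id)
        finally:
            enrichment_queue.task_done()


def handle_post_memory(memory_id: str) -> dict:
    # The synchronous graph write would happen here.
    enrichment_queue.put(memory_id)  # returns immediately
    return {"status": "stored", "memory_id": memory_id, "enrichment": "queued"}


threading.Thread(target=enrichment_worker, daemon=True).start()
```

The handler returns as soon as the ID is queued; the slow steps run on the worker thread, so client latency does not depend on NLP or embedding time.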

The embedding system uses a provider abstraction that enables automatic fallback and explicit provider selection.

Implementations: VoyageEmbeddingProvider, OpenAIEmbeddingProvider, FastEmbedProvider, OllamaEmbeddingProvider, PlaceholderEmbeddingProvider.


graph TB
    subgraph "API Layer"
        POST["/memory POST endpoint"]
    end

    subgraph "Event-Driven Systems"
        EQ["enrichment_queue<br/>(Queue)"]
        EW["enrichment_worker()<br/>Thread"]

        EmQ["embedding_queue<br/>(Queue)"]
        EmW["embedding_worker()<br/>Thread"]
        Batch["Batch Accumulator<br/>20 items / 2s timeout"]
    end

    subgraph "Scheduled System"
        Sched["ConsolidationScheduler"]
        Consol["MemoryConsolidator"]

        Decay["decay task<br/>hourly"]
        Creative["creative task<br/>hourly"]
        Cluster["cluster task<br/>6 hours"]
        Forget["forget task<br/>daily"]
    end

    subgraph "Data Stores"
        Falkor[("FalkorDB<br/>Graph")]
        Qdrant[("Qdrant<br/>Vectors")]
    end

    subgraph "External Services"
        OpenAI["OpenAI API<br/>embeddings + classification"]
        Spacy["spaCy NLP<br/>entity extraction"]
    end

    POST -->|"enqueue(memory_id)"| EQ
    POST -->|"enqueue(memory_id)"| EmQ
    POST -->|"immediate write"| Falkor

    EQ --> EW
    EW -->|"extract entities"| Spacy
    EW -->|"create relationships"| Falkor
    EW -->|"similarity search"| Qdrant

    EmQ --> EmW
    EmW --> Batch
    Batch -->|"bulk generate"| OpenAI
    OpenAI -->|"store vectors"| Qdrant

    Sched -->|"check interval"| Decay
    Sched -->|"check interval"| Creative
    Sched -->|"check interval"| Cluster
    Sched -->|"check interval"| Forget

    Decay --> Consol
    Creative --> Consol
    Cluster --> Consol
    Forget --> Consol

    Consol -->|"update scores"| Falkor
    Consol -->|"create edges"| Falkor
    Consol -->|"delete points"| Qdrant

graph TB
    subgraph Internet["Public Internet"]
        Client["AI Client<br/>(Claude, Cursor, API)"]
    end

    subgraph Railway["Railway Project<br/>(your-project.railway.app)"]
        subgraph MemSvc["memory-service<br/>(Flask Container)"]
            Flask["app.py<br/>Port: 8001<br/>Bind: :: (IPv6)"]
            Workers["Background Workers<br/>- EnrichmentWorker<br/>- EmbeddingWorker<br/>- ConsolidationScheduler"]
        end

        subgraph FalkorSvc["FalkorDB<br/>(Docker Image)"]
            Falkor["falkordb/falkordb:latest<br/>Port: 6379"]
            Vol["Persistent Volume<br/>/var/lib/falkordb/data"]
        end

        subgraph MCPSvc["automem-mcp-sse<br/>(Optional Node.js)"]
            SSE["mcp-sse-server/server.js<br/>Port: 8080"]
        end

        DNS["RAILWAY_PRIVATE_DOMAIN<br/>memory-service.railway.internal"]
        PubDomain["Generated Public Domain<br/>automem-prod-xyz.up.railway.app"]
    end

    subgraph External["External Services"]
        QCloud["Qdrant Cloud<br/>(Optional)"]
        OpenAI["OpenAI API<br/>Embeddings"]
    end

    Client -->|HTTPS| PubDomain
    PubDomain --> Flask
    SSE -->|HTTP :8001| DNS
    DNS --> Flask

    Flask -->|Redis Protocol :6379| Falkor
    Falkor --> Vol
    Flask -.->|Vector Search| QCloud
    Flask -->|Embeddings| OpenAI
    Workers -->|Graph Updates| Falkor

The ServiceState dataclass (app.py:1093-1122) serves as the central state container for all service components.

This single state instance (app.py:1124) is shared across all Flask request handlers and background threads, requiring careful lock management for queue operations.
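
As a rough sketch, such a container could look like the following. The queue, lock, tracking-set, and stop-event names are the ones this document mentions; `memory_graph` and `qdrant` are hypothetical field names, and the real dataclass may hold more fields or different types.

```python
import threading
from dataclasses import dataclass, field
from queue import Queue
from typing import Any, Optional


@dataclass
class ServiceState:
    # Database handles (hypothetical names); None until initialized.
    memory_graph: Optional[Any] = None
    qdrant: Optional[Any] = None

    # Enrichment coordination.
    enrichment_queue: Queue = field(default_factory=Queue)
    enrichment_lock: threading.Lock = field(default_factory=threading.Lock)
    enrichment_pending: set = field(default_factory=set)
    enrichment_inflight: set = field(default_factory=set)

    # Embedding coordination.
    embedding_queue: Queue = field(default_factory=Queue)
    embedding_lock: threading.Lock = field(default_factory=threading.Lock)
    embedding_pending: set = field(default_factory=set)
    embedding_inflight: set = field(default_factory=set)

    # Stop signals for the scheduled workers.
    consolidation_stop_event: threading.Event = field(default_factory=threading.Event)
    sync_stop_event: threading.Event = field(default_factory=threading.Event)


state = ServiceState()  # one module-level instance shared by all threads
```

Using `default_factory` for every mutable field matters here: a plain default would be rejected by `dataclass` for sets, and sharing one lock or queue object across instances would defeat the purpose of the container.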


The Flask application initializes services in a specific order to ensure dependencies are available:

sequenceDiagram
    participant Flask as Flask App
    participant DB as Database Clients
    participant EW as Enrichment Worker
    participant EmW as Embedding Worker
    participant CS as Consolidation Scheduler

    Flask->>DB: Initialize FalkorDB client
    Flask->>DB: Initialize Qdrant client (optional)
    Flask->>Flask: Create consolidator instance

    Flask->>EW: Thread(target=enrichment_worker, daemon=True).start()
    activate EW
    EW->>EW: Enter infinite loop

    Flask->>EmW: Thread(target=embedding_worker, daemon=True).start()
    activate EmW
    EmW->>EmW: Enter infinite loop

    Flask->>CS: Initialize ConsolidationScheduler
    Flask->>CS: scheduler.add_job(...) for each task
    Flask->>CS: scheduler.start()
    activate CS
    CS->>CS: Begin periodic checks

    Flask->>Flask: Ready to accept requests

    Note over EW,EmW: Workers poll queues continuously
    Note over CS: Scheduler wakes on intervals

| Function | Purpose | Location |
| --- | --- | --- |
| init_db_connections() | Establishes FalkorDB and optional Qdrant connections | app.py:1338-1441 |
| init_openai() | Initializes OpenAI client for memory classification | app.py:1179-1200 |
| init_embedding_provider() | Selects and initializes embedding provider | app.py:1202-1337 |
| start_enrichment_worker() | Launches entity extraction and linking pipeline | app.py:1728-1966 |
| start_embedding_worker() | Launches batch embedding generation worker | app.py:1968-2097 |
| start_consolidation_worker() | Launches scheduled consolidation cycles | consolidation.py |
| start_sync_worker() | Launches drift detection and repair worker | app.py:2099-2234 |

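The POST /memory request flows through a synchronous phase (the FalkorDB write and the queueing of the memory ID) and an asynchronous phase (enrichment and embedding in the background workers).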

Recall implements hybrid search combining vector similarity, keyword matching, and graph traversal.

The _compute_metadata_score() function (automem/utils/scoring.py) combines signals with configurable weights:

| Component | Weight | Source | Config Variable |
| --- | --- | --- | --- |
| Vector similarity | 25% | Qdrant cosine distance | SEARCH_WEIGHT_VECTOR |
| Keyword match | 15% | TF-IDF from graph query | SEARCH_WEIGHT_KEYWORD |
| Relationship strength | 25% | Graph edge properties | (implicit) |
| Content overlap | 25% | Token intersection | (implicit) |
| Temporal alignment | 15% | Time expression matching | SEARCH_WEIGHT_RECENCY |
| Tag matching | 10% | Prefix/exact tag filters | SEARCH_WEIGHT_TAG |
| Importance score | 5% | User-assigned priority | SEARCH_WEIGHT_IMPORTANCE |
| Confidence score | 5% | Classification confidence | SEARCH_WEIGHT_CONFIDENCE |
| Recency boost | 10% | Freshness decay function | SEARCH_WEIGHT_RECENCY |
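
The combination step can be illustrated with a plain weighted sum. This is a sketch under assumptions: the dictionary keys and the `combine_score` helper are hypothetical, each signal is assumed to be pre-scaled to the range 0 to 1, and the real `_compute_metadata_score()` may combine signals differently. The weights as listed add up to 1.35 rather than 1.0, so the sketch treats them as relative contributions and does not normalize.

```python
from typing import Mapping

# Weights copied from the table above, expressed as fractions.
WEIGHTS = {
    "vector": 0.25,
    "keyword": 0.15,
    "relation": 0.25,
    "content": 0.25,
    "temporal": 0.15,
    "tag": 0.10,
    "importance": 0.05,
    "confidence": 0.05,
    "recency": 0.10,
}


def combine_score(signals: Mapping[str, float],
                  weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted sum; a signal missing from `signals` contributes zero."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


# A memory that matches semantically and by tag, but has no graph edges.
score = combine_score({"vector": 0.8, "tag": 1.0, "recency": 0.5})
```

Because missing signals count as zero, a result found only by keyword search still receives a score and can be ranked alongside vector hits.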

AutoMem runs four independent background threads that process memories asynchronously without blocking API requests.

All workers share the ServiceState instance and use thread-safe coordination:

| Worker | Queue | Lock | Tracking Sets | Stop Signal |
| --- | --- | --- | --- | --- |
| Enrichment | enrichment_queue | enrichment_lock | enrichment_inflight, enrichment_pending | None (daemon) |
| Embedding | embedding_queue | embedding_lock | embedding_inflight, embedding_pending | None (daemon) |
| Consolidation | N/A (schedule-based) | N/A | N/A | consolidation_stop_event |
| Sync | N/A (interval-based) | N/A | N/A | sync_stop_event |
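
One plausible way the lock and tracking sets fit together is shown below. The variable names match the table; the helper functions (`enqueue_enrichment`, `take_next`, `mark_done`) and the exact deduplication rule are assumptions made for illustration.

```python
import threading
from queue import Queue

enrichment_queue = Queue()
enrichment_lock = threading.Lock()
enrichment_pending = set()   # queued, not yet picked up
enrichment_inflight = set()  # currently being processed


def enqueue_enrichment(memory_id: str) -> bool:
    """Queue a memory unless it is already pending or in flight."""
    with enrichment_lock:
        if memory_id in enrichment_pending or memory_id in enrichment_inflight:
            return False
        enrichment_pending.add(memory_id)
    enrichment_queue.put(memory_id)
    return True


def take_next() -> str:
    """Worker side: move the next ID from pending to in flight."""
    memory_id = enrichment_queue.get()
    with enrichment_lock:
        enrichment_pending.discard(memory_id)
        enrichment_inflight.add(memory_id)
    return memory_id


def mark_done(memory_id: str) -> None:
    with enrichment_lock:
        enrichment_inflight.discard(memory_id)
    enrichment_queue.task_done()
```

The queue is already thread-safe on its own; the lock exists to keep the two sets consistent with each other, so the same memory is never processed twice concurrently.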

AutoMem uses two abstraction modules to isolate database-specific logic:

The automem/stores/graph_store.py module encapsulates FalkorDB operations:

Key functions:

  • _build_graph_tag_predicate(tag_mode, tag_match) — Generates Cypher WHERE clauses for tag filtering
  • _serialize_node(node) — Converts FalkorDB node to dictionary with property extraction
  • _summarize_relation_node(rel) — Extracts relationship metadata (type, strength, context)

The automem/stores/vector_store.py module handles Qdrant operations:

Key functions:

  • _build_qdrant_tag_filter(tags, mode, match) — Constructs Qdrant filter objects for tag queries
  • _ensure_collection_exists() — Creates collection with appropriate vector parameters if missing
  • Graceful degradation: All Qdrant operations wrapped in try-except with logging but no request failures

AutoMem implements two-tier authorization with multiple token extraction methods.

The _extract_api_token() function (app.py:1127-1144) tries three methods in order:

  1. Bearer token (recommended): Authorization: Bearer {token}
  2. Custom header: X-API-Key: {token}
  3. Query parameter (discouraged): ?api_key={token}
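
The lookup order above can be sketched without Flask by passing headers and query arguments as plain mappings. The function below is illustrative only and is not the body of `_extract_api_token()`; note that a plain `dict` is case-sensitive, whereas Flask's header object is not.

```python
from typing import Mapping, Optional


def extract_api_token(headers: Mapping[str, str],
                      query: Mapping[str, str]) -> Optional[str]:
    # 1. Authorization: Bearer {token}
    auth = headers.get("Authorization", "")
    if auth.lower().startswith("bearer "):
        token = auth[len("bearer "):].strip()
        if token:
            return token

    # 2. X-API-Key: {token}
    api_key = headers.get("X-API-Key")
    if api_key:
        return api_key.strip()

    # 3. ?api_key={token} (discouraged: query strings often end up in logs)
    return query.get("api_key") or None
```

When several methods are present, the earliest one in the list wins, so a Bearer token overrides a query parameter.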

Admin operations require a separate token checked by _require_admin_token() (app.py:1150-1162):

| Endpoint | Purpose | Admin Token Required |
| --- | --- | --- |
| POST /admin/reembed | Regenerate all embeddings | Yes |
| POST /enrichment/reprocess | Force re-enrichment | Yes |
| GET /consolidate/status | View scheduler state | No |
| POST /consolidate | Trigger consolidation | No |

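Configuration: the API token and the admin token are set through the AUTOMEM_API_TOKEN and ADMIN_API_TOKEN environment variables.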


The embedding system uses a provider pattern that enables automatic fallback and explicit selection.

All providers implement the EmbeddingProvider abstract base class.

When EMBEDDING_PROVIDER=auto (default), the system tries providers in order:

  1. Voyage AI — Best quality, requires API key (automem/embedding/voyage.py)
  2. OpenAI — High quality, requires API key (automem/embedding/openai.py)
  3. Ollama — Local server, no API key (automem/embedding/ollama.py)
  4. FastEmbed — Local ONNX, no API key (automem/embedding/fastembed.py)
  5. Placeholder — Hash-based, always available (automem/embedding/placeholder.py)
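
A minimal sketch of this fallback chain follows. Only the `EmbeddingProvider` base class name comes from this document; the `generate_embedding` method name, the `select_provider` helper, and both concrete classes are hypothetical, and the hash-based vector is one possible reading of "hash-based", not the actual placeholder algorithm.

```python
import hashlib
from abc import ABC, abstractmethod


class EmbeddingProvider(ABC):
    @abstractmethod
    def generate_embedding(self, text: str) -> list:
        ...


class UnavailableProvider(EmbeddingProvider):
    """Stand-in for a provider whose API key or server is missing."""

    def __init__(self) -> None:
        raise RuntimeError("missing API key")

    def generate_embedding(self, text: str) -> list:
        return []


class PlaceholderProvider(EmbeddingProvider):
    """Deterministic vector derived from a hash; needs no network."""

    def __init__(self, dimensions: int = 8) -> None:
        self.dimensions = dimensions

    def generate_embedding(self, text: str) -> list:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [digest[i % len(digest)] / 255.0 for i in range(self.dimensions)]


def select_provider(factories):
    """Return (name, provider) for the first factory that initializes."""
    for name, factory in factories:
        try:
            return name, factory()
        except Exception:
            continue  # try the next provider in the chain
    raise RuntimeError("no embedding provider available")


name, provider = select_provider([
    ("voyage", UnavailableProvider),
    ("placeholder", PlaceholderProvider),
])
```

Keeping an always-available provider at the end of the chain means selection cannot fail in practice, at the cost of embeddings that carry no semantic meaning.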

The validate_vector_dimensions() function (automem/utils/validation.py) prevents dimension mismatches that would corrupt Qdrant data.
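
The check itself is simple. The sketch below uses a hypothetical helper name, `check_dimensions`, because the signature of the real `validate_vector_dimensions()` is not shown in this document.

```python
from typing import Sequence


def check_dimensions(vector: Sequence[float], expected: int) -> None:
    """Raise before a wrongly sized vector can reach the collection."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, "
            f"collection expects {expected}"
        )
```

A mismatch typically appears after switching embedding providers or models, since each produces vectors of its own fixed length.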


Configuration is centralized in automem/config.py which loads environment variables and provides constants throughout the codebase.

| Category | Key Variables | Purpose |
| --- | --- | --- |
| Database | FALKORDB_HOST, FALKORDB_PORT, GRAPH_NAME | Graph database connection |
| Vector Store | QDRANT_URL, QDRANT_API_KEY, COLLECTION_NAME, VECTOR_SIZE | Optional semantic search |
| Authentication | AUTOMEM_API_TOKEN, ADMIN_API_TOKEN | API access control |
| Embedding | EMBEDDING_PROVIDER, EMBEDDING_MODEL, VOYAGE_API_KEY, OPENAI_API_KEY | Embedding generation |
| Enrichment | ENRICHMENT_* (12 variables) | Entity extraction, linking |
| Consolidation | CONSOLIDATION_* (15 variables) | Schedule, thresholds |
| Search | SEARCH_WEIGHT_* (8 variables) | Hybrid scoring weights |
| Memory | MEMORY_TYPES, RELATIONSHIP_TYPES, TYPE_ALIASES | Type system |

The automem/config.py module loads configuration in this order:

  1. Environment variables — Highest priority
  2. ~/.config/automem/.env — User-level defaults
  3. Project .env — Development defaults
  4. Hardcoded defaults — Fallback values in config.py
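
This precedence can be modeled with `collections.ChainMap`, which searches its mappings from left to right. The sketch is not how `config.py` is implemented, and every value below is made up for the example, including the defaults.

```python
from collections import ChainMap


def resolve_config(environ, user_env, project_env, defaults) -> ChainMap:
    # Earlier mappings shadow later ones, mirroring the priority list above.
    return ChainMap(environ, user_env, project_env, defaults)


cfg = resolve_config(
    environ={"FALKORDB_PORT": "7000"},                            # process environment
    user_env={"FALKORDB_PORT": "6500", "GRAPH_NAME": "user"},     # ~/.config/automem/.env
    project_env={"GRAPH_NAME": "dev", "VECTOR_SIZE": "768"},      # project .env
    defaults={"FALKORDB_HOST": "localhost", "VECTOR_SIZE": "3"},  # illustrative defaults
)
```

Here `FALKORDB_PORT` resolves to the environment value, `GRAPH_NAME` to the user-level file, `VECTOR_SIZE` to the project file, and `FALKORDB_HOST` falls through to the default.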

The type system supports classification and normalization across defined memory types.

The relationship type system defines 11 typed edges with validation.


AutoMem implements defense-in-depth error handling to maximize availability.

Design principle: FalkorDB writes commit before Qdrant operations. Vector store failures are logged but do not return errors to clients.

Each worker applies its own retry policy:

| Worker | Max Retries | Backoff | Failure Handling |
| --- | --- | --- | --- |
| Enrichment | 3 | 5s, 10s, 15s | Log error, update stats, discard job |
| Embedding | Built into provider | Provider-specific | Queue retries, mark failed after max attempts |
| Consolidation | Infinite (scheduled) | None | Log error, continue to next cycle |
| Sync | Infinite (interval) | None | Log error, retry on next interval |
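
The enrichment row can be sketched as a retry loop using the listed delays of 5, 10, and 15 seconds. The helper `run_with_retries` is hypothetical, and it assumes that "3 retries" means one initial attempt followed by up to three retries; the actual worker may count attempts differently.

```python
import time
from typing import Callable, Sequence, Tuple

BACKOFF_SECONDS = (5, 10, 15)  # delays listed for the enrichment worker


def run_with_retries(job: Callable[[], None],
                     delays: Sequence[float] = BACKOFF_SECONDS,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> Tuple[bool, int]:
    """Run `job`; on failure wait and retry. Returns (succeeded, attempts)."""
    attempts = 0
    for delay in (0, *delays):
        if delay:
            sleep(delay)  # wait before each retry, not before the first try
        attempts += 1
        try:
            job()
        except Exception:
            continue
        return True, attempts
    return False, attempts  # caller logs the error and discards the job
```

The `sleep` parameter is injectable so the policy can be exercised in tests without waiting for real time to pass.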

The Flask application uses structured error responses.

HTTP status codes:

  • 400 — Client error (invalid request)
  • 401 — Unauthorized (missing/invalid token)
  • 403 — Forbidden (admin token required)
  • 404 — Not found (memory doesn’t exist)
  • 500 — Server error (unexpected failure)
  • 503 — Service unavailable (database down)

The embedding worker batches requests to reduce API costs and latency.
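
A batch accumulator of this kind might look like the sketch below, using the 20-item and 2-second limits shown in the architecture diagram. The `collect_batch` helper is hypothetical, and it assumes the timeout is measured from the moment collection starts.

```python
import queue
import time


def collect_batch(q: queue.Queue, max_items: int = 20,
                  timeout: float = 2.0) -> list:
    """Drain up to `max_items` from `q`, waiting at most `timeout` seconds."""
    batch = []
    deadline = time.monotonic() + timeout
    while len(batch) < max_items:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # deadline reached with a partial (or empty) batch
    return batch
```

Under load the batch fills and is sent at once; when traffic is light, the timeout bounds how long a single memory waits for its embedding.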

The seen_ids set prevents duplicate results when combining vector and keyword search.
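
The deduplication step can be illustrated as follows. The `merge_results` helper and the shape of each hit (a dict with an `id` key) are assumptions for the example.

```python
def merge_results(vector_hits: list, keyword_hits: list) -> list:
    """Concatenate result lists, keeping the first occurrence of each ID."""
    seen_ids = set()
    merged = []
    for hit in [*vector_hits, *keyword_hits]:
        if hit["id"] in seen_ids:
            continue
        seen_ids.add(hit["id"])
        merged.append(hit)
    return merged


merged = merge_results(
    [{"id": "a", "source": "vector"}, {"id": "b", "source": "vector"}],
    [{"id": "b", "source": "keyword"}, {"id": "c", "source": "keyword"}],
)
```

Set membership checks are constant time on average, so merging stays linear in the total number of hits.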

The spaCy NLP model is loaded once and cached.
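
One common way to get load-once behavior is `functools.lru_cache`. The sketch below assumes spaCy is installed and uses `en_core_web_sm` purely as an example model name; the document does not state which model or caching mechanism AutoMem uses.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_nlp(model_name: str = "en_core_web_sm"):
    # Imported lazily so the module loads even where spaCy is absent
    # and enrichment is simply disabled.
    import spacy

    return spacy.load(model_name)
```

The first call pays the model load cost; later calls with the same argument return the cached pipeline object.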


AutoMem’s architecture implements three core principles:

  1. Dual-storage canonical design — FalkorDB as authoritative record, Qdrant as optional enhancement
  2. Asynchronous enrichment — Non-blocking entity extraction, embedding generation, and relationship creation
  3. Provider pattern abstractions — Pluggable embedding providers with automatic fallback

The system coordinates four independent background workers (enrichment, embedding, consolidation, sync) through a shared ServiceState dataclass, using thread-safe queues and lock-protected tracking sets.
