Introduction

AutoMem gives AI assistants persistent memory that survives across sessions, devices, and platforms. When you tell Claude Desktop something important, AutoMem stores it. When you open Cursor the next day, that memory is still there — retrieved automatically and woven into context before your AI assistant even responds.

AutoMem is a self-hosted memory backend for AI assistants. It stores, enriches, and retrieves memories using a combination of graph and vector databases, enabling AI tools to remember facts, decisions, preferences, and patterns across any number of sessions or platforms.

Unlike context windows that reset with every conversation, AutoMem maintains a persistent knowledge graph that grows over time. Memories are enriched in the background with entity extraction, embeddings, and relationship mapping — then recalled via hybrid search that combines semantic similarity with graph traversal.

What gets remembered:

| Memory Type | Description | Example |
| --- | --- | --- |
| Decision | Strategic or technical decisions | "Chose PostgreSQL for ACID compliance" |
| Pattern | Recurring approaches | "Always use transactions for batch updates" |
| Preference | User or team preferences | "Prefer async/await over callbacks" |
| Style | Code style conventions | "Use 4-space indentation for Python" |
| Habit | Regular workflows | "Run tests before committing" |
| Insight | Key learnings | "Connection pooling reduces latency by 40%" |
| Context | General information | "Project uses Python 3.10+" |

AutoMem consists of two separate components that work together:

  1. The AutoMem server (verygoodplugins/automem) — a Python/Flask service that handles memory storage, retrieval, and enrichment. It runs your databases and exposes a REST API.

  2. The MCP client (verygoodplugins/mcp-automem) — a TypeScript/Node.js package that translates Model Context Protocol calls from AI platforms into AutoMem HTTP API requests.

The MCP client does not store memories itself. It is a bridge — it receives instructions from AI platforms (via the MCP protocol over stdio) and forwards them to the AutoMem server over HTTP.
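
To make the bridge role concrete, here is a minimal sketch of that translation: a tool call arrives from the platform and becomes a single HTTP request to the AutoMem server. The request bodies and the AUTOMEM_ENDPOINT variable are illustrative assumptions; the real request logic lives in src/automem-client.ts.

```typescript
// Illustrative sketch only: endpoint and body shapes are assumptions, not the package's exact API.
interface ToolCall {
  name: string;                        // e.g. "store_memory" or "recall_memory"
  arguments: Record<string, unknown>;  // parameters supplied by the AI platform
}

const AUTOMEM_ENDPOINT = process.env.AUTOMEM_ENDPOINT ?? "http://localhost:8001";

// The bridge keeps no memory state of its own: every tool call becomes one HTTP request.
async function forwardToolCall(call: ToolCall): Promise<unknown> {
  if (call.name === "store_memory") {
    const res = await fetch(`${AUTOMEM_ENDPOINT}/memory`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(call.arguments),
    });
    return res.json();
  }
  if (call.name === "recall_memory") {
    const params = new URLSearchParams(call.arguments as Record<string, string>);
    const res = await fetch(`${AUTOMEM_ENDPOINT}/recall?${params}`);
    return res.json();
  }
  throw new Error(`Unsupported tool: ${call.name}`);
}
```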

graph TB
    subgraph clients["External Clients"]
        AITools["AI Tools<br/>Claude Desktop, Cursor"]
        CloudAI["Cloud AI Platforms<br/>ChatGPT, Claude.ai"]
        DirectAPI["Direct API Clients<br/>HTTP/REST"]
    end

    subgraph core["AutoMem Core Services"]
        direction TB

        subgraph bridge["MCP Bridge Layer"]
            MCPBridge["mcp-sse-server<br/>Node.js :8080<br/>Protocol Translation"]
        end

        subgraph app_layer["Application Layer"]
            FlaskAPI["Flask API<br/>app.py<br/>Gunicorn :8001"]

            subgraph workers["Background Workers"]
                EnrichWorker["EnrichmentWorker<br/>Entity extraction<br/>Pattern detection"]
                EmbedWorker["EmbeddingWorker<br/>Batch generation"]
                ConsolWorker["ConsolidationScheduler<br/>Decay/Creative/Cluster/Forget"]
                SyncWorker["SyncWorker<br/>Drift repair"]
            end
        end

        subgraph storage["Storage Layer"]
            FalkorDB[("FalkorDB<br/>Graph Database :6379<br/>Canonical Memory Records")]
            Qdrant[("Qdrant<br/>Vector Database :6333<br/>Semantic Search<br/>Optional")]
        end
    end

    AITools -->|"MCP Protocol"| FlaskAPI
    CloudAI -->|"SSE/HTTP"| MCPBridge
    DirectAPI -->|"HTTP REST"| FlaskAPI

    MCPBridge -->|"Internal HTTP"| FlaskAPI

    FlaskAPI -->|"Graph Queries"| FalkorDB
    FlaskAPI -.->|"Vector Search"| Qdrant

    EnrichWorker --> FalkorDB
    EnrichWorker -.-> Qdrant
    EmbedWorker --> Qdrant
    ConsolWorker --> FalkorDB
    ConsolWorker -.-> Qdrant
    SyncWorker --> FalkorDB
    SyncWorker --> Qdrant

    style FalkorDB fill:#f9f9f9,stroke:#333
    style Qdrant fill:#f9f9f9,stroke:#333,stroke-dasharray: 5 5

The server (app.py, Flask/Gunicorn on port 8001) is the authoritative memory store. It provides:

  • A REST API for storing, recalling, updating, and deleting memories
  • A FalkorDB graph database for canonical memory records and relationship traversal
  • An optional Qdrant vector database for semantic search (the system degrades gracefully without it)
  • Background worker threads for enrichment (entity extraction, embedding generation) and consolidation (decay, creative association, clustering, forgetting)

The MCP client (@verygoodplugins/mcp-automem) operates in two modes depending on how it is invoked:

  • Server mode: When AI platforms launch it with no arguments (npx -y @verygoodplugins/mcp-automem), it starts a StdioServerTransport and listens on stdin/stdout for JSON-RPC messages from the platform.
  • CLI mode: When invoked with arguments (npx @verygoodplugins/mcp-automem setup), it routes to command handlers for setup, configuration, and platform installation.

Both modes use the same AutoMemClient class (src/automem-client.ts) for HTTP communication with the AutoMem service.
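
A rough sketch of that dual-mode entry point is shown below, assuming simple argument-based dispatch; the server name, version string, and command routing are placeholders rather than the contents of the real src/index.ts and src/cli/*.

```typescript
// Sketch of the dual-mode entry point; names, versions, and command routing are illustrative.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

async function runCliCommand(command: string, rest: string[]): Promise<void> {
  // Stand-in for the real command handlers in src/cli/* (setup, config, platform installers).
  console.log(`would run command: ${command}`, rest);
}

async function main(): Promise<void> {
  const args = process.argv.slice(2);

  if (args.length > 0) {
    // CLI mode: invoked with arguments, e.g. `npx @verygoodplugins/mcp-automem setup`.
    await runCliCommand(args[0], args.slice(1));
    return;
  }

  // Server mode: no arguments, so speak MCP (JSON-RPC) over stdin/stdout with the host platform.
  const server = new McpServer({ name: "automem", version: "0.0.0" });
  // ...tool registrations (store_memory, recall_memory, ...) would go here...
  await server.connect(new StdioServerTransport());
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```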

graph TB
    subgraph "AI Platform Layer"
        CD["Claude Desktop"]
        CUR["Cursor IDE"]
        CC["Claude Code"]
        COD["OpenAI Codex"]
        WARP["Warp Terminal"]
        OC["OpenClaw"]
    end

    subgraph "THIS PACKAGE: @verygoodplugins/mcp-automem"
        SERVER["MCP Server<br/>src/index.ts<br/>StdioServerTransport"]
        CLI["CLI Commands<br/>src/cli/*<br/>setup, config, platform installers"]
        CLIENT["AutoMemClient<br/>src/automem-client.ts<br/>HTTP wrapper"]

        SERVER --> CLIENT
        CLI --> CLIENT
    end

    subgraph "AutoMem Service (External)"
        API["AutoMem HTTP API<br/>:8001<br/>github.com/verygoodplugins/automem"]
        FALKOR["FalkorDB<br/>Graph database"]
        QDRANT["Qdrant<br/>Vector database"]

        API --> FALKOR
        API --> QDRANT
    end

    CD -->|"MCP Protocol<br/>stdio"| SERVER
    CUR -->|"MCP Protocol<br/>stdio"| SERVER
    CC -->|"MCP Protocol<br/>stdio"| SERVER
    COD -->|"MCP Protocol<br/>stdio"| SERVER
    WARP -->|"MCP Protocol<br/>stdio"| SERVER
    OC -->|"HTTP<br/>Direct API"| API

    CLIENT -->|"HTTP<br/>POST /memory<br/>GET /recall<br/>etc."| API

    style SERVER fill:#f9f9f9
    style CLIENT fill:#f9f9f9
    style CLI fill:#f9f9f9

When an AI assistant calls store_memory:

  1. The MCP client validates content length (hard limit: 2000 characters; soft limit: 500 characters — above this, the backend may summarize before embedding)
  2. The Flask API receives a POST /memory request
  3. MemoryClassifier classifies the content type
  4. EmbeddingProvider generates a vector representation
  5. FalkorDB stores the canonical memory record via a MERGE query
  6. Qdrant stores the vector for semantic search (if available)
  7. Background enrichment is queued via ServiceState.enrichment_queue
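
Seen from the MCP client, steps 1 and 2 reduce to a length check and a single HTTP call; everything after that happens server-side. A rough sketch follows, in which the tags field and error handling are assumptions rather than the package's exact code:

```typescript
// Client-side sketch of store_memory; the request body fields are illustrative, not the exact schema.
const HARD_LIMIT = 2000; // requests above this are rejected before reaching the server
const SOFT_LIMIT = 500;  // above this, the backend may summarize the content before embedding

async function storeMemory(endpoint: string, content: string, tags: string[] = []): Promise<unknown> {
  if (content.length > HARD_LIMIT) {
    throw new Error(`Content exceeds the hard limit of ${HARD_LIMIT} characters`);
  }
  if (content.length > SOFT_LIMIT) {
    console.warn("Content exceeds the soft limit; the backend may summarize it before embedding");
  }

  // Step 2 onward happens server-side: classification, embedding, the FalkorDB
  // MERGE, the optional Qdrant upsert, and the queued background enrichment.
  const res = await fetch(`${endpoint}/memory`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content, tags }),
  });
  if (!res.ok) {
    throw new Error(`AutoMem responded with ${res.status}`);
  }
  return res.json();
}
```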

When an AI assistant calls recall_memory, the system runs hybrid search: parallel queries against both the semantic vector store and the graph database, then merges and re-ranks results using a 9-component relevance score.
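
As a sketch, the recall path looks something like the following: two queries issued in parallel, merged by memory id, and re-ranked. The two-signal score and the placeholder search helpers are deliberate simplifications of the actual 9-component scorer:

```typescript
// Simplified illustration of hybrid recall; the weights and helpers are NOT the real 9-component scorer.
interface ScoredMemory {
  id: string;
  content: string;
  vectorScore: number; // semantic similarity signal (Qdrant), 0 when unavailable
  graphScore: number;  // relationship/traversal signal (FalkorDB)
}

async function hybridRecall(query: string): Promise<ScoredMemory[]> {
  // Query the vector store and the graph in parallel.
  const [semantic, graph] = await Promise.all([searchVectors(query), searchGraph(query)]);

  // Merge by memory id so a result found by both searches is counted once.
  const merged = new Map<string, ScoredMemory>();
  for (const m of [...semantic, ...graph]) {
    const prev = merged.get(m.id);
    merged.set(m.id, {
      ...m,
      vectorScore: Math.max(m.vectorScore, prev?.vectorScore ?? 0),
      graphScore: Math.max(m.graphScore, prev?.graphScore ?? 0),
    });
  }

  // Re-rank with a combined relevance score (two signals here; AutoMem uses nine).
  const relevance = (m: ScoredMemory) => 0.6 * m.vectorScore + 0.4 * m.graphScore;
  return Array.from(merged.values()).sort((a, b) => relevance(b) - relevance(a));
}

// Placeholder queries standing in for the real Qdrant and FalkorDB searches.
async function searchVectors(query: string): Promise<ScoredMemory[]> {
  return [];
}
async function searchGraph(query: string): Promise<ScoredMemory[]> {
  return [];
}
```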

The separation between server and client is intentional:

  • Deployment flexibility — run the memory backend on different infrastructure than the MCP client
  • Multi-client support — multiple MCP clients (on different machines or platforms) can share one AutoMem instance
  • Scaling independence — scale memory storage separately from MCP protocol handling
  • Technology separation — backend uses Python/Flask while the client uses TypeScript

The AutoMem server can be deployed in any of the following ways:

| Deployment Method | Requirements |
| --- | --- |
| Railway (Cloud) | Railway account (free trial available) |
| Docker Compose (Local) | Docker 20.10+, Docker Compose 2.0+ |
| Bare Metal | Python 3.10+, external FalkorDB instance |

Optional enhancements that improve functionality:

  • OpenAI API key — enables semantic embeddings instead of placeholder hash-based vectors
  • Voyage API key — alternative embedding provider with better performance
  • Qdrant Cloud account — enables vector search (AutoMem degrades gracefully without it)
  • spaCy model — richer entity extraction during enrichment (python -m spacy download en_core_web_sm)

The MCP client additionally requires:

  • Node.js 20.0.0 or higher — required for ESM support, the native fetch API, and modern async/await patterns
  • A running AutoMem service — the backend must be deployed before the MCP client can be configured

Check your Node.js version:

node --version
# Should output v20.0.0 or higher

If you need to upgrade:

| Method | Command |
| --- | --- |
| nvm (recommended) | nvm install 20 |
| Official installer | Download from nodejs.org |
| Homebrew (macOS) | brew install node@20 |
| Package manager (Linux) | apt install nodejs or equivalent |

The MCP client supports the following AI platforms via stdio:

  • Claude Desktop (macOS, Windows, Linux)
  • Cursor IDE
  • Claude Code
  • OpenAI Codex
  • Warp Terminal

OpenClaw is also supported, but it connects to the AutoMem HTTP API directly rather than through the stdio bridge (as shown in the diagram above).

Cloud AI platforms (ChatGPT, Claude Web/Mobile, ElevenLabs) connect via a Remote MCP sidecar deployed alongside the AutoMem server on Railway.

Where to go next:

  • Deploy and connect: See Quick Start to deploy the server and connect your first platform in 5 minutes.
  • Platform-specific setup: See the Platform Integrations section for Claude Desktop, Cursor, Claude Code, and others.
  • Understand the memory model: See Core Concepts for memory types, relationships, and hybrid search.
  • API reference: See the API Reference section for complete endpoint documentation.