
Testing

AutoMem’s test suite validates core functionality through unit tests with emphasis on the consolidation engine. The test framework uses pytest with mock objects to isolate graph and vector store interactions, enabling fast execution without external dependencies.

Currently, the test suite focuses on consolidation logic. API endpoints and background workers are validated through manual testing and production monitoring.

tests/
└── test_consolidation_engine.py # Consolidation engine unit tests
Component               Test File                       Coverage Status
Memory Consolidation    test_consolidation_engine.py    Comprehensive
Decay Calculations      test_consolidation_engine.py    Covered
Creative Associations   test_consolidation_engine.py    Covered
Clustering Logic        test_consolidation_engine.py    Covered
Controlled Forgetting   test_consolidation_engine.py    Covered
Flask API Endpoints     (none)                          Manual testing only
Enrichment Pipeline     (none)                          Manual testing only
Embedding Worker        (none)                          Manual testing only
MCP Bridge              (none)                          Manual testing only

Prerequisites — install development dependencies:

pip install -r requirements-dev.txt

Basic execution:

pytest tests/ # Run all tests
pytest tests/ -v # Verbose output
pytest tests/test_consolidation_engine.py # Specific file
pytest tests/ -k "test_relevance_score" # Specific function
graph TB
    subgraph "Test Layer"
        TestFunctions["Test Functions\ntest_calculate_relevance_score_*\ntest_discover_creative_*\ntest_cluster_similar_*\ntest_apply_controlled_*"]
        Fixtures["pytest Fixtures\nfreeze_time\nmonkeypatch"]
    end

    subgraph "Mock Objects"
        FakeGraph["FakeGraph\nimplements GraphLike protocol"]
        FakeVectorStore["FakeVectorStore\nimplements VectorStoreProtocol"]
        FakeResult["FakeResult\nmimics FalkorDB result"]
    end

    subgraph "Code Under Test"
        MemoryConsolidator["MemoryConsolidator\nconsolidation.py:111-770"]
        calculate_relevance["calculate_relevance_score()\nconsolidation.py:178-230"]
        discover_creative["discover_creative_associations()\nconsolidation.py:232-330"]
        cluster_similar["cluster_similar_memories()\nconsolidation.py:332-431"]
        apply_forgetting["apply_controlled_forgetting()\nconsolidation.py:433-545"]
    end

    TestFunctions --> Fixtures
    TestFunctions --> FakeGraph
    TestFunctions --> FakeVectorStore

    FakeGraph --> MemoryConsolidator
    FakeVectorStore --> MemoryConsolidator

    MemoryConsolidator --> calculate_relevance
    MemoryConsolidator --> discover_creative
    MemoryConsolidator --> cluster_similar
    MemoryConsolidator --> apply_forgetting

    FakeGraph --> FakeResult

The test suite uses in-memory mock objects that implement the same protocols as production dependencies, enabling fast tests without FalkorDB or Qdrant instances.

FakeGraph implements the GraphLike protocol and simulates FalkorDB query behavior (tests/test_consolidation_engine.py:17-69):

  • Query pattern matching: Uses string matching to identify query type (e.g., "COUNT(DISTINCT r)" for relationship counts)
  • Deterministic responses: Returns pre-configured data from state attributes
  • Side effect tracking: Records deletions, archives, and score updates
  • Query history: Stores all queries for verification

FakeVectorStore implements VectorStoreProtocol and tracks vector deletion operations (tests/test_consolidation_engine.py:72-77).

FakeResult mimics FalkorDB query result structure with a result_set attribute (tests/test_consolidation_engine.py:12-14).
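
The fakes can be pictured with a minimal sketch like the one below. The attribute names (relationship_counts, deleted_nodes, query_history, deleted_ids, result_set) follow the assertions shown later on this page, but the method shapes are illustrative assumptions; the real definitions live at the file references above.

class FakeResult:
    def __init__(self, rows):
        self.result_set = rows  # mirrors FalkorDB's result_set attribute


class FakeVectorStore:
    def __init__(self):
        self.deleted_ids = []  # records vector deletions for later assertions

    def delete(self, memory_id):  # method name is an assumption
        self.deleted_ids.append(memory_id)


class FakeGraph:
    def __init__(self):
        self.relationship_counts = {}  # pre-configured deterministic data
        self.deleted_nodes = []        # side-effect tracking
        self.query_history = []        # stored for verification

    def query(self, cypher, params=None):
        self.query_history.append(cypher)
        params = params or {}
        # String matching identifies the query type
        if "COUNT(DISTINCT r)" in cypher:
            count = self.relationship_counts.get(params.get("id"), 0)
            return FakeResult([[count]])
        if "DELETE" in cypher:
            self.deleted_nodes.append(params.get("id"))
        return FakeResult([])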

The freeze_time fixture uses monkeypatch to replace datetime.now() with a fixed timestamp (2024-01-01 00:00:00 UTC), ensuring deterministic decay calculations (tests/test_consolidation_engine.py:80-92):

  • Auto-use: Applies to all tests automatically
  • Module patching: Patches consolidation_module.datetime, not global datetime
  • Timezone support: Handles both naive and aware datetime calls
  • Cleanup: Automatically restores original datetime after each test
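
A sketch of what such a fixture can look like, assuming the consolidation module is imported as consolidation_module (an assumption; the actual fixture is at the reference above):

from datetime import datetime, timedelta, timezone

import pytest

import consolidation as consolidation_module  # assumed import path

FROZEN_TIME = datetime(2024, 1, 1, tzinfo=timezone.utc)


class _FrozenDatetime(datetime):
    @classmethod
    def now(cls, tz=None):
        # Serve aware or naive values depending on how the caller asks
        return FROZEN_TIME.astimezone(tz) if tz else FROZEN_TIME.replace(tzinfo=None)


@pytest.fixture(autouse=True)
def freeze_time(monkeypatch):
    # Patch the module-level datetime, not the global one;
    # monkeypatch restores the original after each test
    monkeypatch.setattr(consolidation_module, "datetime", _FrozenDatetime)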

iso_days_ago(days: int) generates ISO timestamp strings relative to the frozen time:

def iso_days_ago(days: int) -> str:
    return (FROZEN_TIME - timedelta(days=days)).isoformat()
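
With the frozen clock in place, iso_days_ago is fully deterministic: iso_days_ago(7), for example, always produces the timestamp seven days before 2024-01-01, no matter when the suite runs.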

Follow this pattern for consolidation engine tests:

  1. Setup: Create FakeGraph and configure mock data
  2. Execution: Instantiate MemoryConsolidator and call method
  3. Verification: Assert return values and check mock state
def test_calculate_relevance_score_recent_memory():
    # 1. Setup
    graph = FakeGraph()
    graph.relationship_counts = {"mem_001": 3}

    # 2. Execution
    consolidator = MemoryConsolidator(graph=graph, vector_store=FakeVectorStore())
    score = consolidator.calculate_relevance_score(
        memory_id="mem_001",
        importance=0.8,
        access_count=5,
        last_accessed=iso_days_ago(1),
        created_at=iso_days_ago(2),
    )

    # 3. Verification
    assert 0.5 < score < 1.0  # Recent, high-importance memory

Relationship counts — control _get_relationship_count() return values:

graph.relationship_counts = {
    "mem_001": 5,  # 5 relationships
    "mem_002": 0,  # No relationships
}

Sample memories for creative associations — configure sample_rows with structure matching the query in discover_creative_associations():

graph.sample_rows = [
    ["mem_001", "Chose PostgreSQL for reliability", 0.8, 2],
    ["mem_002", "Prefer typed languages", 0.7, 1],
]

Cluster data — configure cluster_rows for clustering tests:

graph.cluster_rows = [
    ["mem_001", "content_a", [0.1, 0.2, 0.3]],
    ["mem_002", "content_b", [0.15, 0.22, 0.31]],
]

Decay and forgetting data — configure decay_rows or forgetting_rows with full memory attributes:

graph.decay_rows = [
    {
        "id": "mem_001",
        "importance": 0.8,
        "access_count": 10,
        "last_accessed": iso_days_ago(30),
        "created_at": iso_days_ago(60),
        "relevance_score": 0.75,
    }
]

Many consolidation methods support a dry_run parameter. Test both modes:

def test_apply_forgetting_dry_run():
    graph = FakeGraph()
    # ... configure graph ...
    consolidator = MemoryConsolidator(graph=graph, vector_store=FakeVectorStore())
    result = consolidator.apply_controlled_forgetting(dry_run=True)
    assert len(graph.deleted_nodes) == 0  # Nothing deleted in dry run


def test_apply_forgetting_execution():
    graph = FakeGraph()
    # ... configure graph with low-relevance memories ...
    vector_store = FakeVectorStore()
    consolidator = MemoryConsolidator(graph=graph, vector_store=vector_store)
    result = consolidator.apply_controlled_forgetting(dry_run=False)
    assert len(graph.deleted_nodes) > 0  # Nodes deleted
    assert len(vector_store.deleted_ids) > 0  # Vectors deleted

Query history — check which queries were executed:

assert any("MATCH (m:Memory)" in q for q in graph.query_history)

State changes — verify deletions, archives, and score updates:

assert "mem_001" in graph.deleted_nodes
assert "mem_002" in graph.archived_nodes
assert graph.updated_scores.get("mem_003") < 0.2

Vector store interactions:

assert "mem_001" in vector_store.deleted_ids
Module                  Function                            Test Coverage
MemoryConsolidator      calculate_relevance_score()         Complete
MemoryConsolidator      discover_creative_associations()    Complete
MemoryConsolidator      cluster_similar_memories()          Complete
MemoryConsolidator      apply_controlled_forgetting()       Complete (dry run + execution)
MemoryConsolidator      _apply_decay()                      Complete
MemoryConsolidator      consolidate()                       Partial (individual steps tested)
ConsolidationScheduler  (none)                              Not tested

Flask API Service — requires a running Flask app; use curl or Postman for manual verification:

  • /memory (POST) — Memory storage
  • /recall (GET) — Hybrid search
  • /associate (POST) — Relationship creation
  • /health (GET) — Health check endpoint

Background Workers — validated by observing log output:

  • enrichment_worker() — Entity extraction and tagging
  • embedding_worker() — Batch embedding generation

MCP Bridge — deploy to Railway and test with Claude Desktop.

From docs/OPTIMIZATIONS.md:

Embedding batching verification — rapidly create 25 memories and check logs:

for i in {1..25}; do
  curl -X POST http://localhost:8001/memory \
    -H "Authorization: Bearer $TOKEN" \
    -d "{\"content\": \"Memory $i\", \"tags\": [\"test\"]}"
done

Expected log: Generated 20 OpenAI embeddings in batch

Consolidation performance — monitor logs during decay runs. After optimization, the logs should show roughly an 80% reduction in relationship query counts.

Health endpoint — verify enrichment stats:

curl http://localhost:8001/health | jq '.enrichment'
# Expected keys: status, queue_depth, pending, inflight, processed, failed

Structured logging — check logs for structured events after memory operations:

  • recall_complete events with latency metrics
  • memory_stored events with queue status

Based on current coverage gaps:

  1. API endpoint tests — Test Flask routes with a mocked database (a minimal sketch follows this list)
  2. Background worker tests — Test enrichment and embedding workers
  3. Integration tests — Test full memory storage → enrichment → recall flow
  4. ConsolidationScheduler tests — Test scheduling logic and intervals
  5. MCP bridge tests — Test tool call translation
  6. Load testing — Verify performance under high memory throughput
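
For the first recommendation, here is a minimal sketch of what a Flask route test could look like, using Flask's built-in test client. It assumes the app object is importable as app and that /health responds without authentication; neither assumption is verified against the repository, and no such test exists yet.

import pytest

from app import app  # assumed module layout; adjust to the actual app location


@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as client:
        yield client


def test_health_returns_enrichment_stats(client):
    # /health is documented above to expose enrichment stats
    response = client.get("/health")
    assert response.status_code == 200
    assert "enrichment" in response.get_json()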

The mcp-automem package uses Vitest as its test runner, providing fast ES module support, built-in coverage reporting, watch mode for TDD, and parallel test execution.

graph TB
    TESTS["Test Suite"]

    TESTS --> UNIT["Unit Tests\n(vitest.config.ts)"]
    TESTS --> INTEGRATION["Integration Tests\n(vitest.integration.config.ts)"]

    UNIT --> CLIENT_TEST["AutoMemClient\nHTTP mocking"]
    UNIT --> CLI_TEST["CLI Handlers\nFilesystem mocking"]
    UNIT --> TOOLS_TEST["MCP Tools\nRequest/response validation"]

    INTEGRATION --> ENDTOEND["End-to-End\nReal AutoMem service"]
    INTEGRATION --> PLATFORM["Platform Integration\nInstaller verification"]

Test organization:

  • Unit tests: src/**/*.test.ts or tests/unit/ — mock node-fetch for HTTP requests, mock fs for filesystem operations, use in-memory fixtures for AutoMem responses
  • Integration tests: tests/integration/ — require a running AutoMem service
Script            Command                                            Purpose
test              vitest run                                         Run all unit tests once
test:watch        vitest                                             Watch mode for TDD
test:coverage     vitest run --coverage                              Generate coverage reports
test:integration  vitest run --config vitest.integration.config.ts  Integration test suite
test:all          Both unit and integration                          Full test suite

Coverage reporting is integrated into the CI pipeline via @vitest/coverage-v8, generating reports for line, branch, function, and statement coverage. Coverage failures don’t block builds (continue-on-error: true) — coverage is informational rather than a hard gate during active development.

HTTP mocking — mock node-fetch to simulate AutoMem backend responses without a running service:

import { vi } from 'vitest';

vi.mock('node-fetch', () => ({
  default: vi.fn().mockResolvedValue({
    ok: true,
    json: () => Promise.resolve({ memory_id: 'test-123', message: 'Stored' })
  })
}));

Filesystem mocking — mock fs for CLI installer tests:

vi.mock('fs', () => ({
  writeFileSync: vi.fn(),
  readFileSync: vi.fn().mockReturnValue('template content'),
  existsSync: vi.fn().mockReturnValue(false)
}));

In-memory AutoMem fixtures — use realistic response shapes:

const mockRecallResponse = {
  results: [
    {
      id: 'mem-001',
      content: 'Chose PostgreSQL for reliability',
      score: 0.92,
      tags: ['decision', 'database'],
      importance: 0.8,
      metadata: { type: 'Decision' }
    }
  ],
  count: 1
};

Integration tests require a live AutoMem service. Set the following environment variables before running:

AUTOMEM_ENDPOINT=http://localhost:8001
AUTOMEM_API_KEY=your-test-token

The CI workflow (.github/workflows/ci.yml) runs on every PR and push to main:

  1. Link validation on documentation files
  2. npm run lint — ESLint (fails build on errors)
  3. npm run build — TypeScript compilation (fails build on errors)
  4. npm test — Unit tests (fails build on failures)
  5. npm run test:coverage — Coverage report (non-blocking)

Unit tests run in CI without an external AutoMem service. Integration tests are run manually or in dedicated integration environments.