# Testing

## AutoMem Server — Testing (pytest)

### Purpose and Scope

AutoMem's test suite validates core functionality through unit tests, with emphasis on the consolidation engine. The test framework is pytest with mock objects that isolate graph and vector store interactions, enabling fast execution without external dependencies.

Currently, the test suite focuses on consolidation logic. API endpoints and background workers are validated through manual testing and production monitoring.
### Test Suite Organization

```text
tests/
└── test_consolidation_engine.py   # Consolidation engine unit tests
```

### Coverage by Component
| Component | Test File | Coverage Status |
|---|---|---|
| Memory Consolidation | test_consolidation_engine.py | Comprehensive |
| Decay Calculations | test_consolidation_engine.py | Covered |
| Creative Associations | test_consolidation_engine.py | Covered |
| Clustering Logic | test_consolidation_engine.py | Covered |
| Controlled Forgetting | test_consolidation_engine.py | Covered |
| Flask API Endpoints | — | Manual testing only |
| Enrichment Pipeline | — | Manual testing only |
| Embedding Worker | — | Manual testing only |
| MCP Bridge | — | Manual testing only |
### Running Tests

Prerequisites — install development dependencies:

```bash
pip install -r requirements-dev.txt
```

Basic execution:

```bash
pytest tests/                               # Run all tests
pytest tests/ -v                            # Verbose output
pytest tests/test_consolidation_engine.py   # Specific file
pytest tests/ -k "test_relevance_score"     # Specific function
```

### Test Architecture
```mermaid
graph TB
    subgraph "Test Layer"
        TestFunctions["Test Functions\ntest_calculate_relevance_score_*\ntest_discover_creative_*\ntest_cluster_similar_*\ntest_apply_controlled_*"]
        Fixtures["pytest Fixtures\nfreeze_time\nmonkeypatch"]
    end
    subgraph "Mock Objects"
        FakeGraph["FakeGraph\nimplements GraphLike protocol"]
        FakeVectorStore["FakeVectorStore\nimplements VectorStoreProtocol"]
        FakeResult["FakeResult\nmimics FalkorDB result"]
    end
    subgraph "Code Under Test"
        MemoryConsolidator["MemoryConsolidator\nconsolidation.py:111-770"]
        calculate_relevance["calculate_relevance_score()\nconsolidation.py:178-230"]
        discover_creative["discover_creative_associations()\nconsolidation.py:232-330"]
        cluster_similar["cluster_similar_memories()\nconsolidation.py:332-431"]
        apply_forgetting["apply_controlled_forgetting()\nconsolidation.py:433-545"]
    end
    TestFunctions --> Fixtures
    TestFunctions --> FakeGraph
    TestFunctions --> FakeVectorStore
    FakeGraph --> MemoryConsolidator
    FakeVectorStore --> MemoryConsolidator
    MemoryConsolidator --> calculate_relevance
    MemoryConsolidator --> discover_creative
    MemoryConsolidator --> cluster_similar
    MemoryConsolidator --> apply_forgetting
    FakeGraph --> FakeResult
```
### Mock Object Implementation

The test suite uses in-memory mock objects that implement the same protocols as production dependencies, enabling fast tests without FalkorDB or Qdrant instances.
#### FakeGraph Class

FakeGraph implements the GraphLike protocol and simulates FalkorDB query behavior (tests/test_consolidation_engine.py:17-69):

- Query pattern matching: uses string matching to identify the query type (e.g., "COUNT(DISTINCT r)" for relationship counts)
- Deterministic responses: returns pre-configured data from state attributes
- Side-effect tracking: records deletions, archives, and score updates
- Query history: stores all executed queries for verification
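The pattern can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the exact class from tests/test_consolidation_engine.py; the dispatch substrings and attribute names are assumptions based on the description above:

```python
class FakeResult:
    """Mimics a FalkorDB query result: rows live in `result_set`."""
    def __init__(self, rows):
        self.result_set = rows


class FakeGraph:
    """Illustrative in-memory stand-in for the GraphLike protocol."""
    def __init__(self):
        self.relationship_counts = {}  # deterministic, pre-configured data
        self.deleted_nodes = []        # side-effect tracking
        self.query_history = []        # every query, for later verification

    def query(self, cypher, params=None):
        self.query_history.append(cypher)
        params = params or {}
        # Query pattern matching: dispatch on substrings of the query text.
        if "COUNT(DISTINCT r)" in cypher:
            count = self.relationship_counts.get(params.get("id"), 0)
            return FakeResult([[count]])
        if "DELETE" in cypher:
            self.deleted_nodes.append(params.get("id"))
            return FakeResult([])
        return FakeResult([])
```

A test pre-loads `relationship_counts`, runs the code under test, then asserts on `query_history` and `deleted_nodes` instead of inspecting a real database.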
#### FakeVectorStore Class

FakeVectorStore implements VectorStoreProtocol and tracks vector deletion operations (tests/test_consolidation_engine.py:72-77).

#### FakeResult Class

FakeResult mimics the FalkorDB query result structure with a result_set attribute (tests/test_consolidation_engine.py:12-14).
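FakeVectorStore is only a few lines. A sketch of the idea (the method name `delete` is an assumption, not necessarily the protocol's exact signature):

```python
class FakeVectorStore:
    """Illustrative stand-in for the Qdrant client used in production."""
    def __init__(self):
        self.deleted_ids = []

    def delete(self, memory_id):
        # Record the deletion so tests can assert on it later,
        # instead of issuing a real Qdrant call.
        self.deleted_ids.append(memory_id)
```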
### Test Fixtures

#### freeze_time Fixture

The freeze_time fixture uses monkeypatch to replace datetime.now() with a fixed timestamp (2024-01-01 00:00:00 UTC), ensuring deterministic decay calculations (tests/test_consolidation_engine.py:80-92):

- Auto-use: applies to all tests automatically
- Module patching: patches consolidation_module.datetime, not the global datetime
- Timezone support: handles both naive and aware datetime calls
- Cleanup: automatically restores the original datetime after each test
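The frozen-clock idea can be sketched in a self-contained way. The class name below is illustrative; in the real fixture the final assignment is performed with pytest's `monkeypatch.setattr`, which also restores the original `datetime` after each test:

```python
from datetime import datetime, timezone

FROZEN_TIME = datetime(2024, 1, 1, tzinfo=timezone.utc)


class FrozenDatetime(datetime):
    """datetime subclass whose now() always returns the frozen instant."""

    @classmethod
    def now(cls, tz=None):
        if tz is None:
            # Naive callers get a naive frozen timestamp.
            return FROZEN_TIME.replace(tzinfo=None)
        # Aware callers get the same instant in their requested zone.
        return FROZEN_TIME.astimezone(tz)


# In the fixture, this substitution is done per-test with automatic cleanup:
# monkeypatch.setattr(consolidation_module, "datetime", FrozenDatetime)
```

Because only `consolidation_module.datetime` is patched, other code in the process still sees the real clock.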
### Helper Functions

iso_days_ago(days: int) generates ISO timestamp strings relative to the frozen time:

```python
def iso_days_ago(days: int) -> str:
    return (FROZEN_TIME - timedelta(days=days)).isoformat()
```

### Writing New Tests
#### Test Function Structure

Follow this pattern for consolidation engine tests:

1. Setup: create a FakeGraph and configure mock data
2. Execution: instantiate MemoryConsolidator and call the method under test
3. Verification: assert return values and check mock state
```python
def test_calculate_relevance_score_recent_memory():
    # 1. Setup
    graph = FakeGraph()
    graph.relationship_counts = {"mem_001": 3}

    # 2. Execution
    consolidator = MemoryConsolidator(graph=graph, vector_store=FakeVectorStore())
    score = consolidator.calculate_relevance_score(
        memory_id="mem_001",
        importance=0.8,
        access_count=5,
        last_accessed=iso_days_ago(1),
        created_at=iso_days_ago(2),
    )

    # 3. Verification
    assert 0.5 < score < 1.0  # Recent, high-importance memory
```

#### Mock Data Configuration
Relationship counts — control _get_relationship_count() return values:

```python
graph.relationship_counts = {
    "mem_001": 5,  # 5 relationships
    "mem_002": 0,  # No relationships
}
```

Sample memories for creative associations — configure sample_rows with a structure matching the query in discover_creative_associations():

```python
graph.sample_rows = [
    ["mem_001", "Chose PostgreSQL for reliability", 0.8, 2],
    ["mem_002", "Prefer typed languages", 0.7, 1],
]
```

Cluster data — configure cluster_rows for clustering tests:

```python
graph.cluster_rows = [
    ["mem_001", "content_a", [0.1, 0.2, 0.3]],
    ["mem_002", "content_b", [0.15, 0.22, 0.31]],
]
```

Decay and forgetting data — configure decay_rows or forgetting_rows with full memory attributes:

```python
graph.decay_rows = [
    {
        "id": "mem_001",
        "importance": 0.8,
        "access_count": 10,
        "last_accessed": iso_days_ago(30),
        "created_at": iso_days_ago(60),
        "relevance_score": 0.75,
    }
]
```

#### Testing Dry Run vs Execution
Many consolidation methods support a dry_run parameter. Test both modes:

```python
def test_apply_forgetting_dry_run():
    graph = FakeGraph()
    # ... configure graph ...
    consolidator = MemoryConsolidator(graph=graph, vector_store=FakeVectorStore())
    result = consolidator.apply_controlled_forgetting(dry_run=True)
    assert len(graph.deleted_nodes) == 0  # Nothing deleted in dry run


def test_apply_forgetting_execution():
    graph = FakeGraph()
    # ... configure graph with low-relevance memories ...
    vector_store = FakeVectorStore()
    consolidator = MemoryConsolidator(graph=graph, vector_store=vector_store)
    result = consolidator.apply_controlled_forgetting(dry_run=False)
    assert len(graph.deleted_nodes) > 0       # Nodes deleted
    assert len(vector_store.deleted_ids) > 0  # Vectors deleted
```

#### Verifying Mock Interactions
Query history — check which queries were executed:

```python
assert any("MATCH (m:Memory)" in q for q in graph.query_history)
```

State changes — verify deletions, archives, and score updates:

```python
assert "mem_001" in graph.deleted_nodes
assert "mem_002" in graph.archived_nodes
assert graph.updated_scores.get("mem_003") < 0.2
```

Vector store interactions:

```python
assert "mem_001" in vector_store.deleted_ids
```

### Test Coverage Report
#### Consolidation Engine Coverage

| Module | Function | Test Coverage |
|---|---|---|
| MemoryConsolidator | calculate_relevance_score() | Complete |
| MemoryConsolidator | discover_creative_associations() | Complete |
| MemoryConsolidator | cluster_similar_memories() | Complete |
| MemoryConsolidator | apply_controlled_forgetting() | Complete (dry run + execution) |
| MemoryConsolidator | _apply_decay() | Complete |
| MemoryConsolidator | consolidate() | Partial (individual steps tested) |
| ConsolidationScheduler | — | Not tested |
### Uncovered Components

Flask API Service — requires a running Flask app; use curl or Postman for manual verification:

- /memory (POST) — Memory storage
- /recall (GET) — Hybrid search
- /associate (POST) — Relationship creation
- /health (GET) — Health check endpoint

Background Workers — validated by observing log output:

- enrichment_worker() — Entity extraction and tagging
- embedding_worker() — Batch embedding generation

MCP Bridge — deploy to Railway and test with Claude Desktop.
### Performance Testing Procedures

From docs/OPTIMIZATIONS.md:

Embedding batching verification — rapidly create 25 memories and check the logs:

```bash
for i in {1..25}; do
  curl -X POST http://localhost:8001/memory \
    -H "Authorization: Bearer $TOKEN" \
    -d "{\"content\": \"Memory $i\", \"tags\": [\"test\"]}"
done
```

Expected log: `Generated 20 OpenAI embeddings in batch`
Consolidation performance — monitor logs during decay runs. After optimization, you should see roughly an 80% reduction in relationship query counts.

Health endpoint — verify enrichment stats:

```bash
curl http://localhost:8001/health | jq '.enrichment'
# Expected keys: status, queue_depth, pending, inflight, processed, failed
```

Structured logging — check logs for structured events after memory operations:

- recall_complete events with latency metrics
- memory_stored events with queue status
### Future Testing Priorities

Based on current coverage gaps:
- API endpoint tests — Test Flask routes with mock database
- Background worker tests — Test enrichment and embedding workers
- Integration tests — Test full memory storage → enrichment → recall flow
- ConsolidationScheduler tests — Test scheduling logic and intervals
- MCP bridge tests — Test tool call translation
- Load testing — Verify performance under high memory throughput
## mcp-automem Client — Testing (Vitest)

### Overview

The mcp-automem package uses Vitest as its test runner, providing fast ES module support, built-in coverage reporting, watch mode for TDD, and parallel test execution.
### Test Organization

```mermaid
graph TB
    TESTS["Test Suite"]
    TESTS --> UNIT["Unit Tests\n(vitest.config.ts)"]
    TESTS --> INTEGRATION["Integration Tests\n(vitest.integration.config.ts)"]
    UNIT --> CLIENT_TEST["AutoMemClient\nHTTP mocking"]
    UNIT --> CLI_TEST["CLI Handlers\nFilesystem mocking"]
    UNIT --> TOOLS_TEST["MCP Tools\nRequest/response validation"]
    INTEGRATION --> ENDTOEND["End-to-End\nReal AutoMem service"]
    INTEGRATION --> PLATFORM["Platform Integration\nInstaller verification"]
```

- Unit tests: `src/**/*.test.ts` or `tests/unit/` — mock `node-fetch` for HTTP requests, mock `fs` for filesystem operations, and use in-memory fixtures for AutoMem responses
- Integration tests: `tests/integration/` — require a running AutoMem service
### Test Commands

| Script | Command | Purpose |
|---|---|---|
| test | vitest run | Run all unit tests once |
| test:watch | vitest | Watch mode for TDD |
| test:coverage | vitest run --coverage | Generate coverage reports |
| test:integration | vitest run --config vitest.integration.config.ts | Integration test suite |
| test:all | Both unit and integration | Full test suite |
### Coverage Requirements

Coverage reporting is integrated into the CI pipeline via @vitest/coverage-v8, generating line, branch, function, and statement coverage reports. Coverage failures do not block builds (continue-on-error: true) — coverage is informational rather than a hard gate during active development.
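In workflow terms, a non-blocking coverage step looks roughly like this (the step name is illustrative; the npm script and `continue-on-error` flag come from the pipeline described above):

```yaml
- name: Coverage report          # illustrative step name
  run: npm run test:coverage
  continue-on-error: true        # coverage failures don't fail the build
```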
### Mock Strategy

HTTP mocking — mock node-fetch to simulate AutoMem backend responses without a running service:

```typescript
import { vi } from 'vitest';

vi.mock('node-fetch', () => ({
  default: vi.fn().mockResolvedValue({
    ok: true,
    json: () => Promise.resolve({ memory_id: 'test-123', message: 'Stored' })
  })
}));
```

Filesystem mocking — mock fs for CLI installer tests:

```typescript
vi.mock('fs', () => ({
  writeFileSync: vi.fn(),
  readFileSync: vi.fn().mockReturnValue('template content'),
  existsSync: vi.fn().mockReturnValue(false)
}));
```

In-memory AutoMem fixtures — use realistic response shapes:

```typescript
const mockRecallResponse = {
  results: [
    {
      id: 'mem-001',
      content: 'Chose PostgreSQL for reliability',
      score: 0.92,
      tags: ['decision', 'database'],
      importance: 0.8,
      metadata: { type: 'Decision' }
    }
  ],
  count: 1
};
```

### Integration Test Requirements
Integration tests require a live AutoMem service. Set the following environment variables before running:

```bash
AUTOMEM_ENDPOINT=http://localhost:8001
AUTOMEM_API_KEY=your-test-token
```

### CI/CD Integration
The CI workflow (.github/workflows/ci.yml) runs on every PR and push to main:

- Link validation on documentation files
- npm run lint — ESLint (fails the build on errors)
- npm run build — TypeScript compilation (fails the build on errors)
- npm test — Unit tests (fails the build on test failures)
- npm run test:coverage — Coverage report (non-blocking)
Unit tests run in CI without an external AutoMem service. Integration tests are run manually or in dedicated integration environments.