Recall Operations
The recall system provides a single unified endpoint that supports multiple search strategies. It combines nine scoring components into a hybrid ranking system and supports both basic retrieval and advanced graph expansion. Authentication is required via Authorization: Bearer <token> header or X-API-Key header.
Endpoint Overview
GET /recall
Request Parameters
Basic Search Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | — | Natural language search query or wildcard * |
| embedding | array[float] | — | Pre-computed embedding vector for semantic search |
| limit | integer | 10 | Maximum results to return (capped at RECALL_MAX_LIMIT=100) |
| sort / order_by | string | score | Sort mode: score, time_desc, time_asc, updated_desc, updated_asc |
The query parameter triggers keyword extraction and full-text search. When query is "*" or omitted, the system returns trending memories ordered by importance.
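A minimal sketch of assembling a GET /recall URL with the standard library. The helper name build_recall_url is hypothetical, and encoding array-valued parameters as repeated keys is an assumption taken from the curl examples on this page.

```python
from urllib.parse import urlencode

def build_recall_url(base_url: str, **params) -> str:
    """Build a GET /recall URL; list values become repeated query keys."""
    # Drop unset parameters so the server-side defaults apply.
    cleaned = {key: value for key, value in params.items() if value is not None}
    # doseq=True expands lists into repeated keys, e.g. tags=a&tags=b.
    return f"{base_url.rstrip('/')}/recall?{urlencode(cleaned, doseq=True)}"

url = build_recall_url(
    "https://your-automem-instance",
    query="PostgreSQL database decisions",
    limit=5,
    tags=["project-alpha", "database"],
)
print(url)
```

The request still needs an Authorization: Bearer or X-API-Key header when it is sent.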
Tag Filtering Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| tags | array[string] | — | Tags to filter by (can specify multiple) |
| exclude_tags | array[string] | — | Tags to exclude (supports prefix matching) |
| tag_mode | string | any | Match mode: any or all |
| tag_match | string | prefix | Match type: prefix or exact |
Tag filtering uses pre-computed tag_prefixes stored in both FalkorDB and Qdrant for efficient prefix matching. The exclude_tags parameter removes any memory containing ANY of the specified tags.
Prefix matching example:
- Query: tags=slack
- Matches: slack:channel:general, slack:user:U123, slack:reaction:thumbs-up
- Uses precomputed tag_prefixes field for O(1) lookup
Mode behavior:
- any (default): Match if ANY filter tag is present
- all: Match only if ALL filter tags are present
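The prefix and mode rules above can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes tag_prefixes holds every colon-delimited prefix of each tag, which the source does not spell out.

```python
def tag_prefixes(tag: str, sep: str = ":") -> list[str]:
    """Return every separator-delimited prefix of a tag, shortest first."""
    parts = tag.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]

def matches(memory_tags: list[str], filters: list[str],
            tag_mode: str = "any", tag_match: str = "prefix") -> bool:
    """Check one memory's tags against the filter tags."""
    if tag_match == "prefix":
        # In a real store this set would be precomputed once per memory,
        # so each filter tag costs a single set lookup.
        candidates = {p for tag in memory_tags for p in tag_prefixes(tag)}
    else:
        candidates = set(memory_tags)
    hits = [f in candidates for f in filters]
    return all(hits) if tag_mode == "all" else any(hits)
```

With this layout, tags=slack matches slack:user:U123 in prefix mode but not in exact mode.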
Temporal Filtering Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| time_query | string | — | Natural language time expression (e.g., "last week") |
| start | string | — | ISO 8601 start timestamp |
| end | string | — | ISO 8601 end timestamp |
Time queries are parsed by _parse_time_expression() from automem/utils/time.py, which supports expressions like "last 7 days", "yesterday", "last month". Explicit timestamps override natural language queries.
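To illustrate the override rule, here is a toy resolver. It is not the real _parse_time_expression(): parse_time_expression and resolve_time_window are hypothetical stand-ins, only "yesterday" and "last N days" are handled, and applying the override per bound is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

def parse_time_expression(expr, now):
    """Toy parser: return (start, end), or (None, None) if unrecognized."""
    text = expr.strip().lower()
    if text == "yesterday":
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight - timedelta(days=1), midnight
    match = re.fullmatch(r"last (\d+) days?", text)
    if match:
        return now - timedelta(days=int(match.group(1))), now
    return None, None

def _parse_iso(value):
    # Accept a trailing "Z" on Python versions whose fromisoformat rejects it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def resolve_time_window(time_query=None, start=None, end=None, now=None):
    """Combine a natural language expression with explicit ISO 8601 bounds."""
    now = now or datetime.now(timezone.utc)
    window_start, window_end = (
        parse_time_expression(time_query, now) if time_query else (None, None)
    )
    # Explicit timestamps win over whatever the expression produced.
    if start:
        window_start = _parse_iso(start)
    if end:
        window_end = _parse_iso(end)
    return window_start, window_end
```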
Context Hint Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| context | string | — | Contextual hint for ranking (e.g., "programming task") |
| language | string | — | Programming language context (e.g., "python") |
| active_path | string | — | File path being edited (auto-detects language) |
| context_tags | array[string] | — | Tags to boost in scoring |
| context_types | array[string] | — | Memory types to prioritize |
| priority_ids | array[string] | — | Specific memory IDs to guarantee inclusion |
Context hints influence the 9-component scoring system but do not filter results. When active_path matches a coding language extension, the system prioritizes Style type memories for that language.
Graph Expansion Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| expand_relations | boolean | false | Follow graph edges from seed results |
| expand_entities | boolean | false | Multi-hop reasoning via entity tags |
| relation_limit | integer | 5 | Max relations per seed (from RECALL_RELATION_LIMIT) |
| expansion_limit | integer | 25 | Total max expanded memories (from RECALL_EXPANSION_LIMIT) |
| expand_min_importance | float | 0.0 | Minimum importance threshold for expanded memories |
| expand_min_strength | float | 0.0 | Minimum relation strength to traverse |
Graph expansion operates in two phases: seed results from hybrid search, then expansion via graph traversal. Expansion filtering parameters only apply to expanded memories, never to seed results.
Hybrid Search Architecture
graph LR
Request["GET /recall<br/>?query=X&tags=Y"]
subgraph "Search Strategies"
Vector["_vector_search<br/>app.py:924"]
Keyword["_graph_keyword_search<br/>app.py:721"]
Trending["_graph_trending_results<br/>app.py:669"]
end
subgraph "Scoring"
Compute["_compute_metadata_score<br/>app.py:476"]
Combine["Weighted combination<br/>vector+keyword+tag+<br/>importance+recency"]
end
subgraph "Filtering"
Time["Time window filter<br/>_parse_time_expression"]
Tags["Tag filter<br/>_build_graph_tag_predicate"]
ResultFilter["_result_passes_filters"]
end
Request --> Vector
Request --> Keyword
Request --> Trending
Vector --> Combine
Keyword --> Combine
Trending --> Combine
Combine --> Time
Time --> Tags
Tags --> ResultFilter
ResultFilter --> Response["Ranked results<br/>with relations"]
Search strategy:
- Vector Search (state.qdrant != None): Semantic similarity using OpenAI embeddings
- Keyword Search (FalkorDB): Content and tag matching using Cypher CONTAINS
- Trending Results (no query): High-importance memories ordered by recency
- Deduplication: Track seen IDs across sources
- Filtering: Apply temporal and tag constraints
- Scoring: Combine weighted factors into final score
- Relations: Fetch connected memories (limited by RECALL_RELATION_LIMIT)
Hybrid Scoring System
9-Component Score Calculation
graph TB
Query["Query Input"]
subgraph VectorSearch["Vector Search (Qdrant)"]
VectorComp["Vector Component<br/>Weight: SEARCH_WEIGHT_VECTOR<br/>Cosine similarity score"]
end
subgraph KeywordSearch["Keyword Search (FalkorDB)"]
KeywordComp["Keyword Component<br/>Weight: SEARCH_WEIGHT_KEYWORD<br/>TF-IDF + phrase matching"]
ExactComp["Exact Component<br/>Weight: SEARCH_WEIGHT_EXACT<br/>Direct content overlap"]
end
subgraph MetadataScoring["Metadata Scoring"]
ImportanceComp["Importance Component<br/>Weight: SEARCH_WEIGHT_IMPORTANCE<br/>User-assigned priority"]
ConfidenceComp["Confidence Component<br/>Weight: SEARCH_WEIGHT_CONFIDENCE<br/>Classification confidence"]
RecencyComp["Recency Component<br/>Weight: SEARCH_WEIGHT_RECENCY<br/>Freshness boost"]
TagComp["Tag Component<br/>Weight: SEARCH_WEIGHT_TAG<br/>Tag overlap score"]
end
subgraph ContextScoring["Context Scoring"]
RelationComp["Relation Component<br/>Computed from graph edges"]
TemporalComp["Temporal Component<br/>Time alignment score"]
end
FinalScore["Final Weighted Score<br/>Sum of all components"]
Query --> VectorSearch
Query --> KeywordSearch
Query --> MetadataScoring
Query --> ContextScoring
VectorComp --> FinalScore
KeywordComp --> FinalScore
ExactComp --> FinalScore
ImportanceComp --> FinalScore
ConfidenceComp --> FinalScore
RecencyComp --> FinalScore
TagComp --> FinalScore
RelationComp --> FinalScore
TemporalComp --> FinalScore
The formula:
final_score = (vector_score × SEARCH_WEIGHT_VECTOR)
  + (keyword_score × SEARCH_WEIGHT_KEYWORD)
  + (exact_match_score × SEARCH_WEIGHT_EXACT)
  + (importance × SEARCH_WEIGHT_IMPORTANCE)
  + (confidence × SEARCH_WEIGHT_CONFIDENCE)
  + (recency_score × SEARCH_WEIGHT_RECENCY)
  + (tag_score × SEARCH_WEIGHT_TAG)
  + relation_score + temporal_score
Default weight configuration (all configurable via environment variables):
| Component | Environment Variable | Default Value |
|---|---|---|
| Vector | SEARCH_WEIGHT_VECTOR | 0.25 |
| Keyword | SEARCH_WEIGHT_KEYWORD | 0.15 |
| Exact | SEARCH_WEIGHT_EXACT | 0.25 |
| Importance | SEARCH_WEIGHT_IMPORTANCE | 0.05 |
| Confidence | SEARCH_WEIGHT_CONFIDENCE | 0.05 |
| Recency | SEARCH_WEIGHT_RECENCY | 0.10 |
| Tag | SEARCH_WEIGHT_TAG | 0.10 |
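As a worked illustration, the formula and default weights can be sketched like this. The dictionary keys and the names load_weights and final_score are hypothetical; only the environment variable names and default values come from the table above.

```python
import os

DEFAULT_WEIGHTS = {
    "vector": 0.25, "keyword": 0.15, "exact": 0.25, "importance": 0.05,
    "confidence": 0.05, "recency": 0.10, "tag": 0.10,
}

def load_weights(env=os.environ) -> dict:
    """Read SEARCH_WEIGHT_* overrides, falling back to the documented defaults."""
    return {
        name: float(env.get(f"SEARCH_WEIGHT_{name.upper()}", default))
        for name, default in DEFAULT_WEIGHTS.items()
    }

def final_score(components: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of seven components plus unweighted relation and temporal terms."""
    weighted = sum(components.get(name, 0.0) * w for name, w in weights.items())
    return weighted + components.get("relation", 0.0) + components.get("temporal", 0.0)

example = {"vector": 0.8, "keyword": 0.4, "exact": 0.0, "importance": 0.9,
           "confidence": 0.5, "recency": 0.5, "tag": 1.0}
print(round(final_score(example), 3))
```

Because relation_score and temporal_score carry no weight in the formula, they shift the final score directly.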
Scoring Component Details
Vector Search Component:
Executes via _vector_search() using Qdrant when available. When Qdrant is unavailable, the system gracefully degrades to graph-only mode using keyword and metadata scoring.
Keyword Search Component:
Implemented in _graph_keyword_search() using Cypher queries:
- Extracts keywords via _extract_keywords() from automem/utils/text.py
- Matches against m.content and m.tags fields in FalkorDB
- Scores based on keyword frequency and phrase containment
- Returns memories ordered by score DESC, m.importance DESC, m.timestamp DESC
- Content match: 2 points per keyword
- Tag match: 1 point per keyword
- Exact phrase bonus: +2 (content) or +1 (tag)
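The point scheme above can be sketched in plain Python. The real scoring runs as a Cypher query in FalkorDB, so this is only an approximation: keyword_score is a hypothetical name, and case-insensitive substring matching is an assumption standing in for CONTAINS.

```python
def keyword_score(keywords: list[str], phrase: str,
                  content: str, tags: list[str]) -> int:
    """Score one memory: 2 per content hit, 1 per tag hit, plus phrase bonuses."""
    content_l = content.lower()
    tags_l = [tag.lower() for tag in tags]
    score = 0
    for keyword in keywords:
        kw = keyword.lower()
        if kw in content_l:
            score += 2  # content match
        if any(kw in tag for tag in tags_l):
            score += 1  # tag match
    phrase_l = phrase.lower()
    if phrase_l and phrase_l in content_l:
        score += 2  # exact phrase bonus (content)
    if phrase_l and any(phrase_l in tag for tag in tags_l):
        score += 1  # exact phrase bonus (tag)
    return score
```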
Metadata Components:
Computed by _compute_metadata_score() and _parse_metadata_field():
- Importance: Direct multiplication by weight (0.0–1.0 range)
- Confidence: Classification confidence from memory type detection
- Recency: max(0, 1 - (age_days / 180)), a linear decay that reaches zero 180 days (about six months) after last access
- Tag: matched_tokens / total_query_tokens, the overlap ratio between query tags and memory tags
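The recency and tag formulas translate directly into code. In this sketch the function names are hypothetical, and lowercasing tags before comparison is an assumption. Note that max(0, 1 - age_days / 180) drops by the same amount each day and hits zero at 180 days.

```python
def recency_score(age_days: float) -> float:
    """max(0, 1 - age_days / 180): 1.0 when fresh, 0.0 from 180 days on."""
    return max(0.0, 1.0 - age_days / 180.0)

def tag_score(query_tokens: list[str], memory_tags: list[str]) -> float:
    """Fraction of query tokens that appear among the memory's tags."""
    if not query_tokens:
        return 0.0  # avoid dividing by zero when there is nothing to match
    memory = {tag.lower() for tag in memory_tags}
    matched = sum(1 for token in query_tokens if token.lower() in memory)
    return matched / len(query_tokens)
```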
Graph Expansion
Relationship Expansion Flow
graph TB
SeedResults["Seed Results<br/>from Hybrid Search"]
CheckExpand{"expand_relations<br/>enabled?"}
FetchRelations["_fetch_relations()<br/>app.py:2200-2300"]
FilterStrength{"Relation strength >=<br/>expand_min_strength?"}
FilterImportance{"Memory importance >=<br/>expand_min_importance?"}
TraverseEdge["Traverse edge<br/>Load related memory"]
CheckLimit{"Total expanded below<br/>expansion_limit?"}
AddToResults["Add to expanded results"]
FinalResults["Merged Results<br/>Seed + Expanded"]
SeedResults --> CheckExpand
CheckExpand -->|No| FinalResults
CheckExpand -->|Yes| FetchRelations
FetchRelations --> FilterStrength
FilterStrength -->|Fail| FinalResults
FilterStrength -->|Pass| TraverseEdge
TraverseEdge --> FilterImportance
FilterImportance -->|Fail| FinalResults
FilterImportance -->|Pass| CheckLimit
CheckLimit -->|Exceeded| FinalResults
CheckLimit -->|Within limit| AddToResults
AddToResults --> FinalResults
The _expand_related_memories() function implements multi-hop graph traversal using Cypher MATCH patterns to follow typed relationship edges (e.g., RELATES_TO, PREFERS_OVER, LEADS_TO).
Key configuration constants:
- RECALL_RELATION_LIMIT (default: 5): Max edges per seed memory
- RECALL_EXPANSION_LIMIT (default: 25): Total expanded memories cap
Edge types traversed: RELATES_TO, LEADS_TO, DERIVED_FROM, EVOLVED_INTO, REINFORCES, EXEMPLIFIES, and the rest of the 11 relationship types. See Relationship Operations for the full type reference.
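The limits and thresholds in the expansion flow can be sketched over in-memory data. The Edge class and expand function are hypothetical; the real traversal is a Cypher query. Applying relation_limit before the strength and importance filters is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    target_id: str
    strength: float     # relation strength on the edge
    importance: float   # importance of the memory at the far end

def expand(seed_edges: dict[str, list[Edge]], relation_limit: int = 5,
           expansion_limit: int = 25, min_strength: float = 0.0,
           min_importance: float = 0.0) -> list[str]:
    """Return IDs of expanded memories; seed memories are never filtered."""
    expanded: list[str] = []
    seen = set(seed_edges)  # seeds are already part of the result set
    for edges in seed_edges.values():
        for edge in edges[:relation_limit]:
            if len(expanded) >= expansion_limit:
                return expanded  # global cap reached
            if edge.strength < min_strength or edge.importance < min_importance:
                continue  # thresholds apply to expanded memories only
            if edge.target_id in seen:
                continue  # already a seed or already expanded
            seen.add(edge.target_id)
            expanded.append(edge.target_id)
    return expanded
```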
Entity Expansion Flow
When expand_entities=true, the system performs multi-hop reasoning via entity tags:
- Extract entity tags from query (format: entity:<type>:<slug>)
- Find memories with matching entity tags (first hop)
- Extract entities from matched memories
- Find memories containing those entities (second hop)
Example: Query "What is Sarah's sister's job?" → Find "Sarah" entity → Find Sarah’s relationships → Find "sister Rachel" → Find Rachel’s job.
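The two hops can be sketched over a plain dictionary of memory tags. The function names and the sample tags (including the people type) are hypothetical; only the entity:<type>:<slug> format comes from the text above.

```python
def entity_tags(tags: list[str]) -> set[str]:
    """Keep only tags in the entity:<type>:<slug> format."""
    return {tag for tag in tags if tag.startswith("entity:")}

def expand_entities(query_entities: set[str],
                    memories: dict[str, list[str]]) -> list[str]:
    """Return memory IDs reached in the first hop, then the second hop."""
    # First hop: memories tagged with an entity named in the query.
    first_hop = [mid for mid, tags in memories.items()
                 if entity_tags(tags) & query_entities]
    # Collect entities that appear alongside the query entities.
    discovered: set[str] = set()
    for mid in first_hop:
        discovered |= entity_tags(memories[mid])
    discovered -= query_entities
    # Second hop: other memories tagged with a newly discovered entity.
    second_hop = [mid for mid, tags in memories.items()
                  if mid not in first_hop and entity_tags(tags) & discovered]
    return first_hop + second_hop

memories = {
    "m1": ["entity:people:sarah", "entity:people:rachel"],  # Sarah's sister is Rachel
    "m2": ["entity:people:rachel", "job"],                  # Rachel's job
    "m3": ["entity:people:bob"],
}
print(expand_entities({"entity:people:sarah"}, memories))
```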
Auto Query Decomposition
When auto_decompose: true, the backend automatically extracts entities and topics from the query to generate supplementary searches:
- Query: "React OAuth authentication patterns"
- Decomposed into: ["React", "OAuth", "authentication", "patterns"]
- Executes parallel searches for each term
- Merges and deduplicates results
Multiple Query Deduplication
The queries parameter (array of strings) allows multiple search queries in a single request. The backend executes them in parallel and deduplicates results server-side.
Deduplication logic:
- Each query returns up to limit results
- Server merges results by memory_id
- Keeps highest-scored version of each memory
- Returns dedup_removed count in response metadata
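The deduplication steps above can be sketched as a single merge pass. merge_results is a hypothetical name, and the result shape follows the response format documented on this page.

```python
def merge_results(result_lists: list[list[dict]]) -> dict:
    """Merge per-query results, keeping the best-scored copy of each memory."""
    best: dict[str, dict] = {}
    total = 0
    for results in result_lists:
        for result in results:
            total += 1
            memory_id = result["memory"]["memory_id"]
            current = best.get(memory_id)
            if current is None or result["final_score"] > current["final_score"]:
                best[memory_id] = result
    merged = sorted(best.values(), key=lambda r: r["final_score"], reverse=True)
    return {
        "count": len(merged),
        "results": merged,
        "dedup_removed": total - len(merged),  # copies dropped during the merge
    }
```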
Parallel Query Optimization (MCP Layer)
The MCP recall_memory tool implements a performance optimization when tags are provided: it executes both the primary recall endpoint and a tag-only endpoint in parallel, then merges and deduplicates results.
Implementation:
- Tag detection: If Array.isArray(recallArgs.tags) && recallArgs.tags.length > 0
- Parallel execution: Uses Promise.all() to fetch both endpoints
- Merge logic: Creates a Map<memory_id, result> to deduplicate
- Sorting: Sorts by final_score descending, falls back to importance
- Limit enforcement: Slices to recallArgs.limit after merge
Rationale: Tag-filtered semantic search might miss high-importance memories that only match by tag, so both strategies run concurrently for comprehensive recall.
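The MCP layer itself is JavaScript, but the same flow can be sketched in Python with asyncio.gather standing in for Promise.all(). Everything here is illustrative: recall_with_tags and the two injected fetch coroutines are hypothetical, and the fallback limit of 5 mirrors the MCP tool's documented default.

```python
import asyncio

async def recall_with_tags(fetch_primary, fetch_tag_only, args: dict) -> list[dict]:
    """Run both lookups concurrently when tags are present, then merge."""
    limit = args.get("limit", 5)
    tags = args.get("tags")
    if not (isinstance(tags, list) and tags):
        return (await fetch_primary(args))[:limit]
    # Both requests are in flight at the same time.
    primary, tag_only = await asyncio.gather(fetch_primary(args), fetch_tag_only(args))
    merged: dict[str, dict] = {}
    for result in [*primary, *tag_only]:
        merged.setdefault(result["memory"]["memory_id"], result)  # first copy wins

    def sort_key(result: dict) -> float:
        # Prefer final_score; fall back to importance when it is absent.
        score = result.get("final_score")
        return score if score is not None else result["memory"].get("importance", 0.0)

    return sorted(merged.values(), key=sort_key, reverse=True)[:limit]
```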
Query Execution Flow
sequenceDiagram
participant Client
participant API as "Flask API"
participant Graph as "FalkorDB"
participant Vector as "Qdrant"
participant Scorer as "Hybrid Scorer"
Client->>API: "GET /recall?query=...&expand_relations=true"
par Vector Search
API->>Vector: "search(embedding, limit)"
Vector-->>API: "vector_candidates"
and Graph Search
API->>Graph: "keyword search Cypher"
Graph-->>API: "graph_candidates"
end
API->>Scorer: "merge + rank candidates"
Scorer-->>API: "ranked_results"
opt Expand Relations
API->>Graph: "MATCH relationships"
Graph-->>API: "expanded_memories"
API->>API: "filter by importance/strength"
end
API->>Graph: "SET m.last_accessed"
API-->>Client: "final results + metadata"
Typical performance:
- Sub-100ms for queries without expansion
- 100–300ms with graph expansion enabled
- Scales to 100k+ memories with proper indexing
Response Format
Standard Response Structure
{
  "count": 3,
  "results": [
    {
      "memory": {
        "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "content": "Chose PostgreSQL over MongoDB...",
        "tags": ["project-alpha", "database"],
        "importance": 0.9,
        "created_at": "2025-01-15T10:30:00Z"
      },
      "final_score": 0.847,
      "match_type": "semantic",
      "score_components": {
        "vector": 0.82,
        "keyword": 0.15,
        "importance": 0.9,
        "recency": 0.71
      },
      "relations": []
    }
  ],
  "dedup_removed": 0,
  "time_window": null,
  "tags": []
}
Result fields:
- memory: The stored memory object with memory_id, content, tags, importance, created_at
- final_score: Combined relevance score
- match_type: Type of match (semantic, keyword, tag, relation, or entity)
- score_components: Breakdown of scoring factors
- relations: Connected memories (if expansion enabled)
- expanded_from_entity: Entity that triggered expansion (if applicable)
- deduped_from: IDs of duplicate results removed
Expansion Response Fields
When expand_relations=true:
{
  "expansion": {
    "expanded_count": 5,
    "seed_count": 3
  }
}
When expand_entities=true:
{
  "entity_expansion": {
    "expanded_count": 4,
    "entities_found": ["Rachel", "Platform team"]
  }
}
Usage Examples
Basic Text Search
curl "https://your-automem-instance/recall?query=PostgreSQL+database+decisions&limit=5" \
  -H "Authorization: Bearer YOUR_TOKEN"
Semantic Search with Pre-computed Vector
curl "https://your-automem-instance/recall?embedding=0.1,0.2,0.3,...&limit=10" \
  -H "Authorization: Bearer YOUR_TOKEN"
Time-Filtered Search
# Natural language time filter
curl "https://your-automem-instance/recall?time_query=last+week&query=bug+fixes" \
  -H "Authorization: Bearer YOUR_TOKEN"
# Explicit time range
curl "https://your-automem-instance/recall?start=2025-01-01T00:00:00Z&end=2025-01-31T23:59:59Z" \
  -H "Authorization: Bearer YOUR_TOKEN"
Tag-Based Filtering
# Any tag match (default)
curl "https://your-automem-instance/recall?tags=project-alpha&tags=database" \
  -H "Authorization: Bearer YOUR_TOKEN"
# All tags required
curl "https://your-automem-instance/recall?tags=project-alpha&tags=database&tag_mode=all" \
  -H "Authorization: Bearer YOUR_TOKEN"
# Exclude certain tags
curl "https://your-automem-instance/recall?query=preferences&exclude_tags=conversation&exclude_tags=temp" \
  -H "Authorization: Bearer YOUR_TOKEN"
Context-Aware Coding Search
curl "https://your-automem-instance/recall?query=error+handling&active_path=src/auth.ts&context_types=Style&context_types=Pattern" \
  -H "Authorization: Bearer YOUR_TOKEN"
Auto-detects TypeScript context → boosts Style type memories → prioritizes language-specific patterns.
Graph Expansion with Filtering
curl "https://your-automem-instance/recall?query=authentication&expand_relations=true&expand_min_strength=0.3&expand_min_importance=0.5" \
  -H "Authorization: Bearer YOUR_TOKEN"
Performs hybrid search for seed results → follows graph edges with strength ≥ 0.3 → filters expanded memories with importance ≥ 0.5.
Multi-Hop Entity Search
curl "https://your-automem-instance/recall?query=Sarah%27s+sister%27s+job&expand_entities=true" \
  -H "Authorization: Bearer YOUR_TOKEN"
Sorting Modes
Score-Based Sorting (Default)
When sort=score (or when sort is unspecified and a query is present), results are ordered by the final weighted hybrid score, which combines all 9 components and is optimized for relevance.
Time-Based Sorting
Time-based sorting modes are designed for "what happened since X" queries:
- time_desc / updated_desc: Newest first (ordered by coalesce(m.updated_at, m.timestamp))
- time_asc / updated_asc: Oldest first
When a time window is specified without a query, the system defaults to time_desc to show recent activity.
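A minimal sketch of the sort modes and the default selection. The function names are hypothetical, the updated_at and timestamp field names are borrowed from the coalesce expression, and comparing timestamps as strings assumes they all share one ISO 8601 UTC format.

```python
def default_sort(query, has_time_window: bool) -> str:
    """Pick the sort mode when the caller does not specify one."""
    return "time_desc" if has_time_window and not query else "score"

def sort_results(results: list[dict], sort: str = "score") -> list[dict]:
    """Order results by hybrid score or by last-modified time."""
    if sort == "score":
        return sorted(results, key=lambda r: r["final_score"], reverse=True)

    def modified_at(result: dict) -> str:
        memory = result["memory"]
        # Mirrors coalesce(m.updated_at, m.timestamp).
        return memory.get("updated_at") or memory["timestamp"]

    newest_first = sort in ("time_desc", "updated_desc")
    return sorted(results, key=modified_at, reverse=newest_first)
```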
MCP Tool: recall_memory
The recall_memory MCP tool wraps GET /recall with additional client-side optimizations. It is annotated readOnlyHint: true and idempotentHint: true.
Full input schema:
| Parameter | Type | Required | Constraints | Description |
|---|---|---|---|---|
| query | string | No | — | Semantic search query |
| queries | array[string] | No | — | Multiple queries for broader recall |
| limit | integer | No | 1–50, default 5 | Max results to return |
| tags | array[string] | No | — | Filter by tags |
| tag_mode | string | No | "any" or "all" | Tag matching mode |
| tag_match | string | No | "exact" or "prefix" | Tag matching strategy |
| time_query | string | No | — | Natural language time filter |
| start | string | No | ISO timestamp | Time range start |
| end | string | No | ISO timestamp | Time range end |
| expand_entities | boolean | No | — | Enable multi-hop entity expansion |
| expand_relations | boolean | No | — | Follow graph relationships |
| expansion_limit | integer | No | 1–500, default 25 | Max expanded results |
| relation_limit | integer | No | 1–200, default 5 | Relations per seed memory |
| expand_min_importance | number | No | 0–1 | Filter expanded results by importance |
| expand_min_strength | number | No | 0–1 | Min relation strength to traverse |
| context | string | No | — | Context label for boosting |
| language | string | No | — | Programming language hint |
| active_path | string | No | — | Current file path for language detection |
| context_tags | array[string] | No | — | Priority tags to boost |
| context_types | array[string] | No | — | Priority memory types to boost |
| priority_ids | array[string] | No | — | Specific memory IDs to always include |
| auto_decompose | boolean | No | — | Auto-extract entities from query for parallel searches |
Output schema:
| Field | Type | Required | Description |
|---|---|---|---|
| count | integer | Yes | Number of memories returned |
| results | array | Yes | Array of memory objects with scores |
| dedup_removed | integer | No | Duplicates removed in multi-query |
| expansion | object | No | Graph expansion statistics |
| entity_expansion | object | No | Entity expansion statistics |
| time_window | object | No | Applied time filter bounds |
| tags | string[] | No | Applied tag filters |
MCP Response Formatting
Each memory is formatted as a numbered list item for the AI platform:
1. [content] [tags] (importance: X) score=X.XXX [match_type] relations=X [via entity: Y] (deduped x2) ID: mem-abc123 Created: 2025-01-15T10:30:00Z
A summary line includes statistics:
(3 duplicates removed; 5 via entity expansion (Rachel, Platform team); 2 via relation expansion)
The MCP response also includes structuredContent with the full RecallResult object for programmatic consumption.
Common MCP Patterns
Session start (load recent project context):
{
  "queries": ["recent decisions", "user preferences"],
  "tags": ["preference"],
  "time_query": "last 30 days",
  "limit": 10
}
Preference recall (no time filter needed):
{
  "tags": ["preference"],
  "limit": 10
}
Debug pattern search:
{
  "query": "authentication timeout error",
  "tags": ["bug-fix", "auth"],
  "limit": 5
}
Multi-hop reasoning:
{
  "query": "What does Amanda's sister do?",
  "expand_entities": true,
  "expansion_limit": 10
}
Filtered graph exploration:
{
  "query": "database architecture",
  "expand_relations": true,
  "expand_min_strength": 0.5,
  "expand_min_importance": 0.7,
  "relation_limit": 3
}
Error Handling
Service Degradation
| Condition | Behavior | HTTP Status |
|---|---|---|
| FalkorDB unavailable | Returns 503 error | 503 |
| Qdrant unavailable | Graph-only mode (keyword + metadata) | 200 |
| Embedding generation fails | Falls back to keyword search | 200 |
| No query and no tags | Returns trending memories | 200 |
The system implements graceful degradation where vector search failures don’t block graph operations.
Validation Errors
| Error | HTTP Status | Condition |
|---|---|---|
| Invalid embedding dimensions | 400 | Vector size doesn’t match VECTOR_SIZE |
| Invalid time query | 400 | Unparseable time expression |
| Limit exceeds maximum | 200 | Clamped to RECALL_MAX_LIMIT |
| Missing authentication | 401 | No valid token provided |
Performance Optimization Techniques
- Tag Prefix Indexing: Pre-computed tag_prefixes enable O(1) prefix matching
- Parallel Search: Vector and graph searches execute concurrently
- LRU Caching: Entity extraction results cached (80% speedup)
- Access Tracking: last_accessed updates happen asynchronously