
Recall Operations

The recall system provides a single unified endpoint that supports multiple search strategies. It combines nine scoring components into a hybrid ranking system and supports both basic retrieval and advanced graph expansion. Authentication is required via an Authorization: Bearer <token> header or an X-API-Key header.

GET /recall

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | | Natural language search query or wildcard `*` |
| `embedding` | array[float] | | Pre-computed embedding vector for semantic search |
| `limit` | integer | 10 | Maximum results to return (capped at `RECALL_MAX_LIMIT=100`) |
| `sort` / `order_by` | string | `score` | Sort mode: `score`, `time_desc`, `time_asc`, `updated_desc`, `updated_asc` |

The query parameter triggers keyword extraction and full-text search. When query is "*" or omitted, the system returns trending memories ordered by importance.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `tags` | array[string] | | Tags to filter by (can specify multiple) |
| `exclude_tags` | array[string] | | Tags to exclude (supports prefix matching) |
| `tag_mode` | string | `any` | Match mode: `any` or `all` |
| `tag_match` | string | `prefix` | Match type: `prefix` or `exact` |

Tag filtering uses pre-computed tag_prefixes stored in both FalkorDB and Qdrant for efficient prefix matching. The exclude_tags parameter removes any memory containing ANY of the specified tags.

Prefix matching example:

  • Query: tags=slack
  • Matches: slack:channel:general, slack:user:U123, slack:reaction:thumbs-up
  • Uses precomputed tag_prefixes field for O(1) lookup

Mode behavior:

  • any (default): Match if ANY filter tag present
  • all: Match only if ALL filter tags present
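
The prefix and mode rules above can be sketched in Python. This is an illustrative stand-in, not the actual AutoMem implementation, which pushes the check into FalkorDB/Qdrant predicates against the precomputed tag_prefixes field:

```python
# Hypothetical sketch of prefix-based tag filtering, assuming tags use
# colon-delimited segments (e.g. "slack:channel:general").

def tag_prefixes(tag: str) -> set[str]:
    """Precompute every colon-delimited prefix of a tag."""
    parts = tag.split(":")
    return {":".join(parts[: i + 1]) for i in range(len(parts))}

def matches(memory_tags: list[str], filter_tags: list[str], mode: str = "any") -> bool:
    # Union of all precomputed prefixes for this memory's tags.
    prefixes: set[str] = set()
    for t in memory_tags:
        prefixes |= tag_prefixes(t)
    hits = [t in prefixes for t in filter_tags]  # O(1) set membership per filter tag
    return all(hits) if mode == "all" else any(hits)
```

With this sketch, `matches(["slack:channel:general"], ["slack"])` is a prefix hit, while `mode="all"` requires every filter tag to match.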
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `time_query` | string | | Natural language time expression (e.g., "last week") |
| `start` | string | | ISO 8601 start timestamp |
| `end` | string | | ISO 8601 end timestamp |

Time queries are parsed by _parse_time_expression() from automem/utils/time.py, which supports expressions like "last 7 days", "yesterday", "last month". Explicit timestamps override natural language queries.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `context` | string | | Contextual hint for ranking (e.g., "programming task") |
| `language` | string | | Programming language context (e.g., "python") |
| `active_path` | string | | File path being edited (auto-detects language) |
| `context_tags` | array[string] | | Tags to boost in scoring |
| `context_types` | array[string] | | Memory types to prioritize |
| `priority_ids` | array[string] | | Specific memory IDs to guarantee inclusion |

Context hints influence the 9-component scoring system but do not filter results. When active_path matches a coding language extension, the system prioritizes Style type memories for that language.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `expand_relations` | boolean | false | Follow graph edges from seed results |
| `expand_entities` | boolean | false | Multi-hop reasoning via entity tags |
| `relation_limit` | integer | 5 | Max relations per seed (from `RECALL_RELATION_LIMIT`) |
| `expansion_limit` | integer | 25 | Total max expanded memories (from `RECALL_EXPANSION_LIMIT`) |
| `expand_min_importance` | float | 0.0 | Minimum importance threshold for expanded memories |
| `expand_min_strength` | float | 0.0 | Minimum relation strength to traverse |

Graph expansion operates in two phases: seed results from hybrid search, then expansion via graph traversal. Expansion filtering parameters only apply to expanded memories, never to seed results.


```mermaid
graph LR
    Request["GET /recall<br/>?query=X&tags=Y"]

    subgraph "Search Strategies"
        Vector["_vector_search<br/>app.py:924"]
        Keyword["_graph_keyword_search<br/>app.py:721"]
        Trending["_graph_trending_results<br/>app.py:669"]
    end

    subgraph "Scoring"
        Compute["_compute_metadata_score<br/>app.py:476"]
        Combine["Weighted combination<br/>vector+keyword+tag+<br/>importance+recency"]
    end

    subgraph "Filtering"
        Time["Time window filter<br/>_parse_time_expression"]
        Tags["Tag filter<br/>_build_graph_tag_predicate"]
        ResultFilter["_result_passes_filters"]
    end

    Request --> Vector
    Request --> Keyword
    Request --> Trending

    Vector --> Combine
    Keyword --> Combine
    Trending --> Combine

    Combine --> Time
    Time --> Tags
    Tags --> ResultFilter

    ResultFilter --> Response["Ranked results<br/>with relations"]
```

Search strategy:

  1. Vector Search (state.qdrant != None): Semantic similarity using OpenAI embeddings
  2. Keyword Search (FalkorDB): Content and tag matching using Cypher CONTAINS
  3. Trending Results (no query): High-importance memories ordered by recency
  4. Deduplication: Track seen IDs across sources
  5. Filtering: Apply temporal and tag constraints
  6. Scoring: Combine weighted factors into final score
  7. Relations: Fetch connected memories (limited by RECALL_RELATION_LIMIT)
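
Sketched as plain Python, the seven steps above might be orchestrated like this. The function shapes here are hypothetical stand-ins; the real logic lives in app.py:

```python
def recall_pipeline(query, searches, passes_filters, score, limit):
    """Steps 1-7 in miniature: gather, dedupe, filter, score, rank."""
    candidates, seen = [], set()
    for search in searches:                  # steps 1-3: vector, keyword, trending
        for hit in search(query):
            if hit["memory_id"] in seen:     # step 4: dedupe across sources
                continue
            seen.add(hit["memory_id"])
            if passes_filters(hit):          # step 5: temporal/tag constraints
                candidates.append(hit)
    for hit in candidates:
        hit["final_score"] = score(hit)      # step 6: hybrid scoring
    candidates.sort(key=lambda h: h["final_score"], reverse=True)
    return candidates[:limit]                # step 7 (relations) happens afterwards
```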

```mermaid
graph TB
    Query["Query Input"]

    subgraph VectorSearch["Vector Search (Qdrant)"]
        VectorComp["Vector Component<br/>Weight: SEARCH_WEIGHT_VECTOR<br/>Cosine similarity score"]
    end

    subgraph KeywordSearch["Keyword Search (FalkorDB)"]
        KeywordComp["Keyword Component<br/>Weight: SEARCH_WEIGHT_KEYWORD<br/>TF-IDF + phrase matching"]
        ExactComp["Exact Component<br/>Weight: SEARCH_WEIGHT_EXACT<br/>Direct content overlap"]
    end

    subgraph MetadataScoring["Metadata Scoring"]
        ImportanceComp["Importance Component<br/>Weight: SEARCH_WEIGHT_IMPORTANCE<br/>User-assigned priority"]
        ConfidenceComp["Confidence Component<br/>Weight: SEARCH_WEIGHT_CONFIDENCE<br/>Classification confidence"]
        RecencyComp["Recency Component<br/>Weight: SEARCH_WEIGHT_RECENCY<br/>Freshness boost"]
        TagComp["Tag Component<br/>Weight: SEARCH_WEIGHT_TAG<br/>Tag overlap score"]
    end

    subgraph ContextScoring["Context Scoring"]
        RelationComp["Relation Component<br/>Computed from graph edges"]
        TemporalComp["Temporal Component<br/>Time alignment score"]
    end

    FinalScore["Final Weighted Score<br/>Sum of all components"]

    Query --> VectorSearch
    Query --> KeywordSearch
    Query --> MetadataScoring
    Query --> ContextScoring

    VectorComp --> FinalScore
    KeywordComp --> FinalScore
    ExactComp --> FinalScore
    ImportanceComp --> FinalScore
    ConfidenceComp --> FinalScore
    RecencyComp --> FinalScore
    TagComp --> FinalScore
    RelationComp --> FinalScore
    TemporalComp --> FinalScore
```

The formula:

```
final_score = (vector_score × SEARCH_WEIGHT_VECTOR) +
              (keyword_score × SEARCH_WEIGHT_KEYWORD) +
              (exact_match_score × SEARCH_WEIGHT_EXACT) +
              (importance × SEARCH_WEIGHT_IMPORTANCE) +
              (confidence × SEARCH_WEIGHT_CONFIDENCE) +
              (recency_score × SEARCH_WEIGHT_RECENCY) +
              (tag_score × SEARCH_WEIGHT_TAG) +
              relation_score +
              temporal_score
```

Default weight configuration (all configurable via environment variables):

| Component | Environment Variable | Default Value |
| --- | --- | --- |
| Vector | `SEARCH_WEIGHT_VECTOR` | 0.25 |
| Keyword | `SEARCH_WEIGHT_KEYWORD` | 0.15 |
| Exact | `SEARCH_WEIGHT_EXACT` | 0.25 |
| Importance | `SEARCH_WEIGHT_IMPORTANCE` | 0.05 |
| Confidence | `SEARCH_WEIGHT_CONFIDENCE` | 0.05 |
| Recency | `SEARCH_WEIGHT_RECENCY` | 0.10 |
| Tag | `SEARCH_WEIGHT_TAG` | 0.10 |
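
Read together with the table, the formula reduces to a weighted sum. A sketch, with a hypothetical `weight()` helper that mirrors the documented env-var defaults:

```python
import os

def weight(name: str, default: float) -> float:
    """Read a scoring weight from the environment, falling back to the documented default."""
    return float(os.environ.get(name, default))

def final_score(c: dict) -> float:
    """Weighted sum over score components; missing components contribute 0."""
    return (
        c.get("vector", 0.0)       * weight("SEARCH_WEIGHT_VECTOR", 0.25)
        + c.get("keyword", 0.0)    * weight("SEARCH_WEIGHT_KEYWORD", 0.15)
        + c.get("exact", 0.0)      * weight("SEARCH_WEIGHT_EXACT", 0.25)
        + c.get("importance", 0.0) * weight("SEARCH_WEIGHT_IMPORTANCE", 0.05)
        + c.get("confidence", 0.0) * weight("SEARCH_WEIGHT_CONFIDENCE", 0.05)
        + c.get("recency", 0.0)    * weight("SEARCH_WEIGHT_RECENCY", 0.10)
        + c.get("tag", 0.0)        * weight("SEARCH_WEIGHT_TAG", 0.10)
        + c.get("relation", 0.0)   # relation and temporal scores are additive,
        + c.get("temporal", 0.0)   # not weighted, per the formula above
    )
```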

Vector Search Component:

Executes via _vector_search() using Qdrant when available. When Qdrant is unavailable, the system gracefully degrades to graph-only mode using keyword and metadata scoring.

Keyword Search Component:

Implemented in _graph_keyword_search() using Cypher queries:

  • Extracts keywords via _extract_keywords() from automem/utils/text.py
  • Matches against m.content and m.tags fields in FalkorDB
  • Scores based on keyword frequency and phrase containment
  • Returns memories ordered by score DESC, m.importance DESC, m.timestamp DESC
  • Content match: 2 points per keyword
  • Tag match: 1 point per keyword
  • Exact phrase bonus: +2 (content) or +1 (tag)
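
The point rules can be mimicked in a few lines. This is a toy re-implementation of the scoring listed above, not the production Cypher:

```python
def keyword_score(content: str, tags: list[str], keywords: list[str], phrase: str) -> int:
    """Toy keyword scorer: 2 points per content keyword, 1 per tag keyword,
    plus a phrase bonus of +2 (content) or +1 (tag)."""
    content_l = content.lower()
    tags_l = [t.lower() for t in tags]
    score = 0
    for kw in keywords:
        if kw in content_l:
            score += 2                      # content match: 2 points
        if any(kw in t for t in tags_l):
            score += 1                      # tag match: 1 point
    if phrase and phrase in content_l:
        score += 2                          # exact phrase in content: +2
    elif phrase and any(phrase in t for t in tags_l):
        score += 1                          # exact phrase in tag: +1
    return score
```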

Metadata Components:

Computed by _compute_metadata_score() and _parse_metadata_field():

  • Importance: Direct multiplication by weight (0.0–1.0 range)
  • Confidence: Classification confidence from memory type detection
  • Recency: max(0, 1 - (age_days / 180)) — a linear decay to zero over 180 days (about 6 months), based on time since last access
  • Tag: matched_tokens / total_query_tokens — overlap ratio between query tags and memory tags

```mermaid
graph TB
    SeedResults["Seed Results<br/>from Hybrid Search"]

    CheckExpand{"expand_relations<br/>enabled?"}

    FetchRelations["_fetch_relations()<br/>app.py:2200-2300"]

    FilterStrength{"Relation strength >=<br/>expand_min_strength?"}

    FilterImportance{"Memory importance >=<br/>expand_min_importance?"}

    TraverseEdge["Traverse edge<br/>Load related memory"]

    CheckLimit{"Total expanded &lt;<br/>expansion_limit?"}

    AddToResults["Add to expanded results"]

    FinalResults["Merged Results<br/>Seed + Expanded"]

    SeedResults --> CheckExpand
    CheckExpand -->|No| FinalResults
    CheckExpand -->|Yes| FetchRelations

    FetchRelations --> FilterStrength
    FilterStrength -->|Fail| FinalResults
    FilterStrength -->|Pass| TraverseEdge

    TraverseEdge --> FilterImportance
    FilterImportance -->|Fail| FinalResults
    FilterImportance -->|Pass| CheckLimit

    CheckLimit -->|Exceeded| FinalResults
    CheckLimit -->|Within limit| AddToResults
    AddToResults --> FinalResults
```

The _expand_related_memories() function implements multi-hop graph traversal using Cypher MATCH patterns to follow typed relationship edges (e.g., RELATES_TO, PREFERS_OVER, LEADS_TO).

Key configuration constants:

  • RECALL_RELATION_LIMIT (default: 5) — Max edges per seed memory
  • RECALL_EXPANSION_LIMIT (default: 25) — Total expanded memories cap

Edge types traversed include RELATES_TO, LEADS_TO, DERIVED_FROM, EVOLVED_INTO, REINFORCES, and EXEMPLIFIES, among the 11 supported relationship types. See Relationship Operations for the full type reference.

When expand_entities=true, the system performs multi-hop reasoning via entity tags:

  1. Extract entity tags from query (format: entity:<type>:<slug>)
  2. Find memories with matching entity tags (first hop)
  3. Extract entities from matched memories
  4. Find memories containing those entities (second hop)

Example: Query "What is Sarah's sister's job?" → Find "Sarah" entity → Find Sarah's relationships → Find "sister Rachel" → Find Rachel's job.
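
A schematic of the two hops, with `find_by_entity` and `entities_of` as hypothetical stand-ins for the graph lookups described above:

```python
def expand_entities(query_entities, find_by_entity, entities_of, limit):
    """Two-hop entity expansion: memories for the query's entities,
    then memories for the entities those memories mention."""
    first_hop = [m for e in query_entities for m in find_by_entity(e)]
    # Entities discovered in the first hop, minus the ones we started with.
    hop2_entities = {e for m in first_hop for e in entities_of(m)} - set(query_entities)
    seen, expanded = set(), []
    for memory in first_hop + [m for e in sorted(hop2_entities) for m in find_by_entity(e)]:
        if memory["memory_id"] not in seen and len(expanded) < limit:
            seen.add(memory["memory_id"])
            expanded.append(memory)
    return expanded
```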

When auto_decompose: true, the backend automatically extracts entities and topics from the query to generate supplementary searches:

  • Query: "React OAuth authentication patterns"
  • Decomposed into: ["React", "OAuth", "authentication", "patterns"]
  • Executes parallel searches for each term
  • Merges and deduplicates results

The queries parameter (array of strings) allows multiple search queries in a single request. The backend executes them in parallel and deduplicates results server-side.

Deduplication logic:

  1. Each query returns up to limit results
  2. Server merges results by memory_id
  3. Keeps highest-scored version of each memory
  4. Returns dedup_removed count in response metadata
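
The merge step, sketched as a stand-in for the server-side logic described above:

```python
def merge_results(per_query_results: list[list[dict]]) -> dict:
    """Merge multi-query results by memory_id, keeping the highest-scored
    copy and counting how many duplicates were dropped."""
    best: dict[str, dict] = {}
    total = 0
    for results in per_query_results:
        for r in results:
            total += 1
            mid = r["memory"]["memory_id"]
            if mid not in best or r["final_score"] > best[mid]["final_score"]:
                best[mid] = r
    merged = sorted(best.values(), key=lambda r: r["final_score"], reverse=True)
    return {"results": merged, "dedup_removed": total - len(merged)}
```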

The MCP recall_memory tool implements a performance optimization when tags are provided: it executes both the primary recall endpoint and a tag-only endpoint in parallel, then merges and deduplicates results.

Implementation:

  1. Tag detection: If Array.isArray(recallArgs.tags) && recallArgs.tags.length > 0
  2. Parallel execution: Uses Promise.all() to fetch both endpoints
  3. Merge logic: Creates a Map<memory_id, result> to deduplicate
  4. Sorting: Sorts by final_score descending, falls back to importance
  5. Limit enforcement: Slices to recallArgs.limit after merge

Rationale: Tag-filtered semantic search might miss high-importance memories that only match by tag, so both strategies run concurrently for comprehensive recall.
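
The actual MCP tool is TypeScript (Promise.all()); the same dual-fetch-and-merge shape can be shown in Python asyncio, with `fetch_recall` and `fetch_by_tags` as hypothetical async HTTP helpers rather than real AutoMem client functions:

```python
import asyncio

async def recall_with_tags(query: str, tags: list[str], limit: int,
                           fetch_recall, fetch_by_tags):
    """Fetch the primary recall endpoint and a tag-only search concurrently,
    then merge by memory_id and rank by score with importance as tiebreaker."""
    primary, tag_only = await asyncio.gather(
        fetch_recall(query=query, tags=tags, limit=limit),
        fetch_by_tags(tags=tags, limit=limit),
    )
    merged: dict[str, dict] = {}
    for r in primary + tag_only:
        mid = r["memory"]["memory_id"]
        if mid not in merged:                # first (primary) copy wins per id
            merged[mid] = r
    ranked = sorted(
        merged.values(),
        key=lambda r: (r.get("final_score") or 0.0,
                       r["memory"].get("importance") or 0.0),
        reverse=True,
    )
    return ranked[:limit]
```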


```mermaid
sequenceDiagram
    participant Client
    participant API as "Flask API"
    participant Graph as "FalkorDB"
    participant Vector as "Qdrant"
    participant Scorer as "Hybrid Scorer"

    Client->>API: "GET /recall?query=...&expand_relations=true"

    par Vector Search
        API->>Vector: "search(embedding, limit)"
        Vector-->>API: "vector_candidates"
    and Graph Search
        API->>Graph: "keyword search Cypher"
        Graph-->>API: "graph_candidates"
    end

    API->>Scorer: "merge + rank candidates"
    Scorer-->>API: "ranked_results"

    opt Expand Relations
        API->>Graph: "MATCH relationships"
        Graph-->>API: "expanded_memories"
        API->>API: "filter by importance/strength"
    end

    API->>Graph: "SET m.last_accessed"
    API-->>Client: "final results + metadata"
```

Typical performance:

  • Sub-100ms for queries without expansion
  • 100–300ms with graph expansion enabled
  • Scales to 100k+ memories with proper indexing

```json
{
  "count": 3,
  "results": [
    {
      "memory": {
        "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "content": "Chose PostgreSQL over MongoDB...",
        "tags": ["project-alpha", "database"],
        "importance": 0.9,
        "created_at": "2025-01-15T10:30:00Z"
      },
      "final_score": 0.847,
      "match_type": "semantic",
      "score_components": {
        "vector": 0.82,
        "keyword": 0.15,
        "importance": 0.9,
        "recency": 0.71
      },
      "relations": []
    }
  ],
  "dedup_removed": 0,
  "time_window": null,
  "tags": []
}
```

Result fields:

  • memory: The stored memory object with memory_id, content, tags, importance, created_at
  • final_score: Combined relevance score
  • match_type: Type of match — semantic, keyword, tag, relation, or entity
  • score_components: Breakdown of scoring factors
  • relations: Connected memories (if expansion enabled)
  • expanded_from_entity: Entity that triggered expansion (if applicable)
  • deduped_from: IDs of duplicate results removed

When expand_relations=true:

```json
{
  "expansion": {
    "expanded_count": 5,
    "seed_count": 3
  }
}
```

When expand_entities=true:

```json
{
  "entity_expansion": {
    "expanded_count": 4,
    "entities_found": ["Rachel", "Platform team"]
  }
}
```

Keyword/semantic search by query:

```sh
curl "https://your-automem-instance/recall?query=PostgreSQL+database+decisions&limit=5" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
Search with a pre-computed embedding:

```sh
curl "https://your-automem-instance/recall?embedding=0.1,0.2,0.3,...&limit=10" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
```sh
# Natural language time filter
curl "https://your-automem-instance/recall?time_query=last+week&query=bug+fixes" \
  -H "Authorization: Bearer YOUR_TOKEN"

# Explicit time range
curl "https://your-automem-instance/recall?start=2025-01-01T00:00:00Z&end=2025-01-31T23:59:59Z" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
```sh
# Any tag match (default)
curl "https://your-automem-instance/recall?tags=project-alpha&tags=database" \
  -H "Authorization: Bearer YOUR_TOKEN"

# All tags required
curl "https://your-automem-instance/recall?tags=project-alpha&tags=database&tag_mode=all" \
  -H "Authorization: Bearer YOUR_TOKEN"

# Exclude certain tags
curl "https://your-automem-instance/recall?query=preferences&exclude_tags=conversation&exclude_tags=temp" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
Context-aware recall:

```sh
curl "https://your-automem-instance/recall?query=error+handling&active_path=src/auth.ts&context_types=Style&context_types=Pattern" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

Auto-detects TypeScript context → boosts Style type memories → prioritizes language-specific patterns.

Graph expansion with thresholds:

```sh
curl "https://your-automem-instance/recall?query=authentication&expand_relations=true&expand_min_strength=0.3&expand_min_importance=0.5" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

Performs hybrid search for seed results → follows graph edges with strength ≥ 0.3 → filters expanded memories with importance ≥ 0.5.

Entity expansion:

```sh
curl "https://your-automem-instance/recall?query=Sarah%27s+sister%27s+job&expand_entities=true" \
  -H "Authorization: Bearer YOUR_TOKEN"
```

When sort=score (or unspecified with a query), results are ordered by the final weighted hybrid score, which combines all 9 components and is optimized for relevance.

Time-based sorting modes are designed for “what happened since X” queries:

  • time_desc / updated_desc: Newest first (ordered by coalesce(m.updated_at, m.timestamp))
  • time_asc / updated_asc: Oldest first

When a time window is specified without a query, the system defaults to time_desc to show recent activity.
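
The coalesce ordering is easy to mirror client-side, since ISO 8601 strings in the same format sort lexicographically:

```python
def time_sort_key(memory: dict) -> str:
    """Mirror of coalesce(m.updated_at, m.timestamp): prefer updated_at."""
    return memory.get("updated_at") or memory["timestamp"]

memories = [
    {"timestamp": "2025-01-01T00:00:00Z"},
    {"timestamp": "2025-01-02T00:00:00Z", "updated_at": "2025-03-01T00:00:00Z"},
]
newest_first = sorted(memories, key=time_sort_key, reverse=True)  # time_desc
```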


The recall_memory MCP tool wraps GET /recall with additional client-side optimizations. It is annotated readOnlyHint: true and idempotentHint: true.

Full input schema:

| Parameter | Type | Required | Constraints | Description |
| --- | --- | --- | --- | --- |
| `query` | string | No | | Semantic search query |
| `queries` | array[string] | No | | Multiple queries for broader recall |
| `limit` | integer | No | 1–50, default 5 | Max results to return |
| `tags` | array[string] | No | | Filter by tags |
| `tag_mode` | string | No | `"any"` or `"all"` | Tag matching mode |
| `tag_match` | string | No | `"exact"` or `"prefix"` | Tag matching strategy |
| `time_query` | string | No | | Natural language time filter |
| `start` | string | No | ISO timestamp | Time range start |
| `end` | string | No | ISO timestamp | Time range end |
| `expand_entities` | boolean | No | | Enable multi-hop entity expansion |
| `expand_relations` | boolean | No | | Follow graph relationships |
| `expansion_limit` | integer | No | 1–500, default 25 | Max expanded results |
| `relation_limit` | integer | No | 1–200, default 5 | Relations per seed memory |
| `expand_min_importance` | number | No | 0–1 | Filter expanded results by importance |
| `expand_min_strength` | number | No | 0–1 | Min relation strength to traverse |
| `context` | string | No | | Context label for boosting |
| `language` | string | No | | Programming language hint |
| `active_path` | string | No | | Current file path for language detection |
| `context_tags` | array[string] | No | | Priority tags to boost |
| `context_types` | array[string] | No | | Priority memory types to boost |
| `priority_ids` | array[string] | No | | Specific memory IDs to always include |
| `auto_decompose` | boolean | No | | Auto-extract entities from query for parallel searches |

Output schema:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `count` | integer | Yes | Number of memories returned |
| `results` | array | Yes | Array of memory objects with scores |
| `dedup_removed` | integer | No | Duplicates removed in multi-query |
| `expansion` | object | No | Graph expansion statistics |
| `entity_expansion` | object | No | Entity expansion statistics |
| `time_window` | object | No | Applied time filter bounds |
| `tags` | string[] | No | Applied tag filters |

Each memory is formatted as a numbered list item for the AI platform:

```
1. [content] [tags] (importance: X) score=X.XXX [match_type] relations=X [via entity: Y] (deduped x2)
   ID: mem-abc123
   Created: 2025-01-15T10:30:00Z
```

A summary line includes statistics:

(3 duplicates removed; 5 via entity expansion (Rachel, Platform team); 2 via relation expansion)

The MCP response also includes structuredContent with the full RecallResult object for programmatic consumption.

Session start — load recent project context:

```json
{
  "queries": ["recent decisions", "user preferences"],
  "tags": ["preference"],
  "time_query": "last 30 days",
  "limit": 10
}
```

Preference recall (no time filter needed):

```json
{
  "tags": ["preference"],
  "limit": 10
}
```

Debug pattern search:

```json
{
  "query": "authentication timeout error",
  "tags": ["bug-fix", "auth"],
  "limit": 5
}
```

Multi-hop reasoning:

```json
{
  "query": "What does Amanda's sister do?",
  "expand_entities": true,
  "expansion_limit": 10
}
```

Filtered graph exploration:

```json
{
  "query": "database architecture",
  "expand_relations": true,
  "expand_min_strength": 0.5,
  "expand_min_importance": 0.7,
  "relation_limit": 3
}
```

| Condition | Behavior | HTTP Status |
| --- | --- | --- |
| FalkorDB unavailable | Returns 503 error | 503 |
| Qdrant unavailable | Graph-only mode (keyword + metadata) | 200 |
| Embedding generation fails | Falls back to keyword search | 200 |
| No query and no tags | Returns trending memories | 200 |

The system implements graceful degradation where vector search failures don’t block graph operations.

| Error | HTTP Status | Condition |
| --- | --- | --- |
| Invalid embedding dimensions | 400 | Vector size doesn't match `VECTOR_SIZE` |
| Invalid time query | 400 | Unparseable time expression |
| Limit exceeds maximum | 200 | Clamped to `RECALL_MAX_LIMIT` |
| Missing authentication | 401 | No valid token provided |

  1. Tag Prefix Indexing: Pre-computed tag_prefixes enable O(1) prefix matching
  2. Parallel Search: Vector and graph searches execute concurrently
  3. LRU Caching: Entity extraction results cached (80% speedup)
  4. Access Tracking: last_accessed updates happen asynchronously
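
The LRU caching pattern mentioned in item 3, in generic form. This is a functools sketch; the real extractor and its cache size are internal to AutoMem:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def extract_entities(query: str) -> tuple[str, ...]:
    # Hypothetical stand-in: a real extractor would run an NLP pipeline,
    # which is why caching repeated queries pays off.
    return tuple(tok for tok in query.split() if tok.istitle())
```

Repeated calls with the same query string are then served from the cache; `extract_entities.cache_info()` exposes hit and miss counts.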