QMD – Quick Markdown Search: A Technical Deep-Dive into the Open-Source Local Search Engine
Author’s note: how does it work under the hood? Source: https://github.com/tobi/qmd
Executive Summary
QMD (Quick Markdown Search) is an open-source, local-first search engine designed to bring state-of-the-art retrieval capabilities to personal knowledge bases without relying on cloud services. Hosted at tobi/qmd on GitHub, the project has rapidly gained traction, with over 1,100 stars and 40+ contributors [^1]. It distinguishes itself by combining three distinct search technologies—BM25 full-text search, vector embeddings, and LLM-based re-ranking—into a single CLI tool that runs entirely on-device [^2].
Designed for privacy-conscious users and developers building “agentic” workflows, QMD operates without network dependencies after the initial model download. It integrates seamlessly with AI agents via a Model Context Protocol (MCP) server and structured JSON outputs [^2]. However, its reliance on local GGUF models (requiring ~1GB of storage and moderate RAM) means users must balance retrieval quality against local resource constraints [^2].
1. Introduction – Why Local Markdown Search Matters
In an era where personal data often lives in the cloud, QMD offers a privacy-first alternative for managing local knowledge. It is built for users who maintain extensive markdown libraries—such as Obsidian vaults, engineering documentation, or meeting transcripts—and require retrieval quality comparable to enterprise search engines without sending data to third-party APIs [^2].
The tool is distributed as a Bun-based CLI, emphasizing modern JavaScript performance and ease of installation. By leveraging node-llama-cpp and sqlite-vec, QMD proves that sophisticated hybrid search pipelines can run efficiently on consumer hardware, making it a strong candidate for offline-first environments and secure internal documentation systems [^2][^3].
2. Core Capabilities at a Glance
QMD unifies three layers of search technology into a single interface. Users can switch between modes depending on their need for speed versus precision.
2.1 Search Modes Comparison
The CLI exposes three primary commands, each activating a different part of the search pipeline:
| Mode | Command | Technology Stack | Best Use Case |
|---|---|---|---|
| Keyword Search | `qmd search` | BM25 (Full-Text Search) | Instant lookups when you know exact terms (e.g., error codes, specific API names) [^2][^4]. |
| Vector Search | `qmd vsearch` | Dense Vector Embeddings | Semantic queries where keywords might not match (e.g., “how to login” finding “authentication flow”) [^4]. |
| Hybrid Query | `qmd query` | BM25 + Vector + Re-ranking | Complex research questions requiring high precision. Combines both result sets and re-ranks them using a cross-encoder model [^4]. |
The `query` mode is the most advanced capability: it adds a “Query Expansion” step that generates variations of the user’s prompt before executing the search, so even vaguely phrased queries retrieve relevant documents [^2].
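The sources here don’t specify how the BM25 and vector result lists are merged before re-ranking. Reciprocal rank fusion (RRF) is a common technique for exactly this step; the TypeScript sketch below is illustrative only, not a description of QMD’s actual fusion logic:

```typescript
// Reciprocal Rank Fusion: merge two ranked lists of document IDs.
// score(d) = sum over lists of 1 / (k + rank(d)); k = 60 is the
// conventional damping constant. Illustrative only; QMD's actual
// fusion strategy is not documented here.
function fuseRRF(bm25Hits: string[], vectorHits: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [bm25Hits, vectorHits]) {
    list.forEach((docid, rank) => {
      scores.set(docid, (scores.get(docid) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docid]) => docid);
}
```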
3. Architecture & Data Flow
QMD’s architecture is built around a local SQLite database that serves as the central store for all index data.
3.1 Indexing Flow
When a user adds a collection, QMD processes files through a multi-stage pipeline:
- Ingestion: Scans directories based on glob patterns (defaulting to markdown files).
- Parsing: Extracts titles and content, generating a unique 6-character hash (`docid`) for each file.
- Storage: Saves raw content and metadata into `~/.cache/qmd/index.sqlite`.
- Vectorization: Chunks documents into 800-token segments with a 15% overlap (sketched below). These chunks are processed by the `embeddinggemma-300M` model to create dense vectors stored in `sqlite-vec` tables [^2].
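The sliding-window chunking can be pictured as follows. This is a minimal sketch in which tokens are approximated by whitespace-split words; QMD’s real token counts come from the embedding model’s tokenizer:

```typescript
// Split a document into ~800-token chunks with 15% overlap.
// NOTE: "tokens" are approximated by whitespace-split words here;
// the actual tokenizer is model-specific.
function chunkDocument(text: string, chunkSize = 800, overlap = 0.15): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const step = Math.max(1, Math.floor(chunkSize * (1 - overlap))); // advance ~680 tokens
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
```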
3.2 Query Flow & Reranking
The hybrid search pipeline (`qmd query`) executes a multi-stage sequence of operations:
- Query Expansion: The `Qwen3-0.6B` model generates variations of the user’s input to broaden the search scope.
- Retrieval: The system performs parallel searches using both the BM25 (keyword) and vector (semantic) indices.
- Re-ranking: Top candidates are passed to the `qwen3-reranker-0.6b` model. This “cross-encoder” scores each document’s relevance to the specific query on a 0.0–1.0 scale, filtering out low-quality matches before presenting the final list (see the sketch after this list) [^2].
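Conceptually, the re-ranking stage boils down to scoring each (query, document) pair and dropping candidates below a relevance threshold. In this sketch, `scoreWithReranker` stands in for a call into the qwen3-reranker-0.6b model, and the 0.3 cutoff is a placeholder, not QMD’s actual value:

```typescript
// Score candidates with a cross-encoder and keep only relevant ones.
// scoreWithReranker is a placeholder for the qwen3-reranker-0.6b call;
// the 0.3 threshold is illustrative.
type Candidate = { docid: string; text: string };

async function rerank(
  query: string,
  candidates: Candidate[],
  scoreWithReranker: (q: string, doc: string) => Promise<number>, // returns 0.0–1.0
  threshold = 0.3,
): Promise<Array<Candidate & { relevance: number }>> {
  const scored = await Promise.all(
    candidates.map(async (c) => ({
      ...c,
      relevance: await scoreWithReranker(query, c.text),
    })),
  );
  return scored
    .filter((c) => c.relevance >= threshold) // drop low-quality matches
    .sort((a, b) => b.relevance - a.relevance);
}
```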
4. Installation & Runtime Requirements
QMD is built on the Bun runtime, which must be installed prior to use.
4.1 System Prerequisites
- Runtime: Bun >= 1.0.0 [^3].
- OS Support: macOS (requires Homebrew SQLite for extensions), Linux, and Windows (WSL recommended) [^2].
- Storage: ~1GB for model weights, downloaded automatically on first run [^2].
4.2 Quick-Start
To get started, install the tool globally and initialize your first collection:
```bash
# 1. Install globally via Bun
bun install -g https://github.com/tobi/qmd

# 2. Add a collection (e.g., your notes folder)
qmd collection add ~/Documents/notes --name notes

# 3. Generate embeddings for semantic search
qmd embed

# 4. Run a hybrid search
qmd query "project roadmap 2025"
```
The `qmd` command is actually a Bash wrapper that locates the Bun executable and launches the TypeScript source, ensuring a smooth startup across different environments [^5].
5. Day-to-Day Usage
QMD is designed for heavy CLI users, offering robust commands for managing data and retrieving information.
5.1 Command Cheat Sheet
| Task | Command | Description |
|---|---|---|
| Add Collection | `qmd collection add .` | Indexes the current directory as a collection [^4]. |
| Add Context | `qmd context add qmd://notes "Work stuff"` | Attaches descriptive metadata to improve search relevance [^4]. |
| Update Index | `qmd update` | Re-scans all collections for changes [^4]. |
| Get Document | `qmd get notes/file.md` | Retrieves specific file content, supporting fuzzy matching [^4]. |
| Multi-Get | `qmd multi-get "docs/*.md"` | Retrieves multiple files matching a pattern [^4]. |
5.2 Context-Enriched Retrieval
A unique feature of QMD is “Context Management.” Users can explicitly tell the search engine what a collection contains. For example, tagging a folder with “Meeting transcripts and notes” helps the LLM understand that documents in that path are conversational records, improving the accuracy of semantic queries [^4].
```bash
# Example: Adding context to a specific subfolder
cd ~/work/docs
qmd context add "Technical API documentation for v2"
```
6. Integration with AI Agents
QMD explicitly targets “agentic” workflows, where an AI assistant (like Claude or an OpenAI agent) needs to search local files to answer user questions.
6.1 Structured Output
All search commands support machine-readable formats. Agents can request JSON output to programmatically parse results, scores, and snippets.
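An agent harness can shell out to the CLI and parse this output directly. Below is a minimal TypeScript sketch using Bun’s subprocess API; the `path`, `score`, and `snippet` field names are assumptions for illustration, not a documented schema. The equivalent one-off shell invocation follows the sketch:

```typescript
// Invoke qmd and parse its JSON results.
// The result field names below are assumed for illustration;
// check qmd's actual --json output for the real schema.
const proc = Bun.spawnSync(["qmd", "search", "authentication", "--json", "-n", "10"]);
const results = JSON.parse(proc.stdout.toString()) as Array<{
  path: string;    // assumed field
  score: number;   // assumed field
  snippet: string; // assumed field
}>;
for (const r of results) {
  console.log(`${r.score.toFixed(2)} ${r.path}`);
}
```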
```bash
# Get structured results for an LLM
qmd search "authentication" --json -n 10
```
6.2 MCP Server
QMD implements the Model Context Protocol (MCP), allowing it to act as a plug-and-play tool server for compatible AI clients (like Claude Desktop).
Exposed MCP Tools [^2]:
| Tool Name | Function |
|---|---|
| `qmd_search` | Fast BM25 keyword search with collection filtering. |
| `qmd_vsearch` | Semantic vector search. |
| `qmd_query` | Full hybrid search with re-ranking. |
| `qmd_get` | Retrieve full document content by path or ID. |
| `qmd_status` | Check index health and collection info. |
Configuring Claude Desktop to use QMD is as simple as adding the server config to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```
7. Performance & Models
QMD relies on three specific GGUF models optimized for local execution. These are downloaded to `~/.cache/qmd/models/` upon first use [^2].
| Model | Role | Size |
|---|---|---|
| `embeddinggemma-300M-Q8_0` | Generates vector embeddings for docs. | ~300MB |
| `qwen3-reranker-0.6b-q8_0` | Re-ranks search results for relevance. | ~640MB |
| `Qwen3-0.6B-Q8_0` | Expands user queries with variations. | ~640MB |
Resource Note: Generating embeddings (`qmd embed`) is a CPU-intensive process, since every 800-token chunk must pass through the embedding model. For large knowledge bases the initial embedding run may take significant time, though subsequent updates are incremental [^2].
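QMD drives these models through node-llama-cpp. To give a feel for what embedding a single chunk involves, here is a minimal standalone sketch using that library’s v3 embedding API; the model filename is an assumption based on the cache directory above, and QMD manages all of this internally:

```typescript
import { getLlama } from "node-llama-cpp";

// Load the embedding model and embed one chunk of text.
// The model filename below is assumed for illustration; QMD downloads
// its models to ~/.cache/qmd/models/ automatically.
const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: `${process.env.HOME}/.cache/qmd/models/embeddinggemma-300M-Q8_0.gguf`,
});
const embeddingContext = await model.createEmbeddingContext();
const embedding = await embeddingContext.getEmbeddingFor("Chunk text to embed");
console.log(embedding.vector.length); // dimensionality of the dense vector
```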
8. Bottom Line
QMD fills a critical gap for developers and power users who want the intelligence of RAG (Retrieval-Augmented Generation) without the privacy risks or latency of cloud APIs.
Adopt QMD if:
- You have a large local markdown knowledge base (Obsidian, Dendron, etc.).
- You need offline-capable semantic search.
- You are building AI agents that need to “read” your local documentation.
Consider alternatives if:
- You cannot install the Bun runtime.
- You are on a severely resource-constrained device (e.g., Raspberry Pi Zero) where running 600MB+ models is not feasible.