QMD – Quick Markdown Search: A Technical Deep-Dive into the Open-Source Local Search Engine
Author’s note: how does it work under the hood? Source: https://github.com/tobi/qmd
Executive Summary
QMD (Quick Markdown Search) is an open-source, local-first search engine designed to bring state-of-the-art retrieval capabilities to personal knowledge bases without relying on cloud services. Hosted at tobi/qmd on GitHub, the project has rapidly gained traction, with over 1,100 stars and 40+ contributors [^1]. It distinguishes itself by combining three distinct search technologies—BM25 full-text search, vector embeddings, and LLM-based re-ranking—into a single CLI tool that runs entirely on-device [^2].
Designed for privacy-conscious users and developers building “agentic” workflows, QMD operates without network dependencies after the initial model download. It integrates seamlessly with AI agents via a Model Context Protocol (MCP) server and structured JSON outputs [^2]. However, its reliance on local GGUF models (requiring ~1GB of storage and moderate RAM) means users must balance retrieval quality against local resource constraints [^2].
1. Introduction – Why Local Markdown Search Matters
In an era where personal data often lives in the cloud, QMD offers a privacy-first alternative for managing local knowledge. It is built for users who maintain extensive markdown libraries—such as Obsidian vaults, engineering documentation, or meeting transcripts—and require retrieval quality comparable to enterprise search engines without sending data to third-party APIs [^2].
The tool is distributed as a Bun-based CLI, emphasizing modern JavaScript performance and ease of installation. By leveraging node-llama-cpp and sqlite-vec, QMD proves that sophisticated hybrid search pipelines can run efficiently on consumer hardware, making it a strong candidate for offline-first environments and secure internal documentation systems [^2][^3].
2. Core Capabilities at a Glance
QMD unifies three layers of search technology into a single interface. Users can switch between modes depending on their need for speed versus precision.
2.1 Search Modes Comparison
The CLI exposes three primary commands, each activating a different part of the search pipeline:
| Mode | Command | Technology Stack | Best Use Case |
|---|---|---|---|
| Keyword Search | `qmd search` | BM25 (Full-Text Search) | Instant lookups when you know exact terms (e.g., error codes, specific API names) [^2][^4]. |
| Vector Search | `qmd vsearch` | Dense Vector Embeddings | Semantic queries where keywords might not match (e.g., “how to login” finding “authentication flow”) [^4]. |
| Hybrid Query | `qmd query` | BM25 + Vector + Re-ranking | Complex research questions requiring high precision. Combines both result sets and re-ranks them using a cross-encoder model [^4]. |
The `query` mode is the most advanced capability: it adds a “Query Expansion” step that generates variations of the user’s prompt before executing the search, so even vaguely phrased queries retrieve relevant documents [^2].
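The sources here don’t specify how the BM25 and vector result lists are merged before re-ranking. Reciprocal rank fusion (RRF) is a common technique for exactly this step; the TypeScript sketch below is illustrative only, not a description of QMD’s actual fusion logic:

```typescript
// Reciprocal Rank Fusion: merge two ranked lists of document IDs.
// score(d) = sum over lists of 1 / (k + rank(d)); k = 60 is the
// conventional damping constant. Illustrative only; QMD's actual
// fusion strategy is not documented here.
function fuseRRF(bm25Hits: string[], vectorHits: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [bm25Hits, vectorHits]) {
    list.forEach((docid, rank) => {
      scores.set(docid, (scores.get(docid) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docid]) => docid);
}
```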
3. Architecture & Data Flow
QMD’s architecture is built around a local SQLite database that serves as the central store for all index data.
3.1 Indexing Flow
When a user adds a collection, QMD processes files through a multi-stage pipeline:
- Ingestion: Scans directories based on glob patterns (defaulting to markdown files).
- Parsing: Extracts titles and content, generating a unique 6-character hash (`docid`) for each file.
- Storage: Saves raw content and metadata into `~/.cache/qmd/index.sqlite`.
- Vectorization: Chunks documents into 800-token segments with a 15% overlap (sketched below). These chunks are processed by the `embeddinggemma-300M` model to create dense vectors stored in `sqlite-vec` tables [^2].
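The sliding-window chunking can be pictured as follows. This is a minimal sketch in which tokens are approximated by whitespace-split words; QMD’s real token counts come from the embedding model’s tokenizer:

```typescript
// Split a document into ~800-token chunks with 15% overlap.
// NOTE: "tokens" are approximated by whitespace-split words here;
// the actual tokenizer is model-specific.
function chunkDocument(text: string, chunkSize = 800, overlap = 0.15): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const step = Math.max(1, Math.floor(chunkSize * (1 - overlap))); // advance ~680 tokens
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
```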
3.2 Query Flow & Reranking
The hybrid search pipeline (`qmd query`) executes a multi-stage sequence of operations:
- Query Expansion: The `Qwen3-0.6B` model generates variations of the user’s input to broaden the search scope.
- Retrieval: The system performs parallel searches using both the BM25 (keyword) and vector (semantic) indices.
- Re-ranking: Top candidates are passed to the `qwen3-reranker-0.6b` model. This “cross-encoder” scores each document’s relevance to the specific query on a 0.0–1.0 scale, filtering out low-quality matches before presenting the final list (see the sketch after this list) [^2].
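Conceptually, the re-ranking stage boils down to scoring each (query, document) pair and dropping candidates below a relevance threshold. In this sketch, `scoreWithReranker` stands in for a call into the qwen3-reranker-0.6b model, and the 0.3 cutoff is a placeholder, not QMD’s actual value:

```typescript
// Score candidates with a cross-encoder and keep only relevant ones.
// scoreWithReranker is a placeholder for the qwen3-reranker-0.6b call;
// the 0.3 threshold is illustrative.
type Candidate = { docid: string; text: string };

async function rerank(
  query: string,
  candidates: Candidate[],
  scoreWithReranker: (q: string, doc: string) => Promise<number>, // returns 0.0–1.0
  threshold = 0.3,
): Promise<Array<Candidate & { relevance: number }>> {
  const scored = await Promise.all(
    candidates.map(async (c) => ({
      ...c,
      relevance: await scoreWithReranker(query, c.text),
    })),
  );
  return scored
    .filter((c) => c.relevance >= threshold) // drop low-quality matches
    .sort((a, b) => b.relevance - a.relevance);
}
```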
4. Installation & Runtime Requirements
QMD is built on the Bun runtime, which must be installed prior to use.
4.1 System Prerequisites
- Runtime: Bun >= 1.0.0 [^3].
- OS Support: macOS (requires Homebrew SQLite for extensions), Linux, and Windows (WSL recommended) [^2].
- Storage: ~1GB for model weights, downloaded automatically on first run [^2].
4.2 Quick-Start
To get started, install the tool globally and initialize your first collection:
```bash
# 1. Install globally via Bun
bun install -g https://github.com/tobi/qmd

# 2. Add a collection (e.g., your notes folder)
qmd collection add ~/Documents/notes --name notes

# 3. Generate embeddings for semantic search
qmd embed

# 4. Run a hybrid search
qmd query "project roadmap 2025"
```
The `qmd` command is actually a Bash wrapper that locates the Bun executable and launches the TypeScript source, ensuring a smooth startup across different environments [^5].
5. Day-to-Day Usage
QMD is designed for heavy CLI users, offering robust commands for managing data and retrieving information.
5.1 Command Cheat Sheet
| Task | Command | Description |
|---|---|---|
| Add Collection | `qmd collection add .` | Indexes the current directory as a collection [^4]. |
| Add Context | `qmd context add qmd://notes "Work stuff"` | Attaches descriptive metadata to improve search relevance [^4]. |
| Update Index | `qmd update` | Re-scans all collections for changes [^4]. |
| Get Document | `qmd get notes/file.md` | Retrieves specific file content, supporting fuzzy matching [^4]. |
| Multi-Get | `qmd multi-get "docs/*.md"` | Retrieves multiple files matching a pattern [^4]. |
5.2 Context-Enriched Retrieval
A unique feature of QMD is “Context Management.” Users can explicitly tell the search engine what a collection contains. For example, tagging a folder with “Meeting transcripts and notes” helps the LLM understand that documents in that path are conversational records, improving the accuracy of semantic queries [^4].
```bash
# Example: Adding context to a specific subfolder
cd ~/work/docs
qmd context add "Technical API documentation for v2"
```
6. Integration with AI Agents
QMD explicitly targets “agentic” workflows, where an AI assistant (like Claude or an OpenAI agent) needs to search local files to answer user questions.
6.1 Structured Output
All search commands support machine-readable formats. Agents can request JSON output to programmatically parse results, scores, and snippets.
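An agent harness can shell out to the CLI and parse this output directly. Below is a minimal TypeScript sketch using Bun’s subprocess API; the `path`, `score`, and `snippet` field names are assumptions for illustration, not a documented schema. The equivalent one-off shell invocation follows the sketch:

```typescript
// Invoke qmd and parse its JSON results.
// The result field names below are assumed for illustration;
// check qmd's actual --json output for the real schema.
const proc = Bun.spawnSync(["qmd", "search", "authentication", "--json", "-n", "10"]);
const results = JSON.parse(proc.stdout.toString()) as Array<{
  path: string;    // assumed field
  score: number;   // assumed field
  snippet: string; // assumed field
}>;
for (const r of results) {
  console.log(`${r.score.toFixed(2)} ${r.path}`);
}
```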
```bash
# Get structured results for an LLM
qmd search "authentication" --json -n 10
```
6.2 MCP Server
QMD implements the Model Context Protocol (MCP), allowing it to act as a plug-and-play tool server for compatible AI clients (like Claude Desktop).
Exposed MCP Tools [^2]:
| Tool Name | Function |
|---|---|
| `qmd_search` | Fast BM25 keyword search with collection filtering. |
| `qmd_vsearch` | Semantic vector search. |
| `qmd_query` | Full hybrid search with re-ranking. |
| `qmd_get` | Retrieve full document content by path or ID. |
| `qmd_status` | Check index health and collection info. |
Configuring Claude Desktop to use QMD is as simple as adding the server config to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```
7. Performance & Models
QMD relies on three specific GGUF models optimized for local execution. These are downloaded to `~/.cache/qmd/models/` upon first use [^2].
| Model | Role | Size |
|---|---|---|
| `embeddinggemma-300M-Q8_0` | Generates vector embeddings for docs. | ~300MB |
| `qwen3-reranker-0.6b-q8_0` | Re-ranks search results for relevance. | ~640MB |
| `Qwen3-0.6B-Q8_0` | Expands user queries with variations. | ~640MB |
Resource Note: Generating embeddings (`qmd embed`) is a CPU-intensive process, since every 800-token chunk must pass through the embedding model. For large knowledge bases the initial embedding run may take significant time, though subsequent updates are incremental [^2].
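QMD drives these models through node-llama-cpp. To give a feel for what embedding a single chunk involves, here is a minimal standalone sketch using that library’s v3 embedding API; the model filename is an assumption based on the cache directory above, and QMD manages all of this internally:

```typescript
import { getLlama } from "node-llama-cpp";

// Load the embedding model and embed one chunk of text.
// The model filename below is assumed for illustration; QMD downloads
// its models to ~/.cache/qmd/models/ automatically.
const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: `${process.env.HOME}/.cache/qmd/models/embeddinggemma-300M-Q8_0.gguf`,
});
const embeddingContext = await model.createEmbeddingContext();
const embedding = await embeddingContext.getEmbeddingFor("Chunk text to embed");
console.log(embedding.vector.length); // dimensionality of the dense vector
```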
8. Bottom Line
QMD fills a critical gap for developers and power users who want the intelligence of RAG (Retrieval-Augmented Generation) without the privacy risks or latency of cloud APIs.
Adopt QMD if:
- You have a large local markdown knowledge base (Obsidian, Dendron, etc.).
- You need offline-capable semantic search.
- You are building AI agents that need to “read” your local documentation.
Consider alternatives if:
- You cannot install the Bun runtime.
- You are on a severely resource-constrained device (e.g., Raspberry Pi Zero) where running 600MB+ models is not feasible.