Engineering · 2025–2026

Second-brain research engine

Most literature tools either search or summarize; this one operates a growing corpus as living infrastructure. The motivation is a "second brain" that ingests, deduplicates, annotates, and synthesizes biomedical research continuously, with as little human babysitting as possible. The pipeline that turns a paper into a reading note is deliberately frozen: its configuration (prompt version, drafter model, token budget, temperature, timeout, retry behavior) is the distilled winner of a sibling benchmarking project and is treated as an immutable black box, version-stamped onto every note so provenance is auditable.
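As a rough illustration of what "frozen and version-stamped" could look like, the sketch below keeps the six configuration fields named above in an immutable dataclass and derives a short fingerprint that is attached to every note. The class name, field names, the example values, and the choice of a truncated SHA-256 digest are all assumptions made for illustration; the actual pipeline's representation is not described here.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)  # frozen=True makes attribute assignment raise an error
class PipelineConfig:
    # Field names mirror the settings listed in the text; values are placeholders.
    prompt_version: str
    drafter_model: str
    token_budget: int
    temperature: float
    timeout_s: int
    max_retries: int

    def stamp(self) -> str:
        """Return a short, deterministic fingerprint of the whole configuration."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def stamp_note(note_text: str, config: PipelineConfig) -> dict:
    """Attach the configuration and its fingerprint to a note for later audit."""
    return {
        "text": note_text,
        "pipeline": asdict(config),
        "pipeline_stamp": config.stamp(),
    }


# Hypothetical values, not the project's real settings.
CONFIG = PipelineConfig(
    prompt_version="v-example",
    drafter_model="example-local-model",
    token_budget=2048,
    temperature=0.2,
    timeout_s=120,
    max_retries=2,
)

note = stamp_note("Reading note body goes here.", CONFIG)
```

Because the fingerprint is computed from every field, two notes carrying the same stamp were produced under identical settings, which is the property an audit needs.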

Data flows in five stages. Snowball search gathers candidate identifiers from PubMed Central (the primary source), the bioRxiv/medRxiv and arXiv channels, and Semantic Scholar's citation graph in both directions (citing and cited papers), excluding venues on a predatory list. A first dedup layer hard-skips any paper whose DOI, PMID, or arXiv ID is already in the store; a second layer combines abstract-embedding similarity with author and year overlap to catch near-duplicates, with explicit rules for promoting a journal version over its preprint. Surviving papers are fetched, run through the note pipeline, and posted to the store as Paper and Note nodes connected by typed edges. Embeddings are generated locally, so the corpus supports semantic search and graph traversal from one backend.
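The two dedup layers can be sketched as a single classification function. This is a minimal sketch under stated assumptions: the Paper fields, the thresholds (0.95 cosine similarity, 0.5 author overlap, one year of slack), and the promotion rule (a journal version replaces a stored preprint) are hypothetical stand-ins for rules the text mentions but does not spell out, and the identifiers in the usage lines are made up.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class Paper:
    title: str
    year: int
    authors: frozenset
    embedding: tuple            # abstract embedding, any fixed dimension
    doi: Optional[str] = None
    pmid: Optional[str] = None
    arxiv_id: Optional[str] = None
    is_preprint: bool = False


def identifiers(p: Paper) -> set:
    """Collect the (kind, value) pairs for every identifier the paper has."""
    pairs = (("doi", p.doi), ("pmid", p.pmid), ("arxiv", p.arxiv_id))
    return {(kind, value) for kind, value in pairs if value}


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(candidate: Paper, store: list, sim_threshold: float = 0.95,
             min_author_overlap: float = 0.5, max_year_gap: int = 1) -> str:
    # Layer 1: hard skip when any identifier is already in the store.
    known = set()
    for existing in store:
        known |= identifiers(existing)
    if identifiers(candidate) & known:
        return "skip_exact"

    # Layer 2: near-duplicate check on year, author overlap, and embeddings.
    for existing in store:
        if abs(candidate.year - existing.year) > max_year_gap:
            continue
        union = candidate.authors | existing.authors
        overlap = len(candidate.authors & existing.authors) / len(union) if union else 0.0
        if overlap < min_author_overlap:
            continue
        if cosine(candidate.embedding, existing.embedding) < sim_threshold:
            continue
        # Assumed promotion rule: a journal version supersedes its preprint.
        if existing.is_preprint and not candidate.is_preprint:
            return "promote_journal"
        return "skip_near_duplicate"
    return "ingest"


# Made-up identifiers and embeddings, for illustration only.
store = [Paper("Example study", 2024, frozenset({"kim", "lee"}), (1.0, 0.0),
               arxiv_id="0000.00000", is_preprint=True)]
journal = Paper("Example study", 2025, frozenset({"kim", "lee"}), (0.99, 0.05),
                doi="10.0000/example")
decision = classify(journal, store)
```

Running the cheap identifier check first means the embedding comparison only runs for papers that are genuinely new by ID, which keeps the more expensive layer small.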

What is distinctive is the synthesis cycle. After every batch of newly stored papers, an Opus-driven step pulls a thematic cluster from the store, drafts a synthesis (summary, knowledge-gap map, research question, or scoping review), then re-verifies each citation independently through the PubMed and bioRxiv interfaces. On a mismatch it edits the source note in place and records a correction audit trail, rather than silently trusting the draft. Quality problems it cannot self-correct trigger a halt notification and pause the loop until a human intervenes: a "halt, don't fudge" stance. The repository has also grown beyond papers: it ingests health newsletters and news items as searchable concept nodes, drives a mindmap for live planning, and uses browser automation for sources that need it.
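The verify-correct-or-halt behavior might be sketched as follows. Everything here is an assumption for illustration: the Note fields, the HaltLoop exception, and the injected lookup callable, which stands in for the PubMed and bioRxiv interfaces so the example runs offline. The real system's checks and notification mechanism are not described in this write-up.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class HaltLoop(Exception):
    """Raised when a problem cannot be self-corrected and a human must step in."""


@dataclass
class Note:
    paper_id: str
    title: str
    year: int
    corrections: list = field(default_factory=list)  # audit trail of in-place edits


def verify_citations(cited_ids: list, notes: dict,
                     lookup: Callable[[str], Optional[dict]]) -> int:
    """Check each cited note against an authoritative record.

    `lookup` is a hypothetical stand-in for an external metadata query; it
    returns a dict with "title" and "year", or None if the ID cannot be resolved.
    Returns the number of fields corrected.
    """
    corrected = 0
    for paper_id in cited_ids:
        note = notes.get(paper_id)
        record = lookup(paper_id)
        if note is None or record is None:
            # Halt, don't fudge: an unverifiable citation stops the loop.
            raise HaltLoop(f"cannot verify citation {paper_id!r}")
        for name in ("title", "year"):
            old, new = getattr(note, name), record[name]
            if old != new:
                setattr(note, name, new)  # edit the source note in place
                note.corrections.append({"field": name, "old": old, "new": new})
                corrected += 1
    return corrected


# Placeholder data; the ID and titles are invented.
notes = {"example-1": Note("example-1", "Old title", 2023)}
authoritative = {"example-1": {"title": "Corrected title", "year": 2023}}
fixed = verify_citations(["example-1"], notes, authoritative.get)
```

Passing the lookup in as a parameter keeps the verification logic testable without network access, and raising an exception rather than logging and continuing is one way to make the pause explicit to whatever drives the loop.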

Honest context: this is an operational system governed by hard invariants (English-only notes, a single-GPU budget, mandatory MeSH tagging, the frozen pipeline, no public exposure) rather than a published benchmark, so its value lies in workflow design and durability, not headline metrics. Note quality is bounded by the local model behind the frozen pipeline; benchmarking and prompt experimentation live in a separate sibling repository by design. The system depends on the shared graph store and several external scholarly APIs, so the availability and rate limits of those services limit its throughput.
