2025

Citations as Deterministic Anchors

Trust in AI is built on transparency, not just accuracy scores. In our extraction pipelines, we treat every data point as a claim that must be proven. By requiring a citation for every one of the several dozen fields we track per topic, we transform a probabilistic AI output into a deterministic, auditable record. This eliminates the "black box" problem and lets users jump directly to the source of any data point.
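The core of this idea can be sketched as a simple validation step: a record is only accepted when every field carries both a source and a supporting quote. The field names, URL, and data below are hypothetical, purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractedField:
    name: str
    value: str
    source_url: Optional[str] = None   # where the claim came from
    quote: Optional[str] = None        # exact span in the source supporting the value

def uncited_fields(fields: list[ExtractedField]) -> list[str]:
    """Return names of fields missing a citation; empty means the record is auditable."""
    return [f.name for f in fields if not (f.source_url and f.quote)]

# Hypothetical record: one cited field, one bare model output.
record = [
    ExtractedField("fund_size", "$250M",
                   "https://example.com/announcement", "a $250 million fund"),
    ExtractedField("vintage_year", "2021"),  # no citation: must be flagged
]
print(uncited_fields(record))  # the uncited field is surfaced for review
```

In a real pipeline the check would run before anything is written to the database, so the citation requirement is enforced deterministically rather than left to the model's goodwill.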

Precision is Table Stakes; Recall is the Frontier

In the discussion around Large Language Models, the fear of hallucinations — incorrect information — often dominates the conversation. Achieving 100% precision is a prerequisite for any financial data system; "no garbage in" is a non-negotiable rule. However, for professional-grade extraction with LLMs, the more difficult challenge is recall. If an LLM encounters a complex website structure or a 50-page legal document, it often loses focus, missing critical details buried in the text.
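The precision/recall distinction is easiest to see as set arithmetic over extracted versus expected fields. A minimal sketch, with made-up field names standing in for a real evaluation set:

```python
def precision_recall(extracted: set[str], gold: set[str]) -> tuple[float, float]:
    # Precision: share of extracted items that are correct ("no garbage in").
    # Recall: share of true items actually found, the harder target in long documents.
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall

# Hypothetical evaluation: everything extracted is correct, but half was missed.
gold = {"fund_size", "vintage_year", "managing_partner", "hq_city"}
extracted = {"fund_size", "vintage_year"}
p, r = precision_recall(extracted, gold)
print(p, r)  # perfect precision, 50% recall
```

This is exactly the failure mode described above: a system can have flawless precision and still silently drop half the facts buried in a long document.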

The Decay of Manual Excellence

High-quality data is a point of pride for any company that collects it. In a recent engagement, the client had built a remarkably reliable database through a rigorous, labor-intensive process in which mid-level staff reviewed every entry made by junior analysts. But even the most meticulous manual process eventually hits a wall: the velocity of information. In a space like investments, data ages rapidly, and a manual team simply cannot scale its output to keep pace with the market without a linear, and often unsustainable, increase in headcount.