Research infrastructure built by researchers, for researchers.

We've been in the trenches on the same problems you face: entity resolution at scale, cross-collection discovery, and the tension between scholarly rigor and usability. Here's what we're building to make the solutions generalizable.

The Problem in Detail

If you've worked with a large document collection computationally, you've encountered most of these:

Entity extraction and resolution across thousands of documents with inconsistent formatting, OCR errors, and ambiguous references
Cross-collection discovery: connecting an entity in one corpus to a related mention in a completely different collection held by a different institution
Balancing scholarly rigor with discoverability: building something that meets academic standards and is still usable and explorable
One-off builds: each project builds its own pipeline, its own schema, and its own search tools, and when the grant ends, the infrastructure dies with it

Projects across the field are solving the same underlying infrastructure problems in isolation, with no shared layer between them. That's what we're trying to change.

The Arke Network

Arke is the shared infrastructure layer. Instead of every project rebuilding from scratch, collections plug into a network where work compounds.

Knowledge Graph Extraction

Automated entity resolution and relationship extraction across large corpora. Entities resolved in one collection inform resolution everywhere else on the network.
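
To make the idea concrete, here is a toy sketch of resolving mentions across two collections by normalizing names to a shared key. It is an illustration only, not a description of how Arke resolves entities: the function names, collection labels, and person names are all hypothetical, and real resolution has to handle OCR errors and ambiguous references that exact key matching cannot.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Reduce a surface form to an order-insensitive, accent-free key."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    tokens = re.findall(r"[a-z0-9]+", without_accents.casefold())
    return " ".join(sorted(tokens))


def resolve(mentions):
    """Group (collection, surface_form) pairs that share a normalized key."""
    clusters = defaultdict(list)
    for collection, surface in mentions:
        clusters[normalize(surface)].append((collection, surface))
    return dict(clusters)


# Hypothetical mentions drawn from two hypothetical collections.
mentions = [
    ("diplomatic_cables", "Okafor, Maria J."),
    ("university_archive", "Maria J. Okafor"),
    ("university_archive", "Tomás Herrera"),
]

clusters = resolve(mentions)
for key, members in clusters.items():
    print(key, "->", members)
```

In this sketch, the two differently formatted mentions of the same person land in one cluster even though they come from different collections, which is the kind of link that cross-collection resolution is meant to surface.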

Cross-Collection Discovery

Documents and entities are interconnected across institutional boundaries. A person mentioned in diplomatic cables surfaces alongside their appearance in a university archive.

Semantic Search & Retrieval

Vector search built for scholarly use. Your researchers and your audiences find what matters — not just what matches a keyword.

Provenance & Permanence

Content-addressed storage with cryptographic attestation. Every document, every entity, every relationship is versioned, verifiable, and permanent. Your data is restorable without us.
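
For readers unfamiliar with content addressing, a minimal sketch of the general technique follows. It assumes SHA-256 as the hash and an in-memory dictionary as the store, both chosen for illustration; the ContentStore class and the sample documents are hypothetical, and the sketch leaves out the attestation (signing) step entirely. It is not a description of Arke's storage layer.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address of a blob is its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data  # identical content always maps to one address
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Anyone holding the bytes can re-hash them to check integrity.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data


store = ContentStore()

# Two hypothetical versions of the same document.
v1 = store.put(b"Letter 12: original transcription")
v2 = store.put(b"Letter 12: corrected transcription")

print(v1 != v2)                                       # each version has its own address
print(store.put(b"Letter 12: original transcription") == v1)  # same bytes, same address
```

Because an address is derived from the bytes themselves, a copy of the data can be verified by anyone who re-computes the hash, without trusting whoever served it. That property is what makes versioned, independently restorable data possible in a design like this.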

AI Research Tools

Agentic reasoning over your corpus with full provenance tracking. Every insight is traceable back to source documents — citable, auditable, reproducible.

Open Integration

Open API for connecting to existing library systems, digital repositories, and research workflows. No lock-in, no proprietary formats.

Who We Work With

Researchers

You have a collection and a research question. We provide the computational infrastructure to make it tractable.

Libraries & Archives

You hold collections that deserve better discovery and interconnection. We build the infrastructure to make them accessible.

Institutions

You have large-scale document collections that need better search, organization, and long-term preservation.

How We Work

We meet you where your project is. Some people come to us with a funded project and a clear technical need. Others are earlier — exploring what's computationally possible with a collection. We work with both, and we're happy to contribute to grant proposals when our infrastructure strengthens the application.

Scope

We help you figure out what's realistic for your collection, your timeline, and your research goals.

Build

We deliver the infrastructure — knowledge graphs, search, entity resolution, discovery tools — built on the Arke network.

Sustain

Your infrastructure outlives the funding cycle. Data is permanent, portable, and yours.