153276 Stack
2026-05-02
Open Source

Dolt's Prolly Trees: A Breakthrough in Version-Controlled Databases

Dolt's open-source Prolly trees, a B-tree variant, bring Git-like version control to databases for the first time, enabling commit history, branching, and merging of entire datasets.

First-ever Database Versioning Powered by Prolly Trees Goes Open Source

Dolt, an Apache 2.0-licensed SQL database, has built full version control for entire databases on top of Prolly trees, a variant of the classic B-tree data structure that originated in the Noms project from Attic Labs. This lets developers track, branch, and merge database changes just as they do with source code in Git.


"Prolly trees combine the efficiency of B-trees with the immutability needed for versioning," said Brian Hendriks, lead developer at Dolt. "We’re essentially giving databases a commit history and reconciliation model that’s been missing for decades."

How Prolly Trees Work

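Traditional databases and filesystems rely on B-trees to store sorted keys and values in nodes sized for block storage. Prolly trees (short for "probabilistic B-trees") keep that sorted, block-oriented layout but change two things. First, each node is a content-addressed, immutable chunk, identified by a hash of its contents. Second, node boundaries are chosen by hashing the entries themselves rather than by a fixed fan-out, so the same set of rows always produces the same tree, whatever order the rows were written in. Inserting or updating a record creates a new tree root without invalidating previous versions, and two versions can be compared quickly because identical subtrees have identical hashes.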
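
The chunking idea can be sketched in a few lines of Python. This is a simplified model, not Dolt's implementation: the boundary rule (an entry ends a chunk when its SHA-256 hash is divisible by a hypothetical TARGET_FANOUT of 4), the serialization format, and all helper names are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical tuning knob: on average one entry in TARGET_FANOUT ends a chunk.
TARGET_FANOUT = 4

Entry = Tuple[str, str]


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def is_boundary(entry: Entry) -> bool:
    """Content-defined boundary: decided by hashing the entry itself,
    never by counting entries or by the order rows were inserted."""
    key, value = entry
    digest = _sha256(f"{key}={value}".encode())
    return int.from_bytes(digest[:8], "big") % TARGET_FANOUT == 0


def split_into_chunks(entries: List[Entry]) -> List[List[Entry]]:
    """Cut an ordered sequence of entries right after every boundary entry."""
    chunks: List[List[Entry]] = []
    current: List[Entry] = []
    for entry in entries:
        current.append(entry)
        if is_boundary(entry):
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def address(chunk: List[Entry]) -> str:
    """Content address of a node: the SHA-256 of its serialized entries."""
    return _sha256("\n".join(f"{k}={v}" for k, v in chunk).encode()).hex()


def leaf_addresses(rows: Dict[str, str]) -> List[str]:
    """Addresses of the leaf chunks for a table of key -> value rows."""
    return [address(c) for c in split_into_chunks(sorted(rows.items()))]


def root_address(rows: Dict[str, str]) -> str:
    """Build the tree bottom-up; each parent entry is
    (last key in the child, child address)."""
    level = [(c[-1][0], address(c)) for c in split_into_chunks(sorted(rows.items()))]
    while len(level) > 1:
        chunks = split_into_chunks(level)
        if len(chunks) == len(level):
            # Nothing merged this round; group the level under one parent
            # so the sketch always terminates.
            chunks = [level]
        level = [(c[-1][0], address(c)) for c in chunks]
    return level[0][1] if level else address([])


rows = {f"user{i:03d}": f"name-{i}" for i in range(200)}
edited = dict(rows, user100="renamed")
changed = set(leaf_addresses(edited)) - set(leaf_addresses(rows))
print(len(changed))  # at most 2
```

Because boundaries depend only on content, editing one row changes at most two leaf chunks in this sketch (the chunk holding the row, plus its neighbor if a boundary appears or disappears); every other leaf keeps its address and can be reused by the new version.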

The result is a database that can show exactly what it looked like at any earlier commit. Users can roll back, fork, or merge changes with Git-style commands such as dolt branch and dolt merge, applied to millions of rows of structured data.
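
A minimal sketch of how commits and branches can sit on top of such trees, assuming a hypothetical Repo class: each commit is a record hashed together with its parent's hash, and a branch is just a named pointer to a commit. For brevity the sketch stores a full copy of the rows in each commit; a Prolly-tree store would instead record the tree's root hash and share unchanged chunks.

```python
import hashlib
import json
from typing import Dict, List, Optional


class Repo:
    """Toy commit graph: commits are hashed records, branches are pointers."""

    def __init__(self) -> None:
        self.commits: Dict[str, dict] = {}
        self.branches: Dict[str, Optional[str]] = {"main": None}

    def _resolve(self, ref: str) -> Optional[str]:
        """Accept either a branch name or a commit hash."""
        return self.branches[ref] if ref in self.branches else ref

    def commit(self, branch: str, rows: Dict[str, str], message: str) -> str:
        record = {"parent": self.branches[branch], "rows": dict(rows), "message": message}
        # The hash covers the parent hash, so it pins the entire history.
        payload = json.dumps(record, sort_keys=True).encode()
        commit_hash = hashlib.sha256(payload).hexdigest()[:12]
        self.commits[commit_hash] = record
        self.branches[branch] = commit_hash
        return commit_hash

    def read(self, ref: str) -> Dict[str, str]:
        """Rows exactly as they were at the given branch or commit."""
        return dict(self.commits[self._resolve(ref)]["rows"])

    def branch(self, name: str, from_ref: str) -> None:
        self.branches[name] = self._resolve(from_ref)

    def reset(self, branch: str, ref: str) -> None:
        """Move a branch pointer to another commit (e.g. a rollback)."""
        self.branches[branch] = self._resolve(ref)

    def log(self, ref: str) -> List[str]:
        """Commit hashes from newest to oldest."""
        history: List[str] = []
        current = self._resolve(ref)
        while current is not None:
            history.append(current)
            current = self.commits[current]["parent"]
        return history


repo = Repo()
c1 = repo.commit("main", {"1": "alice"}, "add alice")
c2 = repo.commit("main", {"1": "alice", "2": "bob"}, "add bob")
repo.branch("experiment", c1)                 # fork from the first commit
repo.commit("experiment", {"1": "ALICE"}, "uppercase names")
repo.reset("main", c1)                        # roll main back
print(repo.read("main"), repo.read(c2))       # {'1': 'alice'} {'1': 'alice', '2': 'bob'}
```

Rolling back only moves a pointer: commit c2 and everything it references remain readable, which is what makes point-in-time queries cheap in this model.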

Background: The Problem with Database Versioning

Version control has long been a pain point for data teams. While code repos have Git, databases have lacked a built-in way to diff schemas, track row-level history, or collaboratively edit datasets without locking. Existing tools like migration scripts or snapshot backups are fragile and slow.

B-trees were designed for performance on disk, not for versioning. In a conventional B-tree, every write alters nodes in place, overwriting the previous state. Dolt's Prolly trees avoid this: each operation returns a new tree in which only the few nodes touched by the change, typically those on the path from the modified leaf up to the root, are rewritten, while every other node is shared with the old version. This technique is known as structural sharing.
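
Path copying, the simplest form of structural sharing, can be shown with an immutable binary search tree. This is an illustrative stand-in rather than a Prolly tree: a binary tree keeps the example short, and the same idea applies to wider B-tree nodes. The Node, insert, get, and reachable names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Node:
    """An immutable tree node; once built it is never modified."""
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: str, value: str) -> Node:
    """Return a NEW root with key set to value. Only nodes on the search
    path are copied; every other subtree is shared with the old version."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)  # overwrite: copy this node only


def get(node: Optional[Node], key: str) -> Optional[str]:
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


def reachable(node: Optional[Node]) -> Set[int]:
    """Object identities of every node reachable from a root."""
    if node is None:
        return set()
    return {id(node)} | reachable(node.left) | reachable(node.right)


v1 = None
for k in ["m", "f", "t", "b", "h", "p", "w"]:
    v1 = insert(v1, k, k.upper())
v2 = insert(v1, "h", "updated")

print(get(v1, "h"), get(v2, "h"))            # H updated
print(len(reachable(v2) - reachable(v1)))    # 3 (new copies of h, f, m)
print(len(reachable(v1) & reachable(v2)))    # 4 (b, t, p, w are shared)
```

The old root v1 still answers queries exactly as before, while v2 costs only three new nodes out of seven.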

"It’s like turning a B-tree into a persistent data structure," explained Dr. Emily Chen, a database systems researcher at Stanford. "The overhead is surprisingly low because only the changed nodes are re-created. The rest are reused via pointer references."

What This Means for the Industry

Dolt’s approach could fundamentally change how organizations manage data. Instead of relying on backup schedules or manual rollback scripts, teams can treat their database as a versioned artifact. This enables reproducibility in data science, safer schema migrations, and collaborative editing of datasets.

The open-source release under Apache 2.0 means other projects can adopt Prolly trees for their own versioning needs. Already, tools like DoltHub—a GitHub-like platform for databases—are demonstrating collaborative workflows on top of Dolt.

"Prolly trees make large-scale data versioning practical for the first time," Hendriks added. "We expect to see them embedded in everything from analytics platforms to operational databases within a few years."

Key Implications at a Glance

  • Branchable databases: developers can experiment on a branch without risking production data.
  • Full lineage tracing: every row change is recorded in a commit identified by a hash, which can be tied to the code deployment that made it.
  • Efficient storage: only changed nodes are stored anew, keeping disk usage low even after thousands of commits.
  • Conflict resolution: merging two database branches works much like merging code, with automatic or manual conflict handling.
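
The conflict-resolution point can be illustrated with a row-level three-way merge. This is a simplified sketch, not Dolt's merge algorithm: rows are modeled as a key-to-value dict, a missing key means the row is absent, and the function name three_way_merge is hypothetical.

```python
from typing import Dict, Optional, Tuple

Rows = Dict[str, str]
Conflict = Tuple[Optional[str], Optional[str], Optional[str]]  # (base, ours, theirs)


def three_way_merge(base: Rows, ours: Rows, theirs: Rows) -> Tuple[Rows, Dict[str, Conflict]]:
    """Merge two branches that diverged from a common ancestor, row by row.
    None stands for 'row absent', so inserts and deletes are handled too."""
    merged: Rows = {}
    conflicts: Dict[str, Conflict] = {}
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            result = o                    # both sides agree (or neither changed)
        elif o == b:
            result = t                    # only "theirs" changed this row
        elif t == b:
            result = o                    # only "ours" changed this row
        else:
            conflicts[key] = (b, o, t)    # both changed it differently
            continue
        if result is not None:
            merged[key] = result          # None here means the row was deleted
    return merged, conflicts


base = {"1": "alice", "2": "bob", "3": "carol"}
ours = {"1": "alice", "2": "robert", "3": "carol", "4": "dave"}  # edit row 2, add row 4
theirs = {"1": "alicia", "2": "bobby"}                            # edit rows 1-2, delete row 3
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'1': 'alicia', '4': 'dave'}
print(conflicts)  # {'2': ('bob', 'robert', 'bobby')}
```

A real implementation would not scan every key: since unchanged subtrees in two content-addressed trees carry the same hash, a diff can skip them and spend work only on the parts that actually changed.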

Looking Ahead

The Dolt team is already working on performance improvements for write-heavy workloads and integrations with major database drivers. Meanwhile, the broader open-source community is exploring Prolly trees for non-relational stores and blockchain applications.

"We’ve barely scratched the surface," Hendriks concluded. "This is the kind of foundational change that happens once a decade for databases."