Thinking about Filesystems, Databases, and Hypercores
Lately I’ve been thinking about how to represent data on the Hypercore protocol.
The protocol has a lot of interesting and useful primitives, and the challenge for applications is to combine them effectively — something we’re all still exploring. Depending on your application, you can end up with many tough requirements:
- A (network) database which performantly reads and writes structured data.
- A standard form which other applications can grok, since p2p applications have no central control of schemas.
- A data metaphor which users can understand and manage, since they’ll be the ones operating the apps.
- A data model which can import and export non-Hypercore data, such as POSIX filesystems.
I’ve had various projects (Beaker, CTZN) which tried to address some subset of these, and lately I’ve been wondering if they could all be solved. This post is some idle thoughts on a “database filesystem” that might do that.
Hypercore’s data structures
The Hypercore protocol shares datasets using p2p connections. The base data structure, the hypercore, is an append-only log¹ upon which you build data structures.
Some of the more interesting work by the protocol team² has been on those data structures on top of hypercore logs. The target use-case is to access the datasets over the network, and so performance is very important. You want to minimize round-trips³. The solution they created is a kind of streaming embedded index, and there’s a great workshop available which explains exactly how that works.
The two streaming indexes the team has produced are Hypertrie (a streaming hash-trie) and Hyperbee (a streaming b-tree). Hypertrie was created first, and it’s what powers the Hyperdrive structure.
- Hyperdrive: a filesystem structure that uses streaming hash-tries (hypertrie).
- Hyperbee: a key-value database that uses streaming b-trees.
- Hypercore: a log database similar to e.g. Kafka.
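As a rough mental model, the append/get contract of the log can be sketched in plain JavaScript. This is a toy stand-in, not the actual Hypercore API — real hypercores add merkle proofs, replication, and sparse downloads — but the contract below is the part everything else builds on:

```javascript
// A toy append-only log. Blocks are addressed by sequence number and
// never change once written; everything else (tries, b-trees) is
// layered on top of this contract.
class ToyLog {
  constructor () {
    this.blocks = []
  }

  // Appending returns the sequence number of the new block.
  append (block) {
    this.blocks.push(block)
    return this.blocks.length - 1
  }

  // Reads are by sequence number.
  get (seq) {
    return this.blocks[seq]
  }

  get length () {
    return this.blocks.length
  }
}

const log = new ToyLog()
const seq = log.append({ type: 'post', text: 'hello' })
console.log(seq, log.get(seq).text) // 0 'hello'
```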
What I learned from Beaker
With Beaker, I used Hyperdrive to store websites and user databases. I was aiming for an aggressive “everything is a file” approach. This worked fine with the websites, but eventually hit trouble with the databases.
Some of the “filesystem as database” model worked wonderfully. You would create a folder-structure with conventional semantics like this:
/feed/*.md — Microblog posts
/comments/*.md — Comments
/follows/*.goto — Followed users
/profile.json — Profile information
To produce a feed, you would run a query against all the followed users’ hyperdrives using a globbing pattern of
/feed/*.md, then assemble the results by their ctime. It was really neat to interact with your data this way, and creating a new post just meant writing a markdown file to a folder, but that’s roughly where the “neatness” stopped.
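That query can be sketched like so, with plain objects standing in for hyperdrives (the authors, paths, and contents here are made up for illustration):

```javascript
// Sketch of Beaker-style feed assembly. Each "drive" maps paths to
// { ctime, content }; the query globs /feed/*.md across all followed
// drives and sorts by ctime, newest first.
const drives = {
  alice: {
    '/feed/1.md': { ctime: 100, content: 'first post' },
    '/feed/2.md': { ctime: 300, content: 'another post' },
    '/profile.json': { ctime: 50, content: '{"name":"alice"}' }
  },
  bob: {
    '/feed/1.md': { ctime: 200, content: 'hi from bob' }
  }
}

function queryFeed (drives) {
  const posts = []
  for (const [author, files] of Object.entries(drives)) {
    for (const [path, file] of Object.entries(files)) {
      if (/^\/feed\/[^/]+\.md$/.test(path)) {
        posts.push({ author, path, ...file })
      }
    }
  }
  // The catch described below: every listing must be read into memory
  // before sorting, because the trie preserves no key order.
  return posts.sort((a, b) => b.ctime - a.ctime)
}

const feed = queryFeed(drives)
console.log(feed.map(p => p.author)) // [ 'alice', 'bob', 'alice' ]
```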
The first problem was that Hyperdrive’s hash-trie structure doesn’t preserve filename order. To get the five latest posts, you’d have to read the folder’s full listing into memory and then sort. This is a scaling problem.
The second problem was the weak structuring of the data. Hyperdrive supports key-value metadata on files which I (ab)used to add queryable data; for instance, the comment .md files would include
parent metafields which stored the URLs the comments referenced. This worked decently well, but it broke interop with external filesystems: if you exported your hyperdrive to your native OS, that metadata was lost⁴. The alternative would have been JSON files, but reading and parsing file content requires additional network round-trips and would have hurt performance too much.
The third problem was DX (Developer Experience). As neat as “filesystem as a database” may seem, it ended up being neither a filesystem nor a database. Devs often told me they were uncomfortable reading and writing files as records, which is possibly a learned intuition from normal filesystems being a poor fit as databases, but Beaker’s Hyperdrives lacked the kind of tooling a DB needs⁵.
The fourth and final problem was UX. Filesystems should aim to be user-friendly; in fact, my largest criticism of PCs is that they made the filesystem scary to users by storing system config in it without ever finding a way to explain those files. Users learned to stay away from the “scary files” lest they brick their machines, and those scary files became the domain of the OS and userland programs⁶. In Beaker, that user-unfriendliness manifested as well, for example as a comments folder filled with numerically-named files, which might as well not have been exposed to the user at all.
My takeaway from this was that you can’t turn a filesystem into a database by just using it that way. Unless an application is operating on files (e.g. a text editor or paint program) it really needs to use a database. With users, I’d go even further: the filesystem is UI. If it’s not helping them manage data or understand their system, it’s not doing its job. If you have non-file records, you’d serve a user better by putting them in a nicely-named
.db file and using a database-like interface for exploring that file’s contents.
What I learned from CTZN
With CTZN, I decided to use Hyperbee and think in terms of databases. Unlike Hyperdrive, Hyperbee supports ordered range-reads of keys over the network, and its values can be structured data (e.g. JSON). Its DX is akin to LevelDB: not the easiest database in the world, but much more natural than a filesystem.
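To illustrate the difference ordered keys make, here’s a sketch using a sorted array in place of Hyperbee’s b-tree. The key scheme (a `posts!` prefix plus a zero-padded timestamp) is my own illustrative choice, not a Hyperbee convention:

```javascript
// With lexicographically ordered keys, byte order is chronological
// order, so a reverse range read yields the latest records directly —
// no full-listing scan and sort.
const pad = n => String(n).padStart(12, '0')

const entries = [
  [`posts!${pad(100)}`, { text: 'first' }],
  [`posts!${pad(200)}`, { text: 'second' }],
  [`posts!${pad(300)}`, { text: 'third' }]
].sort(([a], [b]) => (a < b ? -1 : 1))

// A real b-tree would walk just this range over the network instead of
// loading the whole keyspace.
function rangeRead (entries, { prefix, reverse = false, limit = Infinity }) {
  const hits = entries.filter(([k]) => k.startsWith(prefix))
  if (reverse) hits.reverse()
  return hits.slice(0, limit)
}

const latest = rangeRead(entries, { prefix: 'posts!', reverse: true, limit: 2 })
console.log(latest.map(([, v]) => v.text)) // [ 'third', 'second' ]
```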
I spent a lot of CTZN’s development thinking about interop and specifically worked on sharing semantics between applications. The approach I used was strictly-enforced, machine-readable schemas identified by URLs. Tables were identified by their schemas’ URLs (binding them to their schemas) and validation was applied on reads and writes. I’ll write more about this in the future, but the rigidity of the system is informed by my past experiences (in SSB and Beaker) where schema ambiguity left everyone frustrated.
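A sketch of that idea follows, with a stand-in schema format (required string fields) rather than CTZN’s actual schema language — the URL and field names are invented for illustration:

```javascript
// Tables are identified by their schema's URL, binding them to that
// schema, and writes are validated before they land.
const schemas = {
  'https://ctzn.example/post.json': { required: ['text', 'createdAt'] }
}

class Table {
  constructor (schemaUrl) {
    this.schemaUrl = schemaUrl
    this.records = new Map()
  }

  put (key, value) {
    const schema = schemas[this.schemaUrl]
    for (const field of schema.required) {
      if (typeof value[field] !== 'string') {
        throw new Error(`invalid record: missing ${field}`)
      }
    }
    this.records.set(key, value)
  }
}

const posts = new Table('https://ctzn.example/post.json')
posts.put('1', { text: 'hello', createdAt: '2021-05-01' })
// posts.put('2', { text: 'oops' }) // would throw: missing createdAt
```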
This time around, I didn’t have any significant complaints about the data-model. CTZN presented itself as a Twitter clone, so users weren’t expected to manage their data directly (they just used the app), and user clarity wasn’t a big focus. That said, with the strict schemas it’s easy to imagine how a “data explorer” could render the tables’ records using hints from the schemas, with baked-in rendering for known schemas, or even with plugins for rendering new ones. This wouldn’t be particularly different from how macOS handles files with Uniform Type Identifiers (UTIs).
Performance was relatively strong⁷. In a model similar to CouchDB, CTZN would produce views by tailing changes and writing secondary indexes which were extracted from the records. The indexers were a little more laborious to write than indexes in a SQL database, but they were very flexible and easy to get right.
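The indexing pattern can be sketched like this; the record shapes and key names are invented for illustration, not CTZN’s actual code:

```javascript
// CouchDB-style view maintenance: an indexer tails the changes feed
// and writes secondary-index entries extracted from each record.
const changes = [
  { seq: 0, key: 'comments!a', value: { text: 'nice', parent: 'posts!1' } },
  { seq: 1, key: 'comments!b', value: { text: 'agreed', parent: 'posts!1' } },
  { seq: 2, key: 'comments!c', value: { text: 'hm', parent: 'posts!2' } }
]

// Index: parent key -> comment keys, built incrementally from the log.
function indexCommentsByParent (changes) {
  const index = new Map()
  for (const { key, value } of changes) {
    const list = index.get(value.parent) || []
    list.push(key)
    index.set(value.parent, list)
  }
  return index
}

const byParent = indexCommentsByParent(changes)
console.log(byParent.get('posts!1')) // [ 'comments!a', 'comments!b' ]
```

More laborious than `CREATE INDEX`, but the extraction logic can be arbitrary, which is where the flexibility comes from.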
The area that needed the most improvement was how “blobs” were handled. (Blobs are binary data which were attached to records, e.g. a profile photo for a user profile.) In a manner similar to Hyperdrive, CTZN created a separate hypercore for storing binary chunks and then wrote pointer-records in the main bee. This worked perfectly in all cases except deletes; with no clear binding between a record and a blob, the CTZN server never knew when it was safe to delete a blob from its cache, leading the blobs to accumulate indefinitely. Any future iteration needs to be mindful of this.
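As a sketch of what “being mindful of this” might look like, here’s a hypothetical reference-counting binding between records and blobs — not how CTZN actually worked:

```javascript
// Track which records reference each blob, and delete a blob only when
// its last referencing record goes away — the binding the
// pointer-record approach lacked.
class BlobStore {
  constructor () {
    this.blobs = new Map() // blobId -> data
    this.refs = new Map()  // blobId -> Set of record keys
  }

  put (blobId, data) {
    this.blobs.set(blobId, data)
    this.refs.set(blobId, new Set())
  }

  addRef (blobId, recordKey) {
    this.refs.get(blobId).add(recordKey)
  }

  // When the last reference is removed, the blob is safe to evict.
  removeRef (blobId, recordKey) {
    const refs = this.refs.get(blobId)
    refs.delete(recordKey)
    if (refs.size === 0) {
      this.blobs.delete(blobId)
      this.refs.delete(blobId)
    }
  }
}

const store = new BlobStore()
store.put('avatar1', Buffer.from('fake image bytes'))
store.addRef('avatar1', 'profiles!bob')
store.removeRef('avatar1', 'profiles!bob')
console.log(store.blobs.has('avatar1')) // false
```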
My chief takeaway here was that building databases on Hyperbee is the right call for applications, and that a CouchDB-like model with global schemas is a feasible approach for my next project. If we can standardize some of that code, it might be a nice solution for building interoperable Hyper apps.
Looking again at database filesystems
If we were to build an operating system on CTZN’s databases, what might its “Finder” app look like? There would probably be a flat listing of the databases, and then each database would have a flat list of tables, and each table a flat list of records. With the well-defined schemas, this information could all be presented nicely to the user. Double clicking a record could open the appropriate application, and context menus could give a useful set of options.
In some ways, this hypothetical “Finder” isn’t all bad. Each database gives a clear sense of location — with read and write access specified per database — and the user-friendly renderings could help users locate data quickly. There are often times when users don’t want or need to think about where their data is stored; I’m happy to dump pictures into a “photo stream” and then add categorizations like albums or tags afterward. Throw in a search UI and you have a pretty decent toolset⁸.
Still, I think I would find this OS odd. Files are a key piece of computing, no matter what Apple tells you. How would you represent a code repository in these databases? I imagine you would flatten the file tree into a list of code files and represent them using their original paths, which is noisy and useless when viewed from our Finder.
There are two things I’d say about files compared to databases: 1, files need hierarchical organization, and 2, files ain’t structured data. You can force files into database tables, but what’s the point? You’re only going to extract a few bits of structured metadata into records, and then the rest will live in a binary blob. You’re better off treating files as files and db records as db records.
If I wanted to make a database + filesystem hybrid, I think the obvious answer is to allow both to exist as their own entities. The DBFS would have a table space and a file space and treat them differently. The files are navigated and accessed individually; the records are queried and aggregated. You use the one you need.
That said, the DBFS could possess some interesting features. One can imagine files which are links to specific records, like a shortcut file. One can also imagine virtual folders which are actually queries of the database, with the resulting “files” being shortcuts as well. This might be inviting confusion again, but I think users could wrap their heads around “shortcut files.”
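A toy sketch of a query-backed virtual folder, with all names and shapes invented for illustration (the DBFS itself is hypothetical):

```javascript
// A virtual folder pairs a name with a saved query over the table
// space; listing it runs the query and surfaces shortcut entries that
// point back at the matching records.
const tables = {
  posts: [
    { key: '1', value: { text: 'hello', tags: ['intro'] } },
    { key: '2', value: { text: 'db thoughts', tags: ['databases'] } }
  ]
}

function listVirtualFolder (folder) {
  return tables[folder.table]
    .filter(folder.query)
    .map(r => ({
      name: `${r.key}.shortcut`,
      target: `${folder.table}/${r.key}`
    }))
}

const dbPosts = {
  table: 'posts',
  query: r => r.value.tags.includes('databases')
}
console.log(listVirtualFolder(dbPosts))
```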
It’s fun to wonder how useful this DBFS Finder could be. GitHub has managed to turn code repositories into passable websites by dint of markdown files and inline READMEs; could a DBFS do the same? What if there were the equivalent of .exe files in our DBFS? What if there were contextual commands like Plan9’s plumber?
These are idle thoughts at the moment; I’m not sure yet whether I’m going to pursue a DBFS. The database model that started with CTZN may be enough for my next project. Still, it’s interesting to think about.
¹ Hypercore uses a merkle tree to ensure the append-only constraint, but you can delete old “blocks” so long as you keep enough tree hashes. A future release is going to add history truncation as well.
³ Minimizing roundtrips to the network is roughly the same challenge as minimizing reads from the disk, except that your penalty is much higher. After getting the RTs down as much as possible, the protocol team added “hints” to the wire which explains what data is being requested so that the hoster can optimistically send blocks.
⁴ I don’t think you can ignore issues like this. People expect files to be files regardless of the platform they’re on, and if they can’t transfer platforms then they’re not really files anymore.
⁵ One reason we never treat folders as databases is that files are written in chunks with a minimum size (the “cluster size”) which is typically 4kb, so your 200 byte record is wasting ~3.8kb of space. Hyperdrives pack their data into a file (as do all hypercore structures) so they don’t suffer from this problem, but the lack of other database facilities still makes Hyperdrive a bit weak as a DB.
⁶ I’m sympathetic to the idea that using the filesystem for the device’s state can be good because it unifies everything under a shared namespace. In practice, though, modern operating systems scatter config across a lot of hard-to-find locations and other namespaces (Windows’ registry), which makes the idea moot. If you can’t uninstall a program by deleting its folder, then the filesystem is not a user-friendly approach to config either.
⁷ Barring one big mistake on my part which brought the entire server to a crawl, an event which became known as “CTZNpacolypse”. We minted NFT-shirts.
⁸ The config and state of your system would be placed in their own databases, with perhaps one database per program.