Tara Vancil is a programmer and decentralized Web enthusiast in Austin, TX.
Earlier this month, I spoke at bangbangcon about how Merkle trees are the backbone of nearly all decentralized technologies. Here is a summary of my presentation:
The Web is centralized, but why?
There are two powerful centralizing forces that affect how the Web works today:
- The server problem
- Host-based addressing
We’ll address them in turn.
1. The server problem
If you want to host files on the Web, you need a server. But running your own server is both expensive and burdensome, so most people don’t run their own servers. They choose the more practical option: uploading content to a service dedicated to providing reliable and affordable hosting.
Sure, these services are cheap and convenient, but since the barrier to entry for self-hosting content is so high, a lot of content ends up being concentrated on the infrastructure of a handful of hosting providers.
2. Host-based addressing
The Web currently uses a host-based addressing model to provide unique names for pieces of content, and it’s imperfect at best. Consider a situation where I’ve uploaded a video to YouTube. The URL for my video is tightly bound to the youtube.com origin, so if I ever decide to move my video to another hosting provider like Vimeo, I’ll need to get an entirely new URL.
Because addresses for content are bound to where they’re hosted, there’s a lot of friction involved in exercising choice between hosting providers. So in the above example, even if I decide Vimeo is a superior hosting platform, I may choose to continue hosting my video on YouTube, simply because the burden of distributing a new URL is too high.
Content addressing: a better way to address content
There’s another way to address content that doesn’t bind files to the servers where they’re hosted: content addressing.
Content addressing is the process of generating a unique address for a piece of content based on its value, rather than its location. We can do this with a hash function.
Hash functions are one-way functions that take an input (like a file) and generate a fixed-length output. So even if your input is a 3GB file, the result will be reduced to the hash function’s output length (often 32 bytes).
It’s important to note that with a well-designed hash function, it is extremely unlikely that two different inputs will ever generate the same output. In practice, this means that a hash digest of a piece of content acts as a unique identifier for that piece of content.
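To make this concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash function, which is one common choice with a 32-byte output, not necessarily what any particular decentralized system uses. The helper name content_address is made up for this illustration.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the data.

    Hypothetical helper: the digest serves as the content's address.
    """
    return hashlib.sha256(data).hexdigest()


small_input = b"hello"
large_input = b"x" * 1_000_000  # stand-in for a much larger file

small_address = content_address(small_input)
large_address = content_address(large_input)

# The digest length is fixed, no matter how large the input is.
print(len(bytes.fromhex(small_address)), len(bytes.fromhex(large_address)))

# The same content always produces the same address...
print(content_address(b"hello") == small_address)

# ...while even a tiny change to the content produces a different one.
print(content_address(b"hello!") == small_address)
```

Both digests are 32 bytes long, hashing the same bytes twice gives the same address, and appending a single character gives a different one.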
So imagine what it would be like if we used content addressing instead of host-based addressing on the Web. If I need to move a video from one hosting provider to another, it’s not such a big deal, because the link remains the same no matter where the video is hosted.