Internals of a distributed cache system

In this new blog entry we will explain how a distributed cache system
works. Before I started working on this post, my original idea was to explain only one
caching algorithm (the idea came up while I was reading the groupcache source code
to understand its hashing algorithm; by the way, it is an amazing piece of software
and you should check it out).
So I started implementing it, and when I was done I realized that it
would be nice to test it in some way. So why not implement a simple and basic
distributed cache system in a way that is easy to explain? This topic will be
split across several independent entries.

Here we go ;)

DISCLAIMER

  • I'm not an expert on caching systems
  • I'm a Python fanatic
  • The final cache system is not production ready nor finished :)

What is a Cache system?

At a glance, a cache system is a basic piece of software. It is not rocket
science; after all, it is as simple as having a piece of memory for storing data
identified by some key.
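To make that idea concrete, here is a minimal sketch of "a piece of memory storing data identified by some key". The class name SimpleCache and its method names are made up for illustration; this is not the cache built later in the series, just the simplest thing that fits the description.

```python
class SimpleCache:
    """A toy in-memory cache: a plain dict mapping keys to values."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # Store (or overwrite) the value identified by key.
        self._store[key] = value

    def get(self, key, default=None):
        # Return the stored value, or default when the key is missing.
        return self._store.get(key, default)

    def delete(self, key):
        # Remove the key if present; do nothing otherwise.
        self._store.pop(key, None)


cache = SimpleCache()
cache.set("user:1", {"name": "Ada"})
```

Everything else (eviction, distribution, discovery) is extra machinery layered on top of these three operations.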

Well... it sounds simple, but if we start adding features it becomes more complex,
so we need to know when to stop adding fancy stuff. Our system will have at least these requirements:

  • Modular: Yeah, we need the ability to change the algorithms and other parts as we want without too much trouble
  • Cache: We want to store things... (Thanks captain obvious :P)
  • Distributed: We need to add and remove nodes as the data grows...
  • The least data loss: We should lose the minimum amount of data when removing nodes from the system
  • Dynamic: We need to add and remove nodes without restarting all the nodes (the cluster)
  • Service discovery: Automatic discovery of new nodes when adding a new one
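
One common way to meet the "least data loss" and "dynamic" requirements is consistent hashing, which is the kind of hashing groupcache uses. The sketch below assumes that technique and is not necessarily the algorithm implemented in this series; HashRing, node_for, the replicas parameter and the node names are all hypothetical names chosen for the example.

```python
import bisect
import hashlib


def _hash(value):
    # Stable hash: Python's built-in hash() for str changes between processes.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring mapping keys to node names."""

    def __init__(self, nodes=(), replicas=50):
        self.replicas = replicas  # virtual points per node, to spread the load
        self._points = []         # sorted hash points on the ring
        self._owners = {}         # hash point -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            del self._owners[point]
            self._points.remove(point)

    def node_for(self, key):
        if not self._points:
            raise LookupError("the ring has no nodes")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("node-c")
after = {k: ring.node_for(k) for k in keys}

# Only keys that lived on the removed node change owner.
moved = [k for k in keys if before[k] != after[k]]
```

With a naive `hash(key) % number_of_nodes` scheme, removing one node would remap most keys. On the ring, keys owned by the surviving nodes stay where they were, so only the removed node's share of the data is lost.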

Our design

The system will be made up of these pieces:

As I said before, this cache system is not meant for production use. It has been designed only with
educational purposes in mind, so it omits
error handling, data replication, optimization... There are a lot of amazing cache systems out there,
like Memcached or groupcache (both from the same person, bradfitz).

This will be the schema of the cache system when it is finished. It looks exciting, doesn't it?

Cache schema

Let's start messing around with the cache algorithm in part 2!