You might be asking yourself "what does Ceph mean?" Well, Ceph is open-source software that provides a scalable and reliable clustered storage solution. In a clustered storage architecture, multiple storage servers work together, and the workload is distributed among them. This improves performance and reliability.

Why use Ceph?

Ceph’s distributed design makes it fault-tolerant, with no single point of failure. Data is replicated between servers so that if one server goes offline, the data is still accessible on one of the other storage servers in the cluster. Ceph storage distributes the data across the cluster using an algorithm called ‘CRUSH’ (Controlled Replication Under Scalable Hashing).

A CRUSH map allows an administrator to customise how their data is replicated among their servers. Let’s say an admin has 20 servers in a rack, 10 racks in a row, and 8 rows in their data centre. If they’re worried about a single server in one rack losing power, they can use Ceph’s CRUSH map to replicate that data across other servers in that rack. Or, to combat a whole rack going offline, data could be distributed across other racks in that row. Or, for complete redundancy, data could be replicated across all the rows in the data centre. The CRUSH map allows the admin to distribute the data however they want, and then Ceph ensures consistent performance between the storage servers.
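To make the idea of failure domains a bit more concrete, here is a minimal Python sketch. It is not Ceph's actual CRUSH implementation, and none of these names come from Ceph: `SERVERS`, `score`, `domain_of` and `place` are hypothetical, and the hashing scheme is a simple stand-in. It only enforces one rule from the example above: no two copies of an object may share the same server, rack or row, depending on which level the admin picks.

```python
import hashlib
from itertools import product

# Hypothetical topology from the example: 8 rows x 10 racks x 20 servers.
SERVERS = [(row, rack, server)
           for row, rack, server in product(range(8), range(10), range(20))]


def score(object_name, server):
    """Deterministic pseudo-random score for an (object, server) pair."""
    key = f"{object_name}:{server}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def domain_of(server, level):
    """Return the failure domain a server belongs to at the given level."""
    row, rack, _ = server
    if level == "server":
        return server
    if level == "rack":
        return (row, rack)
    if level == "row":
        return row
    raise ValueError(f"unknown failure domain: {level}")


def place(object_name, replicas=3, failure_domain="rack"):
    """Pick servers for the replicas so that no two share a failure domain."""
    # Rank every server by its score for this object, highest first.
    ranked = sorted(SERVERS, key=lambda s: score(object_name, s), reverse=True)
    chosen, used_domains = [], set()
    for server in ranked:
        domain = domain_of(server, failure_domain)
        if domain not in used_domains:
            chosen.append(server)
            used_domains.add(domain)
        if len(chosen) == replicas:
            break
    return chosen


# Three copies, each in a different row of the data centre.
print(place("holiday-photo.jpg", replicas=3, failure_domain="row"))
```

Switching `failure_domain` to `"rack"` or `"server"` loosens the rule, which is roughly the trade-off the admin in the example is making.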

How is Ceph different?

In traditional data storage methods, data is located through its metadata (the data about the data). The metadata tells us where we have to look to find individual pieces of data, e.g. row 3, rack 2, server 8. The problem with this is that metadata also has to be stored somewhere. And in large-scale operations where data is stored by the exabyte (a one followed by eighteen zeroes, in bytes), this metadata takes up a significant amount of space.

Not only does this traditional method increase the amount of storage space required, reducing performance and reliability, it comes with a fairly fatal flaw. If for whatever reason the metadata is lost or inaccessible, then you have no way of knowing where any of your data is. It would be like trying to find your car when it's pitch black, in an infinite car park where all of the cars look the same and you have no idea which way is north. It's safe to say you’d get lost quickly. A metadata-type approach to finding your car would be to input an ID, like a number plate. Then it would read the number plate of all of the cars in the car park, find the one that belongs to you and beam a spotlight on to it. But if you lose the metadata, you lose your car.

Ceph storage avoids this problem by computing the location of data on demand, rather than relying on previously stored metadata. To access a specific piece of data, Ceph figures out where the data is in the storage cluster directly through the CRUSH algorithm. This eliminates the dangers and drawbacks of having to store metadata. Plus, this computation is performed by the client requesting the data rather than by the storage servers, so it adds no extra load to them. In the car park metaphor, if you know how old your car is and that it's got 5 seats, air-conditioning and a tow-bar, then the CRUSH algorithm can figure out which one is your car. And this can all be done even if you can’t remember your number plate; you just need to know the details of the car you're looking for.
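Here is a small Python sketch of what "computing the location instead of looking it up" can mean. It uses rendezvous hashing, a much simpler technique than CRUSH, purely as an illustration; the names `weight` and `locate` and the server and object names are made up for this example. The point is that anyone who knows the object's name and the list of servers arrives at the same answer, with no lookup table involved.

```python
import hashlib


def weight(object_name, server):
    """Deterministic pseudo-random weight for an (object, server) pair."""
    digest = hashlib.sha256(f"{object_name}|{server}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def locate(object_name, servers):
    """Compute (rather than look up) the server responsible for an object."""
    return max(servers, key=lambda s: weight(object_name, s))


servers = [f"server-{i}" for i in range(8)]
objects = [f"object-{i}" for i in range(1000)]

# Every client runs the same calculation and gets the same placement.
before = {o: locate(o, servers) for o in objects}

# Take one server offline and recompute.
survivors = [s for s in servers if s != "server-3"]
after = {o: locate(o, survivors) for o in objects}

# Only the objects that lived on the lost server get a new home.
moved = [o for o in objects if before[o] != after[o]]
assert all(before[o] == "server-3" for o in moved)
print(f"{len(moved)} of {len(objects)} objects were relocated")
```

Nothing here stores where an object lives: if the placement dictionaries were thrown away, they could be rebuilt from the object names and the server list alone.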

Ceph offers both the traditional metadata-storage method and the CRUSH algorithm method. Which one to use depends on which is best suited to the storage distribution.

What does ‘Ceph’ mean?

‘Ceph’ looks like it should be an acronym, but it isn’t. The name comes from a shortening of ‘Cephalopod’ (which is why the Ceph logo looks like an octopus). In fact, alongside conventional numbering systems, each version release of Ceph storage is named alphabetically after a cephalopod (Argonaut, Bobtail, Cuttlefish, and so on). As of April 2017, the latest version of Ceph is ‘Kraken’ (or 11.2.0).

As well as their three hearts, octopuses are known for their ability to regenerate limbs. That’s a basic metaphor for Ceph storage (and probably why they called it ‘Ceph’). When one ‘arm’ goes down, Ceph heals itself and starts to regenerate the ‘arm’ by restoring the lost data from the copies that are spread across the other 'arms'. This self-healing has two advantages. Firstly, it ensures no data is lost. Secondly, it happens without any interaction from the administrator, which leaves them more time for productive tasks.

At Fasthosts, we use Ceph storage software to help power our Web Hosting. Want to know more about which Web Hosting package is best for you? Get in touch with our Sales team today. If you need help with an existing product, our support team are here to help 24/7.