Modern data demands storage that scales without breaking. Ceph meets this challenge by unifying object, block, and file storage in a single open-source platform. Its distributed architecture spreads data intelligently across clusters of servers to remove single points of failure. The result is storage that heals itself when parts of the system go down, and that expands smoothly from small environments to vast, exabyte-scale deployments.
Why use Ceph?
Ceph’s distributed design removes a single point of failure. Data is replicated (or erasure-coded) across devices. If one server goes offline, the data is still accessible on other storage servers in the Ceph cluster. Ceph distributes data using an algorithm called CRUSH (Controlled Replication Under Scalable Hashing).
A CRUSH map allows an administrator to customise how their data is placed across hardware. Let’s say an admin has 20 servers in a rack, 10 racks in a row, and 8 rows in their data centre. If they’re worried about a single server in one rack losing power, they can use Ceph’s CRUSH map to place replicas across other servers in that rack. Or, to combat a whole rack going offline, data could be distributed across other racks in that row. Or, for complete redundancy, data could be spread across all the rows in the data centre. The CRUSH map lets the admin tune the layout, and Ceph then ensures consistent behaviour across the storage servers.
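To make that concrete, the toy sketch below shows the idea in deliberately simplified form. It is not the real CRUSH algorithm, and the three-rack layout is invented, but the key property carries over: every client hashes the object name against the same map of the hardware and arrives at the same placement, one OSD per rack.

```python
# A toy stand-in for CRUSH-style placement, for illustration only. The real
# CRUSH algorithm is far more sophisticated (weights, bucket types, placement
# groups), but the core idea is the same: placement is computed from the
# object name and a shared map of the hardware, not looked up in a table.
import hashlib

# Hypothetical layout: rack name -> OSD IDs living in that rack.
CLUSTER_MAP = {
    "rack-1": [0, 1, 2, 3],
    "rack-2": [4, 5, 6, 7],
    "rack-3": [8, 9, 10, 11],
}

def stable_hash(*parts: str) -> int:
    """Deterministic hash, so every client computes the same answer."""
    return int.from_bytes(hashlib.md5("/".join(parts).encode()).digest()[:8], "big")

def place(obj_name: str, replicas: int = 3) -> list[int]:
    """Pick one OSD from each of `replicas` distinct racks (failure domains)."""
    racks = sorted(CLUSTER_MAP, key=lambda rack: stable_hash(obj_name, rack))[:replicas]
    return [min(CLUSTER_MAP[rack], key=lambda osd: stable_hash(obj_name, rack, str(osd)))
            for rack in racks]

print(place("my-object"))  # one OSD per rack; identical on every client, no lookup table
```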
How Ceph works
Ceph operates as a distributed storage platform built on its RADOS object store. Instead of routing all requests through a central controller, clients calculate exactly where to read or write data using the CRUSH algorithm. This design removes bottlenecks and lets the system scale from a handful of nodes to thousands while still providing consistent performance.
CRUSH also brings flexibility in how data is placed. Administrators can define rules that balance performance and resilience — for example, ensuring copies of an object are spread across racks, rows, or even data centres. Placement is computed rather than looked up in a central table. This allows the cluster to avoid metadata overhead and to quickly recover if parts of the system fail.
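As a concrete example of what this looks like from the client side, here is a minimal sketch using the librados Python bindings. The pool name is a placeholder, and it assumes a reachable cluster with the usual /etc/ceph/ceph.conf and admin keyring.

```python
# A minimal sketch using the librados Python bindings (python3-rados). It
# assumes a reachable cluster, the usual /etc/ceph/ceph.conf plus admin
# keyring, and an existing pool called "mypool" (a placeholder name).
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

ioctx = cluster.open_ioctx("mypool")          # I/O context for one pool
ioctx.write_full("greeting", b"hello ceph")   # the client computes placement via CRUSH
print(ioctx.read("greeting"))                 # no central lookup service is consulted

ioctx.close()
cluster.shutdown()
```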
How is Ceph different?
In traditional data storage methods, data is located through its metadata (the data about the data). The metadata tells us where we have to look to find individual pieces of data, e.g. row 3, rack 2, server 8. The problem with this is that metadata also has to be stored somewhere. And in large-scale operations where data is stored by the exabyte (eighteen zeroes), this metadata adds up to a significant proportion of space.
Not only does a metadata-heavy approach increase storage overhead and risk, it also comes with a fundamental flaw: if the metadata is lost or inaccessible, you may have no practical way of finding your data. It would be like trying to find your car when it's pitch black, in an infinite car park where all of the cars look the same and you have no idea which way is north. A metadata-style approach would read the number plates of all the cars to find yours, but if you lose the list, you lose your car.
Ceph storage avoids this problem by computing the location of data on demand, rather than relying on previously stored metadata for placement. To access a specific piece of data, Ceph figures out where it lives in the storage cluster using the CRUSH algorithm. This avoids a central lookup service becoming a bottleneck or single point of failure and enables the system to scale. In our car-park metaphor, if you know a few facts about the car you’re looking for, CRUSH can work out where to look — even without the number plate list.
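Under the hood there is one extra step worth knowing about: objects are first hashed into placement groups (PGs), and CRUSH then maps each PG to a set of OSDs, as in the rack sketch earlier. A rough sketch of that first step, with the hash function and numbers invented for illustration:

```python
# Rough illustration of the indirection Ceph actually uses: an object name is
# hashed into one of a fixed number of placement groups (PGs), and CRUSH then
# maps each PG to a set of OSDs. Ceph uses its own hash function and pool
# settings; the values here are invented.
import hashlib

PG_NUM = 128  # number of placement groups in this hypothetical pool

def object_to_pg(obj_name: str) -> int:
    digest = hashlib.md5(obj_name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % PG_NUM

print(object_to_pg("invoice-2024-03.pdf"))  # same PG on every client, no lookup needed
```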
Ceph also supports traditional metadata where needed (for example, with its POSIX file system layer), so you can pick the method that fits the service you’re running.
Core Ceph components
Several components work together to make a Ceph cluster function:
MON (Monitors)
Monitor daemons maintain maps of the cluster state, including OSD locations, CRUSH rules, and authentication information. They form a quorum to agree on the current state of the cluster, so a typical production setup runs at least three MONs for reliability.
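As a hedged example, this is roughly how a client asks the monitors for the cluster status through the python3-rados bindings, assuming a reachable cluster and admin keyring:

```python
# Ask the MON quorum for the cluster status via python3-rados.
import json
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

cmd = json.dumps({"prefix": "status", "format": "json"})
ret, out, err = cluster.mon_command(cmd, b"")   # answered by a monitor in the quorum
print(json.loads(out)["health"]["status"])      # e.g. "HEALTH_OK"

cluster.shutdown()
```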
MGR (Manager)
The Manager daemon provides cluster-wide services such as monitoring, orchestration modules, and the Ceph Dashboard. It works alongside the MONs to expose operational insights and API hooks.
OSD (Object Storage Daemon)
Each OSD stores actual data on disk or SSD. It handles replication or erasure coding, recovery, rebalancing, and backfilling when nodes fail or new ones are added. A cluster can contain hundreds or thousands of OSDs, which makes them the backbone of Ceph’s scale-out design.
MDS (Metadata Server)
The Metadata Server is only required when running CephFS. It manages namespace operations, like tracking directories, paths, and permissions. Clients can interact with CephFS as if it were a traditional file system.
RGW (RADOS Gateway)
The RADOS Gateway provides RESTful APIs compatible with Amazon S3 and OpenStack Swift. It allows developers to treat Ceph as an open-source object storage platform for cloud-native applications.
Ceph services at a glance
Ceph is unusual because a single cluster can serve multiple storage types at once. Instead of running separate systems for object, block, and file, you can use the same pool of hardware and scale it in whichever direction your workloads demand.
Object storage
Ceph’s RADOS Gateway provides RESTful APIs that are compatible with Amazon S3 and OpenStack Swift. This means applications can interact with Ceph as if it were a public cloud object store, but with the flexibility of running it on your own hardware. Object storage is particularly suited to backups and archives. It's also great for large-scale unstructured data such as logs, images, or media libraries. Many organisations use Ceph’s object store as the foundation for private cloud or data lake deployments.
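As a sketch of what that looks like in practice, here is an S3-style upload with boto3 pointed at an RGW endpoint. The endpoint URL, credentials, and bucket name are placeholders for your own deployment.

```python
# Talk to the RADOS Gateway through its S3-compatible API using boto3.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.internal:8080",  # placeholder RGW endpoint
    aws_access_key_id="RGW_ACCESS_KEY",
    aws_secret_access_key="RGW_SECRET_KEY",
)

s3.create_bucket(Bucket="backups")
s3.put_object(Bucket="backups", Key="db/2024-03-01.dump", Body=b"...")
print([o["Key"] for o in s3.list_objects_v2(Bucket="backups").get("Contents", [])])
```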
Block storage
With the RADOS Block Device (RBD), Ceph delivers thin-provisioned block volumes that look and behave like traditional disks. These volumes can be attached to virtual machines, containers, or bare-metal hosts. They support advanced features such as snapshots and cloning. This makes them a strong fit for virtualisation platforms, databases, and other performance-sensitive workloads. Because block storage is provided directly from the cluster, it benefits from the same scalability and redundancy as Ceph’s other services.
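A minimal sketch with the python3-rbd bindings, assuming an existing pool (here called "rbd-pool" as a placeholder) and the usual cluster configuration:

```python
# Create a thin-provisioned 10 GiB volume and take a snapshot of it.
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd-pool")

rbd.RBD().create(ioctx, "vm-disk-01", 10 * 1024**3)   # 10 GiB, allocated on demand

image = rbd.Image(ioctx, "vm-disk-01")
image.create_snap("before-upgrade")                   # instant snapshot of the volume
print(image.size(), [snap["name"] for snap in image.list_snaps()])
image.close()

ioctx.close()
cluster.shutdown()
```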
CephFS (file system)
Ceph also includes CephFS, a fully POSIX-compliant file system built on top of the object store. File data is stored as objects, while Metadata Servers (MDS) handle directory structures, permissions, and path lookups. This separation allows CephFS to scale horizontally. Adding more MDS servers increases throughput and supports larger numbers of concurrent clients.
CephFS is ideal for workloads that require shared file access, such as research computing or collaborative environments. It provides the familiar semantics of a networked file system while retaining the scale and resilience of a distributed object store underneath.
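For a flavour of the client side, here is a hedged sketch with the python3-cephfs bindings. It assumes a CephFS file system already exists in the cluster, and the paths are placeholders.

```python
# Namespace operations (mkdir, open) go through the MDS; file contents are
# striped across RADOS objects on the OSDs.
import cephfs

fs = cephfs.LibCephFS(conffile="/etc/ceph/ceph.conf")
fs.mount()                                    # attach to the default file system

fs.mkdir("/projects", 0o755)
fd = fs.open("/projects/readme.txt", "w", 0o644)
fs.write(fd, b"shared notes\n", 0)
fs.close(fd)

fs.unmount()
fs.shutdown()
```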
Ceph + Kubernetes
Ceph is often deployed alongside Kubernetes to provide persistent storage for containerised workloads. The most common way to achieve this is through Rook, a Kubernetes Operator that automates the deployment and management of Ceph clusters. Rook translates Kubernetes storage classes into Ceph-backed volumes, taking care of lifecycle tasks like provisioning, monitoring, and upgrades.
For DevOps teams, this integration means applications running in Kubernetes can consume storage without needing to know anything about the underlying cluster design, bridging cloud-native orchestration with enterprise-grade storage.
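As an illustration, this is roughly how an application team would claim a Ceph-backed volume with the official Kubernetes Python client. The storage class name "rook-ceph-block" follows Rook’s examples and should be swapped for whatever your cluster defines.

```python
# Request a 10 GiB Ceph-backed volume through a Rook-provisioned storage class.
from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() inside a pod

pvc_manifest = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "rook-ceph-block",        # assumed Rook storage class name
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc_manifest
)
```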
What does ‘Ceph’ mean?
‘Ceph’ looks like it should be an acronym, but it isn’t. The name comes from a shortening of ‘Cephalopod’ (which is why the Ceph logo looks like an octopus). Octopuses are known for self-healing and adaptability — a neat metaphor for Ceph’s ability to recover and rebalance when parts of a cluster go down.
Ceph versions
Each major Ceph release is named after a cephalopod, from Bobtail and Cuttlefish through to recent names like Reef and Squid. These releases are published roughly once a year, with long-term support and regular backport updates during their lifecycle.
At any given time, two stable branches are maintained in parallel: the most recent release and its predecessor. This lets production users adopt new features at their own pace while still receiving security and stability updates.
Where Ceph fits for you
If you need file, block and object storage in one platform, Ceph’s open-source, scale-out design makes it a strong candidate, especially when you want to run on commodity hardware and grow over time. It’s also worth a look if you’re comparing distributed storage options for Kubernetes, virtualisation or analytics, and Ubuntu’s overview captures the enterprise use cases well if you want a second opinion.
Looking for reliable hosting while you explore storage options like Ceph? Our Web Hosting and server line-up come with UK data centres and 24/7 support — so you can focus on the services you’re building. If you need help picking a plan, our team’s here to help.