Advanced Ceph Capacity Planner
[Interactive calculator] The planner reports five values: Raw Capacity (Marketing TB), Raw Capacity (Ceph TiB), Usable Capacity (Net), Safe Capacity (80% + N-1 Reserve), and Efficiency (%).
Explanation of Settings
Successful Ceph cluster planning starts with understanding the technical parameters this calculator uses:
- Unit Basis (TB vs. TiB): Hard drive manufacturers use decimal Terabytes (Base 1000). However, Ceph and operating systems calculate in binary Tebibytes (Base 1024). A “12 TB” drive provides only about 10.9 TiB of raw capacity in Ceph.
- Redundancy Mode:
  - Replication: Creates full copies of your data (standard: 3x). Highly performant and simple, but expensive (33% efficiency).
  - Erasure Coding (EC): Splits data into data chunks ($k$) and adds parity chunks ($m$). Much higher efficiency (often > 60%), but requires more CPU resources.
- N-1 Reserve: In a healthy cluster, there must always be enough free space to allow the data from the largest node to be redistributed to the remaining servers in the event of a failure (Self-Healing).
- Safe Fill Ratio (80%): Ceph should never be filled to 100%. At 85% (“Nearfull”), performance drops drastically. At 95% (“Full”), Ceph stops all write operations to prevent data corruption.
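The parameters above can be combined into a single capacity pipeline. The sketch below is a minimal, illustrative model of such a calculator: function and parameter names are assumptions, replication 3x is modeled as $k=1$, $m=2$, and the N-1 reserve is approximated by planning as if the largest node were already gone.

```python
TIB_PER_TB = 1000**4 / 1024**4  # ≈ 0.9095: decimal TB → binary TiB

def plan(num_nodes, drives_per_node, drive_tb, k=1, m=2, safe_ratio=0.80):
    """Illustrative capacity pipeline; replication 3x is k=1, m=2."""
    raw_tb = num_nodes * drives_per_node * drive_tb       # marketing TB
    raw_tib = raw_tb * TIB_PER_TB                         # what Ceph reports
    efficiency = k / (k + m)                              # net/raw ratio
    usable_tib = raw_tib * efficiency
    # N-1 reserve: plan against the raw capacity that survives a node loss
    surviving_raw_tib = raw_tib * (num_nodes - 1) / num_nodes
    safe_tib = surviving_raw_tib * efficiency * safe_ratio  # 80% fill cap
    return {"raw_tb": raw_tb, "raw_tib": raw_tib,
            "usable_tib": usable_tib, "safe_tib": safe_tib,
            "efficiency_pct": efficiency * 100}
```

For example, 5 nodes with 10 × 12 TB drives under EC 4+2 yields 600 TB of marketing capacity but only about 545.7 TiB of raw capacity in Ceph, and the safe capacity is noticeably lower still once the N-1 reserve and the 80% fill cap are applied.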
What is Ceph?
Ceph is an open-source software-defined storage solution designed to run on standard hardware (commodity servers). Instead of buying a single, expensive storage filer, Ceph connects many individual servers (nodes) into one massive storage pool.
The core advantages of Ceph are:
- No Single Point of Failure: If a server or a disk fails, the data remains available.
- Unlimited Scalability: You can add new nodes at any time while the system is running.
- Self-Healing: The system detects errors automatically and restores the desired redundancy without manual intervention.
Best Practices for Maximum Efficiency
Planning a cluster is always a balance between cost (efficiency), security, and performance.
How many nodes are required?
- Replication (3x): You need at least 3 nodes. With only 2 nodes, Ceph cannot place all three copies, and after a failure there is no spare node left to heal onto.
- Erasure Coding: The rule of thumb is $k + m + 1$ nodes. For an EC 4+2 profile (4 data chunks, 2 parity chunks), you should plan for at least 7 nodes. This way, one entire server can fail, and Ceph still has enough “targets” to redistribute the data in a $4+2$ scheme.
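The two rules above fit in a couple of lines. This is a sketch with illustrative names, directly encoding the $k + m + 1$ rule of thumb and the $k / (k + m)$ efficiency ratio:

```python
def min_nodes(k: int, m: int) -> int:
    """Recommended node count for an EC k+m profile (k + m + 1 rule)."""
    return k + m + 1

def efficiency(k: int, m: int) -> float:
    """Net/raw ratio: k data chunks out of k + m total chunks."""
    return k / (k + m)
```

For EC 4+2, `min_nodes(4, 2)` returns 7 and `efficiency(4, 2)` is about 0.667, matching the table below.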
Efficiency Comparison
| Strategy | Min. Nodes (Recommended) | Efficiency (Net/Raw) | Security |
|---|---|---|---|
| Replica 3x | 3 | 33.3% | High (fast recovery) |
| EC 2+1 | 4 | 66.7% | Medium (1 failure tolerated) |
| EC 4+2 | 7 | 66.7% | Very High (2 failures tolerated) |
| EC 8+3 | 12 | 72.7% | Extremely High (3 failures tolerated) |
Tips for Building Your Cluster
- Uniform OSD Sizes: Whenever possible, use hard drives of the same size. Ceph fills OSDs based on percentage. If you mix sizes (e.g., 4 TB and 12 TB), the small drives will fill up much faster, which can block the entire cluster.
- Network Bandwidth: 10 GbE networking is the absolute minimum for Ceph. In the event of a disk failure, Ceph must copy terabytes of data across the network. A 1 GbE network would become an immediate bottleneck and cripple your application performance.
- OSD Distribution: Aim for about 10 to 15 OSDs (disks) per node. Having too many disks in a single server increases risk: if that one server fails, the network must compensate for a massive amount of data all at once.
