ADR-0022: Datacenter fabric scaling

Status

proposed

Date

2026-03-11

Group

networking

Depends-on

ADR-0002, ADR-0004

Context

ADR-0004 chose spine-leaf with BGP/EVPN. At 50,000 physical servers (ADR-0002), a single 2-tier leaf-spine CLOS cannot accommodate all servers — switch radix limits how many leaves a spine layer can connect. The question is how the fabric scales beyond a single pod.
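The radix ceiling can be made concrete with a little arithmetic. As an illustrative sketch (the radix values and oversubscription ratios below are examples, not our actual switch SKUs), a 2-tier leaf-spine CLOS built from R-port switches is bounded by the spine port count and the per-leaf downlink count:

```python
def max_servers_2tier(radix: int, oversub: int = 1) -> int:
    """Upper bound on servers in a 2-tier leaf-spine CLOS.

    Illustrative model: each leaf splits its ports between server-facing
    downlinks and spine-facing uplinks at the given oversubscription
    ratio; a spine switch has one port per leaf, so leaf count <= radix.
    """
    uplinks = radix // (oversub + 1)   # leaf ports toward the spine layer
    downlinks = radix - uplinks        # leaf ports toward servers
    return radix * downlinks           # max leaves * servers per leaf

# Non-blocking (1:1): even 128-port switches cap out far below 50,000.
for r in (64, 128):
    print(r, max_servers_2tier(r))    # 64 -> 2048, 128 -> 8192
```

Even at 3:1 oversubscription, 128-port switches in this model yield 128 × 96 = 12,288 servers per fabric, still well short of 50,000, which is why the options below all introduce some structure above a single 2-tier pod.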

Options

Option 1: Multiple independent fabric partitions

  • Pros: each partition is a self-contained 2-tier leaf-spine CLOS and a natural failure domain; partitions can be added incrementally; no super-spine complexity; cross-partition communication via exit/border switches; well-understood operational model

  • Cons: cross-partition traffic goes through exit layer (higher latency than intra-partition); no single flat L3 fabric spanning all servers; partition sizing must be planned

Option 2: Single multi-stage CLOS (5-stage with super-spine)

  • Pros: single flat fabric across all servers; optimal east-west bandwidth; well-understood in hyperscale datacenters (Google, Meta)

  • Cons: enormous blast radius; complex to operate and automate; not incrementally deployable; requires custom tooling for provisioning and lifecycle management

Option 3: Partition groups with inter-partition spine layer

  • Pros: groups of partitions get low-latency interconnect; better than pure exit-layer routing for cross-partition traffic

  • Cons: additional switch layer to operate; hybrid approach with unclear failure domains; custom topology

Decision

Multiple independent fabric partitions. Each partition is a self-contained 2-tier leaf-spine CLOS (a group of racks sharing one spine layer). Partitions are the unit of scaling — adding capacity means adding partitions. Cross-partition traffic routes through exit switches. This model provides clear failure domains and incremental growth. The provisioning tool (separate ADR) must natively support this partition-based model.
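To make "partitions are the unit of scaling" concrete for the provisioning discussion, here is a minimal sketch of what a partition-native data model might look like. All names and fields are hypothetical illustrations, not the actual tool's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    """A self-contained 2-tier leaf-spine CLOS: the unit of scaling."""
    name: str
    spine_count: int
    racks: list[str] = field(default_factory=list)  # one leaf per rack

@dataclass
class Fabric:
    """The datacenter fabric: partitions joined by exit switches."""
    partitions: dict[str, Partition] = field(default_factory=dict)
    exit_switches: list[str] = field(default_factory=list)

    def add_partition(self, p: Partition) -> None:
        # Adding capacity means adding a partition,
        # never growing an existing spine layer.
        self.partitions[p.name] = p

fabric = Fabric(exit_switches=["exit-1", "exit-2"])
fabric.add_partition(Partition("p1", spine_count=4, racks=["r1", "r2"]))
```

The design point the sketch captures: the fabric grows only by adding `Partition` objects, and the exit layer is the only fabric-wide shared state, which keeps each partition's failure domain self-contained.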

Consequences

  • Partition sizing must be defined (how many racks/servers per partition)

  • Cross-partition latency is higher than intra-partition — tenant clusters should be placed within a single partition where possible

  • Exit switch capacity must be planned for cross-partition and external traffic

  • Partition placement across AZs (ADR-0009) must be defined

  • Adding capacity means adding partitions, not expanding existing fabrics

  • The provisioning tool must support partition-based fabric management (separate ADR)
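On the first consequence (partition sizing), a back-of-envelope sketch: with 50,000 servers (ADR-0002) the partition count follows directly from the chosen partition size. The sizes below are illustrative examples, not a sizing proposal:

```python
import math

TOTAL_SERVERS = 50_000  # ADR-0002

# Illustrative partition sizes; the real figure depends on switch
# radix and the oversubscription ratio chosen per partition.
for servers_per_partition in (1_024, 2_048, 4_096):
    n = math.ceil(TOTAL_SERVERS / servers_per_partition)
    print(f"{servers_per_partition} servers/partition -> {n} partitions")
```

The partition count also feeds the exit-layer consequence above: every cross-partition flow transits the exit switches, so more, smaller partitions shift more traffic (and port demand) onto that layer.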