ADR-0022: Datacenter fabric scaling
- Status: proposed
- Date: 2026-03-11
- Group: networking
- Depends-on: ADR-0002, ADR-0004
Context
ADR-0004 chose a spine-leaf fabric with BGP/EVPN. At 50,000 physical servers (ADR-0002), a single 2-tier leaf-spine Clos fabric cannot accommodate all servers: switch radix limits how many leaves a spine layer can connect. The question is how the fabric scales beyond a single pod.
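The radix constraint can be made concrete with a back-of-the-envelope sketch. This is illustrative only: the switch radix values and oversubscription ratio are assumptions for the example, not the actual hardware choice.

```python
def two_tier_capacity(radix: int, oversub: int = 1) -> int:
    """Max servers in a 2-tier leaf-spine Clos built from identical
    switches with `radix` ports, at oversub:1 leaf oversubscription."""
    # Leaf ports split into uplinks U and server ports D with D = oversub * U.
    uplinks = radix // (oversub + 1)
    server_ports = radix - uplinks
    # Each leaf needs one uplink per spine, so the fabric has `uplinks` spines.
    # Each spine has `radix` ports, one link per leaf: at most `radix` leaves.
    max_leaves = radix
    return max_leaves * server_ports

for radix in (32, 64, 128):
    print(radix, two_tier_capacity(radix))
```

Even radix-128 switches at 1:1 top out at 8,192 servers in a single 2-tier pod, far short of 50,000, which is why the fabric must scale across pods.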
Options
Option 1: Multiple independent fabric partitions
- Pros: each partition is a self-contained 2-tier leaf-spine Clos fabric and a natural failure domain; partitions can be added incrementally; no super-spine complexity; cross-partition communication via exit/border switches; well-understood operational model
- Cons: cross-partition traffic goes through the exit layer (higher latency than intra-partition); no single flat L3 fabric spanning all servers; partition sizing must be planned
Option 2: Single multi-stage Clos fabric (5-stage with super-spine)
- Pros: single flat fabric across all servers; optimal east-west bandwidth; well understood in hyperscale datacenters (Google, Meta)
- Cons: enormous blast radius; complex to operate and automate; not incrementally deployable; requires custom tooling for provisioning and lifecycle management
Option 3: Partition groups with inter-partition spine layer
- Pros: groups of partitions get a low-latency interconnect; better than pure exit-layer routing for cross-partition traffic
- Cons: additional switch layer to operate; hybrid approach with unclear failure domains; custom topology
Decision
Multiple independent fabric partitions. Each partition is a self-contained 2-tier leaf-spine Clos fabric (a group of racks sharing one spine layer). Partitions are the unit of scaling: adding capacity means adding partitions. Cross-partition traffic routes through exit switches. This model provides clear failure domains and incremental growth. The provisioning tool (separate ADR) must natively support this partition-based model.
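Under this model, the partition count follows directly from partition capacity. A minimal sketch; the per-partition capacity of 2,048 servers is an assumed figure for illustration (actual sizing is left open in the consequences).

```python
import math

TARGET_SERVERS = 50_000       # target from ADR-0002
PARTITION_CAPACITY = 2_048    # assumed servers per partition (illustrative)

# Scaling unit is the partition: capacity grows in whole-partition steps.
partitions_needed = math.ceil(TARGET_SERVERS / PARTITION_CAPACITY)
print(partitions_needed)  # 25
```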
Consequences
- Partition sizing must be defined (how many racks/servers per partition)
- Cross-partition latency is higher than intra-partition; tenant clusters should be placed within a single partition where possible
- Exit switch capacity must be planned for cross-partition and external traffic
- Partition placement across AZs (ADR-0009) must be defined
- Adding capacity means adding partitions, not expanding existing fabrics
- The provisioning tool must support partition-based fabric management (separate ADR)
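The last consequence implies the provisioning tool must treat partitions as first-class objects. A hypothetical sketch of what such a model could look like; every class and field name here is invented for illustration, not the actual tool's schema (that tool is specified in a separate ADR).

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    name: str           # e.g. "part-az1-01" (naming scheme is hypothetical)
    az: str             # availability zone placement (ADR-0009)
    spine_count: int
    leaf_count: int
    exit_uplinks: int   # links toward the exit/border layer

@dataclass
class Fabric:
    partitions: list[Partition] = field(default_factory=list)

    def add_partition(self, p: Partition) -> None:
        # Capacity grows by appending partitions, never by expanding one.
        self.partitions.append(p)

fabric = Fabric()
fabric.add_partition(
    Partition("part-az1-01", "az1", spine_count=32, leaf_count=64, exit_uplinks=8)
)
print(len(fabric.partitions))  # 1
```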