Benjamin Ryzman
on 11 February 2026
Modern data centres are hitting a wall that faster CPUs alone cannot fix. As workloads scale out and latency budgets shrink, the cost of moving data between servers is becoming the dominant factor in overall performance. Remote Direct Memory Access, or RDMA, is one of the technologies reshaping how that data moves, and it forces a rethink of some long-held assumptions in data centre networking.
This article is the first in a short series. The follow-ups will look specifically at the two primary network interconnects that enable RDMA: InfiniBand and RoCE. Here, the goal is simpler: explain what RDMA is, why it matters now, and why it can be both powerful and uncomfortable for operators.
Remote Direct Memory Access explained
Remote Direct Memory Access (RDMA) is a data center networking technology that allows servers to exchange data directly between application memory spaces over the network, bypassing the operating system and CPU on the remote side. This enables lower latency, higher throughput, and more predictable performance compared to traditional TCP/IP networking.
At its core, RDMA lets one machine read from or write directly into the memory of another machine over the network, without involving the remote CPU or operating system in the data path.
In a conventional TCP/IP exchange, even a fast one, data is copied multiple times. It moves from the NIC into kernel buffers, through the network stack, into user space, and back again on the other side. Each step adds latency, burns CPU cycles, and introduces jitter.
RDMA removes most of that path. Once a connection is set up and memory is registered, the NIC performs the transfer directly between application memory regions. The remote CPU is not interrupted. The kernel is not traversed. The result is very low latency, very high throughput, and far more predictable performance.
➤ RDMA implements kernel bypass and zero-copy networking, avoiding the overheads inherent in TCP/IP-based data center networks.
This is not magic, and it is not free. RDMA shifts responsibility away from the operating system and towards the application and the network. That shift is where both the value and the challenge lie.
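To make that shift concrete, the sketch below uses the libibverbs API to register a buffer and post a one-sided RDMA write. It is a minimal sketch, not a complete program: it assumes a reliably connected queue pair, protection domain, and completion queue have already been set up, and that the peer's buffer address and rkey were exchanged out of band (for example over a plain TCP socket). The function name and buffer size are illustrative.

```c
/* Minimal libibverbs sketch: register a buffer and post a one-sided RDMA write.
 * Assumes a reliable-connected queue pair (qp) is already established and the
 * peer's buffer address and rkey were exchanged out of band. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp, struct ibv_cq *cq,
                       uint64_t remote_addr, uint32_t rkey)
{
    const size_t len = 4096;
    char *buf = malloc(len);
    if (!buf)
        return -1;
    strcpy(buf, "hello over RDMA");

    /* Registration pins the memory and hands the NIC a key it can use to
     * access the buffer directly; this explicit step has no TCP equivalent. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr) {
        free(buf);
        return -1;
    }

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    /* A one-sided write: the remote CPU is never interrupted; the remote NIC
     * places the data straight into the registered remote buffer. */
    struct ibv_send_wr wr = {
        .wr_id               = 1,
        .sg_list             = &sge,
        .num_sge             = 1,
        .opcode              = IBV_WR_RDMA_WRITE,
        .send_flags          = IBV_SEND_SIGNALED,
        .wr.rdma.remote_addr = remote_addr,
        .wr.rdma.rkey        = rkey,
    };
    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* The outcome is reported through the completion queue, not a syscall return. */
    struct ibv_wc wc;
    int n;
    do {
        n = ibv_poll_cq(cq, 1, &wc); /* busy-poll for brevity */
    } while (n == 0);
    if (n < 0 || wc.status != IBV_WC_SUCCESS) {
        fprintf(stderr, "RDMA write failed: %s\n",
                n < 0 ? "poll error" : ibv_wc_status_str(wc.status));
        return -1;
    }

    ibv_dereg_mr(mr);
    free(buf);
    return 0;
}
```

The point to notice is that the operating system appears nowhere in the data path: once ibv_reg_mr has pinned the buffer, the NIC moves the bytes and the application learns the result by polling a completion queue.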
Why RDMA matters for modern data center networking
RDMA is not new. High-performance computing (HPC) has used it for decades. What has changed is where the bottlenecks now sit for communication service providers (CSPs) and large enterprises.
First, east-west traffic now dominates. Telco clouds, core network functions, analytics pipelines, and AI workloads all rely on intensive server-to-server communication, and those flows dwarf the slower, less latency-sensitive north-south traffic between the data centre and the Internet. When every microsecond matters, shaving tens of microseconds from each exchange compounds quickly.
Second, CPUs are expensive and increasingly scarce. Offloading data movement from general-purpose cores frees capacity for packet processing, encoding, inference, or control-plane logic. Operators deploying user plane functions (UPFs), databases, or message buses at scale see this immediately in reduced core counts per workload.
Third, determinism matters more than peak throughput. Many telco and real-time enterprise applications care less about average latency and more about tail latency. RDMA’s bypass of the kernel scheduler and TCP congestion machinery produces tighter latency distributions, which directly translates into more predictable service behaviour.
This is why RDMA is showing up in modern storage backends, distributed databases, AI training clusters, and increasingly in network-intensive telco workloads. In practice, adopting RDMA is less about enabling a feature in a workload and more about designing the data centre fabric, host configuration, and lifecycle operations to support predictable low-latency behaviour.
RDMA vs TCP/IP: how data center networking is changing
The most important difference is philosophical.
Traditional Ethernet networking assumes the network is unreliable and the hosts are responsible for recovery. TCP embodies this model. It is flexible, forgiving, and remarkably robust, but it hides performance costs behind abstraction.
RDMA assumes a far more cooperative environment. Memory must be explicitly registered. Access permissions are managed at connection setup. Packet loss is expected to be rare, not normal. Reliability is often handled below or beside TCP, or avoided entirely.
From an operator perspective, this means that the network stops being “just IP”. Latency, loss, buffering, and congestion behaviour suddenly matter in very concrete ways. A misconfigured switch buffer or an oversubscribed link that TCP would eventually recover from can stall or break an RDMA workload.
It also means that debugging shifts. Problems that used to be visible in system call traces or kernel metrics may now live in NIC counters, fabric telemetry, or application-level error paths.
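As a small illustration of where that visibility now lives, the sketch below reads a few per-port counters from the /sys/class/infiniband hierarchy. The device name mlx5_0 and the specific counter files are assumptions that vary by driver and hardware; the shape of the check, not the exact names, is what matters.

```c
/* Sketch: read per-port NIC counters exposed through sysfs, the kind of
 * telemetry RDMA debugging leans on instead of kernel socket statistics.
 * Exact counter names vary by driver and device; these are illustrative. */
#include <stdio.h>

static long read_counter(const char *device, const char *counter)
{
    char path[256];
    long value = -1;
    snprintf(path, sizeof(path),
             "/sys/class/infiniband/%s/ports/1/counters/%s", device, counter);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

int main(void)
{
    /* "mlx5_0" is a placeholder; list /sys/class/infiniband/ to find the
     * devices actually present on a host. */
    const char *dev = "mlx5_0";
    const char *counters[] = { "port_rcv_errors", "port_xmit_discards", "link_downed" };
    for (int i = 0; i < 3; i++)
        printf("%s/%s = %ld\n", dev, counters[i], read_counter(dev, counters[i]));
    return 0;
}
```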
How CSPs and enterprises use RDMA in real data centers
In large telco clouds, RDMA is increasingly used underneath control-plane databases and state stores. Distributed shared-memory databases and message buses backed by RDMA-enabled transports can deliver lower commit latencies under load. The practical effect is faster convergence during scaling events and fewer cascading timeouts during failures.
In enterprise data centres, storage is often the first visible win. NVMe over Fabrics using RDMA allows storage traffic to achieve local-disk-like latency over the network. Operators see higher IOPS with fewer CPU cores consumed on both the compute and storage sides.
AI infrastructure provides another clear example. GPU-to-GPU communication during training is extremely sensitive to latency and jitter. RDMA allows collective operations to scale across nodes without saturating host CPUs, which is why it has become foundational in large training clusters.
Explore GPU‑Direct RDMA in Kubernetes.
Across all of these environments, the pattern is the same. RDMA does not merely make things faster. It changes which resources are stressed and where failures surface.
RDMA challenges in data center networks
RDMA is unforgiving of sloppy networks. Packet loss, excessive buffering, and uncontrolled congestion can have disproportionate impact, so fabric design and validation matter more than ever.
Operational tooling is often immature compared to decades of TCP observability. Teams need to be comfortable with NIC-level metrics, switch telemetry, and application-specific diagnostics.
In real operator environments, many RDMA issues surface before the workload is even deployed. Inconsistent NIC firmware versions, mismatched PCIe settings, incorrect NUMA alignment, or missing kernel modules can silently degrade performance. This is why RDMA adoption tends to push operators towards tighter control of bare-metal provisioning and hardware lifecycle management, rather than treating servers as interchangeable units. Platforms that standardise bare-metal commissioning, firmware baselines, and kernel configuration, such as MAAS, become critical enablers for running RDMA reliably at scale.
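A minimal pre-flight check along those lines might simply enumerate the RDMA devices on a host and record their firmware version and NUMA placement, so drift across a fleet is visible before workloads land on it. The sysfs attributes used below (fw_ver and device/numa_node) are common on current mlx5-class NICs but are assumptions rather than guarantees, and the program is a sketch, not production tooling.

```c
/* Sketch of a pre-deployment sanity check: enumerate RDMA devices and print
 * firmware version and NUMA node so drift across a fleet is easy to spot.
 * The attribute names are assumptions; they are typical but not universal. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static void read_attr(const char *dev, const char *attr, char *out, size_t len)
{
    char path[256];
    snprintf(path, sizeof(path), "/sys/class/infiniband/%s/%s", dev, attr);
    out[0] = '\0';
    FILE *f = fopen(path, "r");
    if (f) {
        if (fgets(out, (int)len, f))
            out[strcspn(out, "\n")] = '\0';  /* strip trailing newline */
        fclose(f);
    }
}

int main(void)
{
    DIR *d = opendir("/sys/class/infiniband");
    if (!d) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        char fw[64], numa[16];
        read_attr(e->d_name, "fw_ver", fw, sizeof(fw));
        read_attr(e->d_name, "device/numa_node", numa, sizeof(numa));
        printf("%-12s firmware=%-16s numa_node=%s\n", e->d_name, fw, numa);
    }
    closedir(d);
    return 0;
}
```

Run across a fleet, output like this makes mismatched firmware baselines or a NIC sitting on the wrong NUMA node obvious before they show up as unexplained latency.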
Read more about RDMA and AI networking challenges and solutions.
Finally, RDMA tightens coupling between applications and infrastructure. Memory registration, queue depths, and transport choices leak into application design. This is acceptable when performance is critical, but it reduces portability and increases the cost of mistakes.
Closing thoughts
RDMA challenges the comfortable abstraction layers that have served data centres well for years. It trades generality for performance, and flexibility for determinism. For CSPs and enterprises building modern, scale-out infrastructure, that trade-off is increasingly justified.
Understanding RDMA is no longer optional for architects working on high-performance telco clouds or data-intensive platforms. The question is not whether it will appear in your environment, but whether you will be ready for the architectural and operational consequences when it does.
Are you interested in Canonical products supporting RDMA? Contact our experts today.