Network Throughput Tuning: 9 Proven Strategies for CTOs
IDACORE
IDACORE Team

Your network is lying to you.
Not maliciously. It's just that most infrastructure teams only look at throughput when something breaks. Packet loss spikes, a database replication job falls behind, someone complains the file transfer took 45 minutes instead of 4. Then you dig in, find the obvious bottleneck, fix it, and move on. The subtler performance loss you left on the table? Nobody notices that.
I've watched teams run 10GbE links at sustained 2-3Gbps and call it good because nothing was on fire. That's not good. That's 70-80% of your capacity sitting unused because of tuning decisions that were never made, or defaults that were set for 2008 hardware. Let's fix that.
This isn't a listicle. It's a working guide for the people who actually have to implement this stuff, and who'll be the ones explaining the results to leadership.
Start with What You Can Actually Measure
Before touching a single sysctl value, get a real baseline. Not "the monitoring dashboard says we're at 60% utilization." Real numbers from real tools.
iperf3 is your starting point. Run it between hosts you actually care about (your application servers and your storage nodes, your colocation gear and your cloud endpoints), not just between two test boxes on the same switch.
# Server side
iperf3 -s
# Client side: 30 second test, 4 parallel streams
iperf3 -c <server_ip> -t 30 -P 4 -i 5
The -P 4 matters. Single-stream TCP throughput tells you about one connection. Parallel streams tell you about your actual application behavior, where dozens of connections are competing simultaneously.
Watch for three things: raw throughput, retransmits, and variance across the test window. Retransmits above 0.1% of segments are a problem. Throughput that swings wildly across 5-second intervals tells you there's congestion somewhere you haven't found yet.
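iperf3 reports retransmits per interval; for a system-wide sanity check, the kernel's own counters (via nstat, part of iproute2) give you the ratio since boot:
# Retransmitted segments vs. total segments sent since boot
nstat -az TcpRetransSegs TcpOutSegs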
Also run ss -tin on active connections during load. The rtt and cwnd values in that output will tell you more about what TCP thinks is happening than any graph in your monitoring stack.
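ss accepts address filters, so you can scope that output to the path under test; <server_ip> here is a placeholder:
# TCP internals (rtt, cwnd, retrans) for connections to one peer
ss -tin dst <server_ip>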
Kernel TCP Stack Tuning Is Still Worth Your Time
Yes, modern Linux kernels have better defaults than they did a decade ago. No, that doesn't mean the defaults are right for your workload.
The two most impactful parameters for high-throughput environments are socket buffer sizes and TCP congestion control algorithm. Here's where to start:
# /etc/sysctl.conf additions
# Increase TCP buffer sizes; these support up to ~10Gbps on a 100ms RTT path
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
# Enable window scaling and timestamps
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
# BBR congestion control: better than CUBIC for most modern workloads
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Apply with sysctl -p and retest immediately. On a 10GbE link with reasonable RTT, these changes alone can take you from 4Gbps to 8Gbps sustained on bulk transfers. I've seen it happen in under 20 minutes.
BBR deserves a specific callout. Google developed it to handle the mismatch between traditional loss-based congestion control and modern high-bandwidth, low-latency paths. CUBIC backs off aggressively when it sees packet loss. BBR models the network's actual bottleneck bandwidth and RTT, which means it doesn't confuse buffer bloat with congestion. On paths with any meaningful latency, even 5-10ms, BBR typically outperforms CUBIC by 15-40% on bulk transfers.
One caveat: BBR can be more aggressive in shared environments. If you're running colocation and sharing physical infrastructure with other tenants, test carefully before rolling this to production.
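Before rolling it out, confirm the kernel you're on actually offers BBR and that the setting took:
# List the congestion control algorithms this kernel supports
sysctl net.ipv4.tcp_available_congestion_control
# If bbr isn't listed, load the module (shipped since kernel 4.9)
modprobe tcp_bbr
# Confirm it's active after sysctl -p
sysctl net.ipv4.tcp_congestion_control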
NIC and Driver Tuning: The Layer Most Teams Skip
Your kernel TCP stack doesn't operate in isolation. The NIC driver and hardware offload settings sit between your application and the wire, and bad defaults here will cap your throughput regardless of how well you've tuned the kernel.
Check your current ring buffer sizes:
ethtool -g eth0
If RX and TX current values are below 1024 on a 10GbE+ interface, you're dropping frames under load before the kernel even sees them. Set them higher:
ethtool -G eth0 rx 4096 tx 4096
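To confirm ring exhaustion was actually your problem, watch the NIC's drop counters before and after the change; exact counter names vary by driver:
# Driver-level statistics; look for rx_dropped / rx_missed style counters
ethtool -S eth0 | grep -iE 'drop|miss|discard'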
Then look at interrupt coalescing. The default on most NICs fires an interrupt for every received packet β great for latency, terrible for throughput. Adaptive coalescing batches interrupts when the NIC detects high traffic:
ethtool -C eth0 adaptive-rx on adaptive-tx on
For multi-queue NICs (which is everything modern), make sure IRQ affinity is spread across CPU cores. The irqbalance daemon handles this automatically on most distributions, but verify it's running and that you're not pinning all NIC interrupts to CPU 0 for some historical reason.
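Two quick checks; adjust the grep pattern to match your interface's queue names in /proc/interrupts:
# Per-CPU interrupt counts for the NIC's queues
grep eth0 /proc/interrupts
# Verify irqbalance is actually running
systemctl is-active irqbalance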
Hardware offloads β TSO, GSO, GRO β should generally stay enabled unless you're debugging a specific issue. They exist for a reason.
# Verify offloads are on
ethtool -k eth0 | grep -E "tcp-segmentation|generic-segmentation|generic-receive"
Latency Is a Throughput Problem Too
This is where geography matters more than most teams realize.
TCP throughput is bounded by the bandwidth-delay product: BDP = bandwidth × RTT. On a 10Gbps link with 50ms RTT, the BDP is about 62.5MB. That's the amount of data a single stream must keep in flight to fill the pipe, which is exactly why the buffer sizes above matter. On the same link with 5ms RTT, the BDP is just 6.25MB: each stream fills the pipe far more easily, you can run far more parallel streams before hitting congestion, and your application response times are dramatically better.
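The arithmetic is worth doing once by hand:
# 10Gbps = 1,250,000,000 bytes/s; 50ms = 1/20 of a second
# BDP = 1,250,000,000 * 0.05 = 62,500,000 bytes (~62.5MB)
echo $(( 1250000000 / 20 ))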
For companies in the Treasure Valley, this is a real and quantifiable difference. Traffic routed to AWS us-west-2 in Oregon adds 20-40ms of round-trip latency. That's not a complaint; it's physics. The data has to travel there and back. Infrastructure hosted locally in Idaho can deliver sub-5ms RTT to Boise metro endpoints, which means your TCP connections are more efficient, your database queries return faster, and your application feels different to users.
We ran a comparison for a Boise-based healthcare SaaS company that was hosting their application stack in Oregon. Their database replication lag was consistently 200-400ms during peak hours. After migrating the primary workload to local Idaho infrastructure, replication lag dropped to under 20ms. Same application code, same database configuration. The difference was 35ms of round-trip latency they'd been paying a throughput tax on for two years.
The math isn't complicated. The latency was.
Application-Layer Patterns That Undo Good Infrastructure Work
You can tune the kernel perfectly, buy the best NICs, and colocate in a well-connected data center, and still have terrible throughput because of how your application uses the network.
Small write patterns. If your application is doing thousands of 1KB writes to a remote service instead of batching them into larger operations, you're burning connection overhead. Nagle's algorithm was supposed to handle this, but it's often disabled (TCP_NODELAY) for latency reasons, which means every small write goes on the wire immediately. Fix this at the application layer with explicit batching, not by re-enabling Nagle.
Connection churn. TIME_WAIT accumulation from short-lived connections can exhaust your local port range and cause connection failures under load. Check with:
ss -tan state time-wait | wc -l
cat /proc/sys/net/ipv4/ip_local_port_range
If you're burning through the default 28,000-port range, either widen it (net.ipv4.ip_local_port_range = 1024 65535) or, better, fix the application to use connection pooling.
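Widening the range takes one command (add the same line to /etc/sysctl.conf to persist it across reboots):
sysctl -w net.ipv4.ip_local_port_range="1024 65535"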
TLS handshake overhead. On high-connection-rate services, TLS session resumption can meaningfully reduce overhead. Make sure your TLS termination layer has session caching enabled and that session ticket keys are rotated on a reasonable schedule (24 hours is fine for most workloads, not the 1-hour default some frameworks use).
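One way to spot-check resumption is openssl's -reconnect flag, which reopens the connection several times in one run; replace your.endpoint with a real hostname, and note this is most reliable against TLS 1.2 endpoints, since TLS 1.3 issues tickets after the handshake:
# Reconnect attempts should report "Reused" if session caching works
openssl s_client -connect your.endpoint:443 -tls1_2 -reconnect < /dev/null 2>/dev/null | grep -E '^(New|Reused)'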
DNS resolution under load. This one bites teams regularly. If your application is resolving DNS on every connection instead of caching results, and your DNS server is remote, you're adding latency to every new connection. Run a local resolver. unbound or systemd-resolved with aggressive caching is a 30-minute setup that pays off immediately.
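If you go the systemd-resolved route, it ships with counters to confirm the cache is actually being hit:
# Check which resolver is in use, then look at cache hit/miss stats
resolvectl status
resolvectl statistics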
Putting It Together: A Tuning Sequence That Actually Works
Don't tune everything at once. You won't know what helped.
- Baseline first. Run iperf3 with parallel streams. Record throughput, retransmits, and RTT from ss -tin. (A minimal capture script follows this list.)
- Tune kernel TCP buffers and enable BBR. Retest. If throughput improved significantly, you were buffer-limited.
- Check NIC ring buffers and interrupt coalescing. Retest. If you were dropping frames, this shows up as a sudden jump in throughput.
- Audit application connection patterns. Look for small writes, connection churn, and missing connection pooling.
- Evaluate latency to your critical endpoints. If you're routing traffic to a distant region when local infrastructure is available, calculate the BDP cost you're paying.
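Here's a minimal sketch of that first step, assuming an iperf3 server is already listening on the target host; the filename scheme is just a suggestion:
#!/bin/bash
# Capture a throughput baseline against one server and keep the raw results
SERVER=${1:?usage: baseline.sh <server_ip>}
TS=$(date +%Y%m%d-%H%M%S)
# 30-second run, 4 parallel streams, machine-readable output
iperf3 -c "$SERVER" -t 30 -P 4 -i 5 --json > "baseline-$TS.json"
# Snapshot TCP internals (rtt, cwnd, retrans) for connections to that host
ss -tin dst "$SERVER" > "baseline-$TS-ss.txt"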
Each step takes an hour or less to implement and test. The whole sequence can be done in a day. The throughput improvements are often 2-3x on links that were nominally "fine."
The infrastructure side of this equation (physical proximity, well-connected facilities, hardware that's actually configured correctly) is something we think about constantly at IDACORE. Our data center in Weiser is 85 miles from Boise, and we've built the network to deliver sub-5ms latency to Treasure Valley endpoints because we've been running BGP and managing routing tables since before most cloud providers existed. If you're doing this tuning work and want to talk through what your specific workload would look like on infrastructure that isn't fighting you on latency from the start, tell us what you're working with. We'll give you real numbers, not a sales deck.