Why 0.2 Beat 0.8
The change made the estimator cleaner on paper. It also made the system worse. What followed was a reminder that real networks operate under constraints that are not always visible from first principles.
Tinkering with Time, Tech, and Culture #47
I have been using AI to help clean up some of our legacy C networking code.
One of the areas it immediately pushed on was our RTT estimator. Specifically this constant:
#define ALPHA_RTT 0.2
The model was very confident this was wrong.
It recommended changing it to:
#define ALPHA_RTT 0.8
The reasoning made sense on the surface. A higher alpha produces a smoother RTT estimate. Less noise. More stability. Closer to what you would expect from a well-behaved EWMA.
So we tested it.
The Problem
On clean links, both versions looked fine.
On real networks, especially lossy and variable ones, things started to break down.
We ran repeated iperf tests across a range of links. Wired, wireless, mobile. The pattern was consistent.
With the new model, throughput degraded over time. After several runs, the tunnel would collapse into low throughput and never fully recover.
The original implementation did not show this behavior.
What Changed
The RTT estimator in the system looks like this:
session->rtt = ALPHA_RTT * session->rtt + (1 - ALPHA_RTT) * sample;
With ALPHA_RTT set to 0.2, the system weights new samples heavily. About 80 percent of the estimate comes from the latest measurement.
With ALPHA_RTT set to 0.8, the system does the opposite. It mostly trusts its history and only slowly incorporates new information.
On paper, 0.8 looks more stable.
In practice, it was the source of the failure.
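To make the weighting concrete, here is a minimal sketch of the update applied to a step change in RTT. The helper names `ewma_update` and `ewma_run` are illustrative, not from the actual codebase; the update itself is the same form as the estimator above.

```c
/* One EWMA update, in the same form as the estimator in the post:
 * new estimate = alpha * old estimate + (1 - alpha) * sample. */
static double ewma_update(double rtt, double sample, double alpha)
{
    return alpha * rtt + (1.0 - alpha) * sample;
}

/* Feed n identical samples into the estimator and return the result. */
static double ewma_run(double rtt, double sample, double alpha, int n)
{
    for (int i = 0; i < n; i++)
        rtt = ewma_update(rtt, sample, alpha);
    return rtt;
}
```

With a step from a 50 ms estimate to 200 ms samples, three updates bring the 0.2 estimator to about 199 ms, while the 0.8 estimator only reaches about 123 ms.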
The Part That Is Easy To Miss
RTT in this system is not sampled continuously.
It is only measured from packets that were never retransmitted. Once a packet is retransmitted, its timing data is discarded and it no longer contributes to the RTT calculation.
That means on a lossy link:
- many packets are retransmitted
- those packets do not produce RTT samples
- valid samples become sparse
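The discard rule above can be sketched like this. The struct and function names are made up for illustration, not the actual code; the rule itself is the same reasoning as Karn's algorithm in TCP, where an ACK for a retransmitted packet is ambiguous.

```c
#include <stdbool.h>

/* Illustrative packet record; field names are assumptions for the
 * sketch, not the structs in the codebase. */
struct pkt_info {
    double send_time;       /* when the packet was first sent */
    double ack_time;        /* when its ACK arrived */
    bool   retransmitted;   /* was it ever retransmitted? */
};

/* A retransmitted packet yields no RTT sample, because the ACK could
 * match either transmission. Returns true and writes the sample only
 * for clean packets. */
static bool rtt_sample_from(const struct pkt_info *p, double *sample)
{
    if (p->retransmitted)
        return false;                       /* timing data discarded */
    *sample = p->ack_time - p->send_time;
    return true;
}
```

On a lossy link, this guard is exactly why valid samples become sparse: the worse the loss, the fewer packets survive it.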
Now combine that with a high alpha.
The estimator becomes slow to react at exactly the moment when the network conditions have changed.
If RTT increases during loss, the estimate rises. But when conditions improve, there are not enough clean samples to pull it back down.
The result is a kind of estimator poisoning. The system remembers the bad state and cannot recover from it.
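A quick way to see the poisoning is to count how many clean samples each alpha needs to pull the estimate back down. The 400 ms poisoned estimate and 50 ms true RTT below are made-up numbers to show the shape of the problem, not measurements from our tests.

```c
/* Count the clean samples needed to pull a poisoned estimate back
 * within 10 percent of the true RTT, using the post's EWMA update.
 * The starting values passed in are illustrative, not measured. */
static int samples_to_recover(double alpha, double rtt, double truth)
{
    int n = 0;
    while (rtt > truth * 1.10 && n < 1000) {
        /* One clean sample arrives and the estimator updates. */
        rtt = alpha * rtt + (1.0 - alpha) * truth;
        n++;
    }
    return n;
}
```

From 400 ms against a 50 ms true RTT, alpha 0.2 recovers in 3 clean samples; alpha 0.8 needs 20. When loss makes clean samples rare, those 20 samples can take long enough that the tunnel never looks recovered.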
What The Original Code Was Actually Doing
With ALPHA_RTT set to 0.2, the estimator behaves very differently.
- it reacts quickly to new conditions
- it tolerates noisy samples
- it converges back down after disruption
It is less smooth. It is also far more resilient.
What looked like a rough or overly aggressive choice was actually tuned for a specific constraint:
RTT sampling is sparse under loss.
Once you see that, the choice makes sense.
What Changed The Outcome
After feeding the test data back into the model, the recommendation changed.
The conclusion shifted from “this should be smoother” to “this needs to recover faster.”
The original implementation was not wrong. It was well tuned for the environment it runs in.
We are still making small adjustments, but the main takeaway is clear.
The Lesson
Code that has lived in production for years often encodes constraints that are not obvious from first principles.
It reflects:
- incomplete information
- imperfect sampling
- real world failure modes
AI is extremely useful for exploring alternatives and challenging assumptions.
But it optimizes for the model it sees.
Real systems operate inside constraints that are not always visible in the abstraction.
Final Thought
There is a difference between what looks optimal and what survives contact with the network.