The VPN Worked… Until It Didn’t
aka How I Lost a Week Debugging Site-to-Site Networking Between Clouds and My Brain
There’s something humbling about configuring a VPN.
You think you’re setting up a secure tunnel between two clouds.
But what you’re actually doing is… arguing with routing tables while ping fails in 40 different directions.
This is the story of how I set up a site-to-site VPN between Azure and AWS and nearly lost my will to engineer.
The Plan Was Simple (lol)
Azure Virtual Network Gateway on one side
AWS Virtual Private Gateway on the other
A pre-shared key, identical on both ends
Each side told about the other’s CIDRs (which must not overlap)
Some BGP for “dynamic” flair
What could possibly go wrong?
Answer:
Everything.
And then some.
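For the record, the AWS half of that “simple” plan looks roughly like this in boto3. Every ID, IP, and ASN below is a placeholder; the Azure half was all portal clicks, so there is no code to show for it.

```python
import boto3

# All IDs, IPs, and ASNs are placeholders for illustration.
ec2 = boto3.client("ec2", region_name="eu-west-1")

# The "customer gateway" is AWS's name for the far end: the public IP of
# the Azure VPN gateway plus whatever BGP ASN it has been given.
cgw = ec2.create_customer_gateway(
    BgpAsn=65010,               # ASN configured on the Azure VPN gateway (placeholder)
    PublicIp="203.0.113.10",    # public IP of the Azure VPN gateway (placeholder)
    Type="ipsec.1",
)["CustomerGateway"]

# The virtual private gateway is the AWS end, attached to the VPC.
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]
ec2.attach_vpn_gateway(VpcId="vpc-0abc1234", VpnGatewayId=vgw["VpnGatewayId"])

# The VPN connection itself. StaticRoutesOnly=False is the BGP "dynamic" flair.
vpn = ec2.create_vpn_connection(
    CustomerGatewayId=cgw["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGatewayId"],
    Type="ipsec.1",
    Options={"StaticRoutesOnly": False},
)["VpnConnection"]
print("created", vpn["VpnConnectionId"])
```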
Step 1: Azure Said “Cool”, But Meant “Figure It Out Yourself”
Azure’s UI gave me false hope.
Filled in the remote IP, selected the local network gateway, configured IKEv2… boom, created.
Except no data flowed. No logs. No helpful error. Just vibes.
I eventually realized Azure doesn’t show you tunnel logs unless you specifically go look for them like an archaeologist.
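If you would rather not dig through the portal, the azure-mgmt-network SDK will at least tell you whether Azure thinks the connection is up. A minimal sketch, assuming the azure-identity and azure-mgmt-network packages; the resource group and connection name are placeholders, and attribute names can shift slightly between SDK versions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

# Placeholder subscription ID; the resource group and connection names are mine, not yours.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Positional args: resource group name, then the connection name.
conn = client.virtual_network_gateway_connections.get("rg-hybrid", "azure-to-aws")

# connection_status is along the lines of Connecting / Connected / NotConnected.
print("status   :", conn.connection_status)
print("bytes in :", conn.ingress_bytes_transferred)
print("bytes out:", conn.egress_bytes_transferred)
```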
Step 2: AWS Was Gaslighting Me With Static Routes
On the AWS side, everything looked fine:
VPN status: UP
Routes: In the table
Tunnel: Established
Yet no traffic.
I stared at that “static route” section and realized I was routing packets to a black hole.
You can technically define a static route in AWS that sends traffic nowhere.
And the platform will thank you and still charge you.
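Here is the sanity check I wish I had run on day one, as a boto3 sketch (all IDs are placeholders). It asks the tunnels and the route table what they actually think is happening; a route whose State is literally “blackhole” is the cloud telling you, politely, that your packets are going nowhere. And if you are running BGP anyway, enabling route propagation from the virtual private gateway sidesteps hand-typed static routes altogether.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Placeholder IDs for the VPN connection, VPC, virtual private gateway, and route table.
VPN_ID, VPC_ID, VGW_ID, RTB_ID = "vpn-0abc1234", "vpc-0abc1234", "vgw-0abc1234", "rtb-0abc1234"

# 1. "UP" here only means the IPsec tunnel is up, not that any traffic has a route.
vpn = ec2.describe_vpn_connections(VpnConnectionIds=[VPN_ID])["VpnConnections"][0]
for tunnel in vpn["VgwTelemetry"]:
    print(tunnel["OutsideIpAddress"], tunnel["Status"], tunnel.get("StatusMessage", ""))

# 2. Walk the VPC's route tables and look for routes sitting in the "blackhole" state.
route_tables = ec2.describe_route_tables(
    Filters=[{"Name": "vpc-id", "Values": [VPC_ID]}]
)["RouteTables"]
for rt in route_tables:
    for route in rt["Routes"]:
        print(rt["RouteTableId"], route.get("DestinationCidrBlock"),
              route.get("GatewayId"), route["State"])

# 3. Let the VGW propagate BGP-learned routes into the table instead of typing them.
ec2.enable_vgw_route_propagation(RouteTableId=RTB_ID, GatewayId=VGW_ID)
```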
Step 3: NAT, Oh God, NAT
Let me tell you about the NAT incident.
At some point, I enabled NAT rules on Azure’s virtual network gateway because one guide told me to.
Suddenly, the entire connection broke and the tunnel refused to come back up.
Hours of packet capture later, I realized Azure was rewriting source IPs into ranges AWS had never been told about, so it simply ignored the traffic.
No pings. No logs. No joy.
Step 4: DNS Got Involved and I Almost Quit Tech
After the tunnel finally connected, DNS resolution decided to wild out.
Services inside AWS couldn’t resolve Azure hostnames to their private IPs.
Azure VMs couldn’t reach AWS resources by hostname.
Then I remembered: both sides had private DNS zones.
No one tells you that when you do VPN + private DNS, you’re now in multi-cloud DNS hell.
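What eventually worked on the AWS side was a Route 53 Resolver outbound endpoint plus a forwarding rule that sends queries for the Azure private zone across the tunnel. A rough boto3 sketch; the zone name, target IP, endpoint ID, and VPC ID are all made up, the outbound endpoint must already exist, and the Azure side needs the mirror-image forwarder pointing back at AWS:

```python
import boto3

r53r = boto3.client("route53resolver", region_name="eu-west-1")

# Forward queries for the (hypothetical) Azure private zone to a DNS endpoint
# reachable over the VPN. Assumes an outbound resolver endpoint already exists.
rule = r53r.create_resolver_rule(
    CreatorRequestId="azure-private-zone-forward-001",   # any unique string
    Name="forward-azure-private-zone",
    RuleType="FORWARD",
    DomainName="internal.contoso.example.",               # placeholder private zone
    TargetIps=[{"Ip": "10.10.0.4", "Port": 53}],           # DNS endpoint in the Azure VNet
    ResolverEndpointId="rslvr-out-0123456789abcdef0",      # placeholder outbound endpoint
)["ResolverRule"]

# Associate the rule with the VPC so its resolvers actually start using it.
r53r.associate_resolver_rule(
    ResolverRuleId=rule["Id"],
    Name="azure-private-zone-assoc",
    VPCId="vpc-0abc1234",
)
```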
How I Eventually Fixed It
The fix wasn’t glamorous. It was… ritualistic.
Static routes fixed manually
NAT disabled completely
BGP session reconfigured with longer keep-alive timers
DNS forwarding via custom conditional resolvers
IKE settings manually matched (even the lifetimes, bro; see the checklist sketch below)
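That IKE line sounds trivial until you are doing it across two portals that name everything differently. This is the kind of checklist I ended up keeping, written as plain Python so nothing is hidden; the parameter names and values are illustrative, not a recommendation:

```python
# One dict per side, filled in by hand from each portal. Every field has to
# agree, including the SA lifetimes both portals love to bury three menus deep.
azure_side = {
    "ike_version": "IKEv2",
    "phase1_encryption": "AES256",
    "phase1_integrity": "SHA256",
    "dh_group": 14,
    "phase2_encryption": "AES256",
    "phase2_integrity": "SHA256",
    "pfs_group": 14,
    "sa_lifetime_seconds": 3600,
}

aws_side = dict(azure_side)
aws_side["sa_lifetime_seconds"] = 28800   # a lifetime mismatch like this is how tunnels
                                          # quietly drop and renegotiate at odd hours

mismatches = {key: (azure_side[key], aws_side[key])
              for key in azure_side if azure_side[key] != aws_side[key]}
print(mismatches or "IKE parameters match on both sides")
```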
I literally kept a terminal tab running ping for two hours just to make sure the peace held.
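That ping tab eventually became a tiny watcher so I could stop staring. Nothing clever; the target IP is a placeholder for something on the far side of the tunnel, and the flags assume Linux ping:

```python
import subprocess
import time
from datetime import datetime

TARGET = "10.20.0.5"   # placeholder: a private IP on the far side of the tunnel

while True:
    # -c 1: send a single probe; -W 2: give up after 2 seconds (Linux ping flags).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", TARGET],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    state = "up" if result.returncode == 0 else "DOWN"
    print(f"{datetime.now():%H:%M:%S}  tunnel {state}")
    time.sleep(30)
```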
Lessons from the Tunnel
BGP is your friend. Until it's not.
Static routes require spiritual discernment.
Matching encryption policies on both sides is not optional.
Private DNS is a dark art. Use with caution.
Multi-cloud anything is hard. VPN just exposes how fragile it all is.
Final Thoughts from the Tunnel
The VPN is now stable.
The tunnel holds.
Traffic flows.
But sometimes, at 3 a.m., I still hear Azure whisper:
“Connection: Disconnected. Reason: Unknown.”