AWS Outage Takes Down Internet Blamed on DNS -- We Are a Fat Finger Away from Apocalypse
AWS DNS Outage Analysis: Single Points of Failure in Cloud Infrastructure
Investigate the root cause of large-scale cloud outages, focusing on the fragility of the Domain Name System (DNS) and the operational risks created by excessive staff workloads.
Short Summary
- The recent AWS outage was traced back to a critical DNS failure that immediately crippled downstream services like DynamoDB.
- Companies often skip costly redundancy measures (like cross-region failover) to save money, accepting higher risk profiles.
- Extreme work hours (100+ hour weeks) increase the likelihood of catastrophic human error, such as wiping the wrong production server.
- This discussion urges policymakers and IT leaders to establish robust oversight of essential cloud infrastructure, modeled on the standards long applied to traditional utilities.
This episode analyzes the cascading effects of the recent Amazon Web Services outage, demonstrating how a single configuration mistake in Domain Name System (DNS) routing can halt global commerce. We explore the trade-offs companies make when they forgo redundancy, and the human factor behind infrastructure mishaps.
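The summary's two technical points, a DNS failure cascading into DynamoDB and the cross-region failover that companies skip, can be made concrete on the client side. Below is a minimal Python sketch, assuming the service's data is already replicated to a second region (for DynamoDB that would mean something like global tables; the episode does not say how any particular company replicates). The client walks a priority list of regional hostnames and moves on when a name fails to resolve. The `ENDPOINTS` list and `pick_endpoint` helper are hypothetical; the hostnames only follow AWS's regional naming pattern.

```python
import socket
from typing import Callable, Sequence

# Hypothetical priority-ordered endpoints (illustrative only). Failing over is
# only useful if the data is replicated to every region listed here.
ENDPOINTS = [
    "dynamodb.us-east-1.amazonaws.com",
    "dynamodb.us-west-2.amazonaws.com",
]

Resolver = Callable[[str], str]


def system_resolver(host: str) -> str:
    """Resolve a hostname to its first IPv4 address via the OS resolver."""
    infos = socket.getaddrinfo(host, 443, socket.AF_INET, socket.SOCK_STREAM)
    return infos[0][4][0]


def pick_endpoint(
    endpoints: Sequence[str], resolve: Resolver = system_resolver
) -> tuple[str, str]:
    """Return (host, ip) for the first endpoint whose name still resolves.

    A DNS failure on the primary (socket.gaierror) falls through to the next
    region instead of taking the whole application down with it.
    """
    errors = []
    for host in endpoints:
        try:
            return host, resolve(host)
        except socket.gaierror as exc:
            errors.append(f"{host}: {exc}")
    raise RuntimeError("all endpoints failed DNS resolution: " + "; ".join(errors))


if __name__ == "__main__":
    def east_dns_broken(host: str) -> str:
        # Simulated outage: names in us-east-1 stop resolving.
        if "us-east-1" in host:
            raise socket.gaierror(socket.EAI_NONAME, "Name or service not known")
        return "198.51.100.7"  # documentation-range address

    print(pick_endpoint(ENDPOINTS, east_dns_broken))
```

Injecting the resolver keeps the failover logic testable without a real outage; in production the default `system_resolver` is used.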
Top Comments (10)
Isn't it funny how decades of networking and IT pioneers warned companies that we need redundancy and must not over-centralize our networks? And WHAT'S THE FIRST THING THE CORPORATE LEADERS CHOOSE TO DO!?
I'm just a low-paid sys admin who had servers in AWS that went down. I took the extra time to configure AWS health checks and automatic failover. I have servers in AWS and hosted locally, and when AWS went down, AWS health checks automatically detected the failure and removed the failed IPs from DNS. With a 60-second TTL, our services were only down for a few minutes while the health checks detected the outage and changed DNS. It amazes me how many other admins don't take the time to configure redundancy.
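The setup in the comment above can be modeled roughly as follows. This is a toy Python simulation, not the Route 53 API: `Record`, `FailoverZone`, and `worst_case_outage` are hypothetical names, and the 30-second check interval and failure threshold of 3 are assumed values (the comment only gives the 60-second TTL). It shows why the outage lasted minutes rather than seconds: health checks need several failed probes before pulling an IP, and clients keep using cached answers until the TTL expires.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One A record behind a health check (hypothetical model)."""
    ip: str
    consecutive_failures: int = 0
    healthy: bool = True


@dataclass
class FailoverZone:
    """Toy model of health-checked DNS failover; not a real AWS API."""
    records: list[Record]
    failure_threshold: int = 3  # assumed: consecutive failed checks before removal

    def record_check(self, ip: str, ok: bool) -> None:
        """Apply one health-check result to the record with this IP."""
        for r in self.records:
            if r.ip != ip:
                continue
            if ok:
                # Simplification: one passing check restores the record.
                r.consecutive_failures = 0
                r.healthy = True
            else:
                r.consecutive_failures += 1
                if r.consecutive_failures >= self.failure_threshold:
                    r.healthy = False

    def answer(self) -> list[str]:
        """IPs returned to resolvers; fail open if every record is unhealthy."""
        healthy = [r.ip for r in self.records if r.healthy]
        return healthy or [r.ip for r in self.records]


def worst_case_outage(check_interval_s: int, failure_threshold: int, ttl_s: int) -> int:
    """Rough upper bound in seconds: detection time plus cache expiry."""
    return check_interval_s * failure_threshold + ttl_s


if __name__ == "__main__":
    # First IP stands in for the AWS server, second for the locally hosted one.
    zone = FailoverZone([Record("203.0.113.10"), Record("192.0.2.20")])
    for _ in range(3):
        zone.record_check("203.0.113.10", ok=False)  # AWS side stops answering
    print(zone.answer())                 # ['192.0.2.20']
    print(worst_case_outage(30, 3, 60))  # 150
```

Under these assumptions the estimate is 30 × 3 + 60 = 150 seconds, about two and a half minutes, which fits "down for a few minutes." Resolvers that ignore or clamp TTLs can stretch that window, so treat it as an estimate rather than a guarantee.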
As soon as I heard about the cloud outage, I knew you were gonna give us your take pretty quickly.
Self hosting has its advantages. One day people will learn.
Back in the mid-80s I worked for a small LA startup that made medical billing software for CP/M systems using 8" floppy disks. One client in Honolulu kept saying our software stopped working after every update. They finally flew me out to fix it (not a bad gig!). Within 10 minutes I found the problem: they were pinning their 8" diskettes to a whiteboard—with magnets! Every “update” was getting wiped before it ever ran. I replaced the disks and spent the rest of the day getting a scenic tour of Oahu.
I entered the IT world right as the cloud movement was beginning. Tools like Docker were there for us to create essentially cloud-agnostic architectures. It seemed that as soon as companies realized they could offload their infrastructure to a 3rd party, they gave up on investing in strong systems and infrastructure engineers.
Told my staff there is nothing I can do. AWS left a lot of egg on my face today.
Senior engineers, the ones brought up on C64s, soldering irons and machine code, are getting towards retirement. The original guys and gals who built the operating systems, compilers and internet protocols are also retiring. Junior engineers are not getting apprenticeships. The millennial 10x types are leaving corporate to build their own startups. All that's left to run ops at these big, critical parts of the chain are the career types who played politics and won. Give it 5 more years, and "the internet is down again" will be the new normal.
When I read the book on Network Programming in Java some 15-20 years ago as a junior SE, one recommendation that stuck with me was to write code and design systems with network failures in mind, and to have disaster recovery in place.
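One common way to act on that advice in code is to retry transient network errors with capped exponential backoff and jitter, then hand off to a failover or disaster-recovery path when the retries run out. This is a generic Python sketch, not something taken from the book the commenter mentions; `call_with_backoff` and its defaults (5 attempts, 0.2 s base delay, 5 s cap) are hypothetical choices.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying connection errors and timeouts with backoff.

    The delay before retry k is drawn uniformly from [0, min(max_delay,
    base_delay * 2**k)] ("full jitter"). After the last attempt the error is
    re-raised so the caller can switch to its disaster-recovery path.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

The random delay keeps thousands of clients from retrying in lockstep against a service that is trying to recover. Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.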
It's one thing if Salesforce goes down; it's a whole different thing if life-safety systems go down.