The day was Tuesday, August 12th 2014. I arrived home, only to find an almost unusable internet situation in my home. Some sites such as AnandTech and Google worked fine, but large swaths of the internet such as Microsoft, Netflix, and many other sites were unreachable. As I run my own DNS servers, I assumed it was a DNS issue, however a couple of ICMP commands later and it was clear that this was a much larger issue than just something affecting my household.
Two days later, and there is a pretty clear understanding of what happened. Older Cisco core internet routers with a default configuration only allowed for a maximum 512k routes for their Border Gateway Protocol (BGP) tables. With the internet always growing, the number of routes surpassed that number briefly on Tuesday, which caused many core routers to be unable to route traffic.
BGP is not something that is discussed very much, due to the average person never needing to worry about it, but it is one of the most used and most important protocols on the internet. The worst part of the outage was that it was known well in advance that this would be an issue, yet it still happened.
Let us dig into the root cause. Most of us have a home network of some sort, with a router, and maybe a dozen or so devices on the network. We connect to an internet service provider through (generally) a modem. When devices on your local network want to talk to other devices on your network, they do so by sending packets upstream to the switch (which is in most cases part of the router) and then the switch forwards the packet to the correct port where the other device is connected. If the second device is not on the local network, the packets get sent to the default gateway which then forwards them upstream to the ISP. At the ISP level, in simple terms, it works very similarly to your LAN. The packet comes in to the ISP network, and if the IP address is something that is in the ISP’s network, it gets routed there, but if it is something on the internet, the packet is forwarded. The big difference though is that an ISP does not have a single default gateway, but instead connects to several internet backbones. The method in which internet packages are routed is based on the Border Gateway Protocol. The BGP contains a table of IP subnets, and lists which ports to transfer traffic based on rules and paths laid out by the network administrator. For instance, if you want to connect to Google to check your Gmail, your computer will send a TCP connection to 220.127.116.11 (or another address as determined by your DNS settings and location). Your ISP will receive this packet, and send the packet to the correct port to an outbound part of the internet which is closer to the subnet that the address is in. If you then want to connect to a domain name or website hosting like Anandtech.com, the packet will be sent to 18.104.22.168, and the BGP protocol of the ISP router will then send to possibly a different port. This continues upstream from core router to core router until the packet reaches the destination subnet, where it is then sent to the web server.