How does my Internet work

I was wondering how my Internet actually works so I started this to collect information about it.

OSI Model

When you try to understand network traffic, the OSI model helps to make clear at which level you are looking at your network.

| Number | Layer name | Smallest unit transmitted (PDU) | Function | Examples |
| --- | --- | --- | --- | --- |
| 1 | Physical | Bit, Symbol | Transmission of raw bit streams over a physical medium | Cat 6 cable |
| 2 | Data link | Frame | Reliable transmission of data frames between two nodes connected by a physical layer | Ethernet |
| 3 | Network | Packet | Structuring and managing a multi-node network, including addressing, routing and traffic control | IP, ARP, ICMP, IGMP |
| 4 | Transport | Segment, Datagram | Reliable transmission of data segments between points on a network, including segmentation, acknowledgement and multiplexing | TCP, UDP |
| 5 | Session | Data | Managing communication sessions, i.e., continuous exchange of information in the form of multiple back-and-forth transmissions between two nodes | RPC, NetBIOS |
| 6 | Presentation | Data | Translation of data between a networking service and an application, including character encoding, data compression and encryption/decryption | MIME, character encodings |
| 7 | Application | Data | High-level APIs, including resource sharing and remote file access | HTTP, DNS, FTP |

(source OSI Model)

My Home network

My home network (like many others) consists of lots of "computers" (I use the term loosely here: it may be anything from a classic computer in a big box to a smart watch). Those computers are connected in some way to a router provided by the Internet provider. This router provides the connectivity to the Internet.

Ethernet

I still connect some computers at home to the router of my Internet provider with a network cable. This follows the Ethernet standard. So how does this work?

On the physical layer (OSI 1) it makes sense to look at Fast Ethernet (100 Mbit/s) and faster. It uses a star topology, where every computer is connected by its own network cable to a central device in your network, unlike a bus network, where all computers are connected to the same cable.

Cables are at most 100 meters long and consist of 8 wires, twisted together in pairs. See for example category 5 cables for more information.

So how does Ethernet work? Every Ethernet network card has a unique ID, the MAC address, assigned by the manufacturer of the network card. The data that you send (and later receive) is bundled into packets, called frames, built from zeros and ones. They have a clear format with mostly fixed-size fields (see Ethernet frame).

| Field | Length | Description |
| --- | --- | --- |
| Preamble | 7 octets | Pattern of 56 bits with alternating 1 and 0; helps the receiver to work out how long a value is present on the medium before the next one comes (so this is part of OSI 1) |
| Start of frame delimiter | 1 octet | Always 10101011; indicates that the data relevant for the data link layer (OSI 2) starts next |
| MAC destination | 6 octets | Destination ID |
| MAC source | 6 octets | Source ID |
| 802.1Q tag | 4 octets (optional) | VLAN ID and priority of the frame |
| EtherType or length | 2 octets | Values up to 1500 give the length of the payload; values from 1536 upwards name the protocol carried in the payload |
| Payload | 46-1500 octets | The payload |
| Checksum | 4 octets | Frame check sequence (CRC-32) to detect transmission errors |
| Interpacket gap | 12 octets | Nothing is sent here, to give others the chance to send as well (this is again OSI 1) |

For OSI 1 it is important that both nodes on the Ethernet cable can start to send data whenever they want (at least whenever it seems to them that nobody else is currently sending). The maximum cable length, the bit rate and the minimum frame size have been carefully chosen so that both sides can detect collisions while they are still sending their frame. Imagine A starts to send data. B starts to send as well, because A's data has not yet arrived and B wrongly assumes it is safe to send. Then it is important that B's data reaches A before A has finished sending and wrongly assumes its data was sent without problems. Today this is less important, as you have star topologies with point-to-point network cables and separate wires for sending and receiving.
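The timing argument can be checked with a little arithmetic. The following is only a rough sketch: it assumes the minimum frame of 64 octets that results from the frame table (6 + 6 + 2 + 46 + 4) and a signal speed in copper of about 200,000 km/s. The signal speed and the function names are assumptions added here, and real limits are lower because devices on the way add delays.

```python
MIN_FRAME_BITS = 64 * 8  # 6 + 6 + 2 + 46 + 4 octets, without preamble


def min_frame_time_us(bits_per_second):
    """Time in microseconds a sender is busy with the smallest frame."""
    return MIN_FRAME_BITS / bits_per_second * 1_000_000


def max_one_way_distance_m(bits_per_second, signal_speed_m_per_s=2.0e8):
    """Upper bound for the distance between A and B.

    A's signal has to reach B, and B's colliding signal has to travel
    back to A, before A has finished sending the smallest frame.
    The default signal speed is an assumed, typical value for copper.
    """
    round_trip_seconds = MIN_FRAME_BITS / bits_per_second
    return signal_speed_m_per_s * round_trip_seconds / 2


if __name__ == "__main__":
    for rate in (10_000_000, 100_000_000):
        print(rate, min_frame_time_us(rate), max_one_way_distance_m(rate))
```

The faster the bit rate, the shorter the smallest frame lasts, so the allowed distance shrinks by the same factor.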

For OSI 2 the important fields are the sender MAC address and the receiver MAC address, plus the payload. If you see a frame on the cable that is not addressed to your MAC address, you can safely ignore it. You should, however, accept frames addressed to the MAC address that consists only of ones (broadcast).
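As an illustration, here is a minimal sketch of how a receiver could read the two MAC addresses and decide whether a frame is meant for it. It assumes the frame starts at the destination address (preamble and start of frame delimiter already stripped) and carries no 802.1Q tag. The function names and the MAC addresses in the usage note are made up for this example, and multicast addresses are ignored.

```python
import struct

BROADCAST = "ff:ff:ff:ff:ff:ff"  # the address that consists only of ones


def format_mac(raw):
    """Turn 6 raw bytes into the usual aa:bb:cc:dd:ee:ff notation."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet_header(frame):
    """Return (destination, source, ethertype_or_length) of an untagged frame."""
    destination, source, type_or_length = struct.unpack("!6s6sH", frame[:14])
    return format_mac(destination), format_mac(source), type_or_length


def should_accept(frame, my_mac):
    """Accept frames sent to our own address or to the broadcast address."""
    destination, _, _ = parse_ethernet_header(frame)
    return destination in (my_mac.lower(), BROADCAST)
```

For example, `should_accept(bytes.fromhex("ffffffffffff0011223344550800") + bytes(46), "aa:bb:cc:dd:ee:ff")` accepts the frame because it is a broadcast.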

In the centre of your star topology network is a switch. It takes the frames off the cable and resends them. This resets the timings for collision detection. The switch also buffers frames (several senders might send to the same receiver at the same time, so the cable to the receiver may not be fast enough to deliver the data as fast as it comes in). The switch also learns the topology, that is, which MAC addresses can be reached via which cable connection. This gets trickier if you connect several switches with each other. Imagine an office building where all computers on one floor are connected to one switch, and all the switches are connected to each other. The router your Internet provider gave you usually also acts as a switch for a small home network.
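How a switch finds out which MAC address sits behind which cable can be sketched in a few lines. This is a simplified model, not the firmware of any real device: the class name, the port numbering and the string MAC addresses are assumptions for illustration, and buffering, ageing of entries and loops between several switches are left out.

```python
class LearningSwitch:
    """Toy model of a switch that learns MAC addresses from incoming frames."""

    def __init__(self, port_count):
        self.ports = list(range(port_count))
        self.mac_table = {}  # MAC address -> port it was last seen on

    def handle_frame(self, in_port, source_mac, destination_mac):
        """Return the list of ports the frame is sent out on."""
        # Learn: the sender is reachable via the port the frame came in on.
        self.mac_table[source_mac] = in_port

        out_port = self.mac_table.get(destination_mac)
        if destination_mac == "ff:ff:ff:ff:ff:ff" or out_port is None:
            # Broadcast or unknown destination: send to everybody else.
            return [port for port in self.ports if port != in_port]
        if out_port == in_port:
            # Destination sits on the cable the frame came from.
            return []
        return [out_port]
```

The first frame to an unknown address is flooded to all other ports; as soon as the receiver answers, the switch knows its port and later frames go out on that port only.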

IP

The Internet Protocol. Its packets may be the payload of Ethernet frames. The older, but still widely used, version is IPv4. Its headers look like this

| Field | Description |
| --- | --- |
| Version | The number 4 |
| IHL | Size of the IPv4 header, counted in 32-bit words |
| DSCP | Type of service |
| ECN | Explicit Congestion Notification |
| Total Length | The entire packet size in bytes, including header and data |
| Identification | Identifies the group of fragments of a single IP datagram |
| Flags | Control or identify fragments |
| Fragment Offset | Offset of a particular fragment relative to the beginning of the original unfragmented IP datagram |
| Time To Live | Hop count; every router reduces the value by one to avoid endless loops |
| Protocol | Protocol of the payload (for example 6 for TCP, 17 for UDP) |
| Header Checksum | Checksum for the header only; the payload needs its own checksums |
| Source IP Address | Sender |
| Destination IP Address | Receiver |
| Options | Not often used |
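To make the field list more tangible, here is a small sketch that reads the 20 fixed bytes of an IPv4 header. It is only an illustration: options are ignored, the checksum is not verified, and the function name and dictionary keys are chosen freely for this example.

```python
import ipaddress
import struct


def parse_ipv4_header(packet):
    """Split the first 20 bytes of an IPv4 packet into its header fields."""
    (version_ihl, dscp_ecn, total_length, identification, flags_fragment,
     ttl, protocol, checksum, source, destination) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_bytes": (version_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "dscp": dscp_ecn >> 2,
        "ecn": dscp_ecn & 0x03,
        "total_length": total_length,
        "identification": identification,
        "flags": flags_fragment >> 13,
        "fragment_offset": flags_fragment & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": str(ipaddress.IPv4Address(source)),
        "destination": str(ipaddress.IPv4Address(destination)),
    }
```

A router would look at `destination` to decide where to forward the packet and reduce `ttl` by one before sending it on.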

The current version is IPv6. Its headers look like this

| Field | Description |
| --- | --- |
| Version | The number 6 |
| Traffic Class | Classification of packets and Explicit Congestion Notification |
| Flow Label | ID for a group of packets that belong together |
| Payload Length | Size of the payload in octets |
| Next Header | Usually specifies the transport layer protocol used by the packet's payload |
| Hop Limit | Hop count; every router reduces the value by one to avoid endless loops |
| Source Address | Sender |
| Destination Address | Receiver |

UDP

With the IP protocol we can already send single packets from one node to another. Often you want a computer to offer multiple services to multiple clients in parallel. This is where ports come into play. A computer may offer a web server on port 80 and a mail server on port 25 (ports work the same way in TCP, which web and mail servers actually use). So you send an IP packet to the server; its payload is a UDP packet that specifies which port you want to reach, and also the port on your own computer to which the answer should be sent.

| Field | Description |
| --- | --- |
| Source port | The port to reply to if needed |
| Destination port | The port to connect to |
| Length | Length in bytes of the UDP header plus UDP data. The minimum is 8 bytes (header only); with IPv4 the data is limited to 65507 bytes; with IPv6 jumbograms bigger packets are possible |
| Checksum | May be used for error-checking of the header and data. Optional in IPv4, mandatory in IPv6 |
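The role of the two ports can be tried out on a single machine. The sketch below sends one UDP packet to itself over the loopback address 127.0.0.1; the function name is made up for this example, and letting the operating system pick the ports is a choice made here to avoid clashes with running services.

```python
import socket


def udp_round_trip(message):
    """Send one UDP packet to ourselves and report which ports were involved."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        receiver.settimeout(2.0)         # do not wait forever if it gets lost
        destination = receiver.getsockname()

        sender.sendto(message, destination)
        data, (source_ip, source_port) = receiver.recvfrom(65535)
        return data, source_ip, source_port, destination[1]


if __name__ == "__main__":
    data, source_ip, source_port, destination_port = udp_round_trip(b"hello")
    print(f"{data!r} from {source_ip}:{source_port} to port {destination_port}")
```

The receiver learns the source port from the packet itself, which is exactly the information it needs to send an answer back.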

When you send or receive data via the IP protocol you will usually have to split it into lots of packets. The receiver then needs to put those packets back together in the right order. As the IP protocol may send each packet along a completely different route, the packets might arrive in an order very different from the original one. Some packets might also get lost on the way, or even be duplicated.

The UDP protocol does not help you with this, but for some use cases you do not care too much. Think for example about a multiplayer game where you constantly send the current position of all players to all players. If you miss a packet in between, you just take the next one. Getting packets out of order would not be very dramatic either, and you can send a timestamp or a counter with each packet to ignore older packets if you want.
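The counter idea could look roughly like this. It is a minimal sketch with made-up names; a real game would also have to deal with the counter wrapping around and would keep one such object per player.

```python
class LatestPosition:
    """Keeps only the newest position update and ignores stale ones."""

    def __init__(self):
        self.last_counter = -1
        self.position = None

    def receive(self, counter, position):
        """Apply an update; return False if it is not newer than what we have."""
        if counter <= self.last_counter:
            return False  # late or duplicated packet, ignore it
        self.last_counter = counter
        self.position = position
        return True
```

A lost packet needs no special handling at all: the next update that arrives simply carries a higher counter and replaces the old position.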

TCP

In many use cases, however, getting all packets, and in the right order, is very important. Think about downloading a huge file. The Transmission Control Protocol (TCP) helps you with this. You establish a connection, send your data and the protocol ensures that you get all the packets in the right order. If packets are missing, they are sent again. The protocol is rather complex but is used for most things on the Internet.
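The core idea of putting data back in order can be sketched without any networking. This is not how TCP is actually implemented (connection setup, acknowledgements, windows and retransmission timers are all missing); it only shows how numbered pieces allow the receiver to sort, drop duplicates and notice gaps. The function name and the use of byte offsets starting at zero are assumptions for this example.

```python
def reassemble(segments):
    """Rebuild a byte stream from (offset, data) pieces in arrival order.

    Returns the contiguous data starting at offset 0 and the offset of the
    first byte that is still missing (the point to ask for again).
    """
    by_offset = {}
    for offset, data in segments:
        if data:
            by_offset.setdefault(offset, data)  # duplicates are ignored

    stream = bytearray()
    while len(stream) in by_offset:
        stream += by_offset[len(stream)]
    return bytes(stream), len(stream)
```

With `[(5, b"world"), (0, b"hello"), (0, b"hello")]` the result is the complete stream `b"helloworld"`; with `[(0, b"ab"), (4, b"ef")]` only `b"ab"` can be delivered and offset 2 is reported as missing.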

DNS

The IP protocol uses IP addresses for its communication, like we use phone numbers to call people. But humans prefer to use names instead of numbers. This is why we have a phone book for calling and DNS for the IP protocol.

So how does it work? When you want to send an IP packet to www.tgunkel.de (as you did to read this website), your computer first needs the list of root DNS servers. Those are for example

k.root-servers.net.
j.root-servers.net.
c.root-servers.net.
...

You ask one of the root DNS servers who is responsible for all the "de" domains. Those are for example

n.de.net.
f.nic.de.
z.nic.de.
...

You ask one of them who is responsible for "tgunkel.de"

ns2.hans.hosteurope.de.
...

You ask them what the IP address for "www.tgunkel.de" is, only to find out that it is a CNAME (an alias), so you have to do all of this again for the alias name. Luckily all the entries have a time to live, so you do not need to repeat every step and can use the cached information you learned in the earlier steps. Also, most people do not talk directly to the root servers but rather to a caching DNS server that is close to them (e.g. operated by your provider, or 1.1.1.1 or 8.8.8.8).

Usually DNS requests are sent via UDP. If the answer does not fit into a single packet, TCP is used instead.

You can try it out like this

dig +trace +additional -t A www.tgunkel.de.
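Each question in this chain is a small binary message. As a rough sketch of what travels inside the UDP packet, the following builds a query for an A record by hand. The function names and the query ID are arbitrary choices for this example, and parsing the answer is left out.

```python
import struct


def encode_name(name):
    """Encode 'www.tgunkel.de.' as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in name.split("."):
        if label:  # skip the empty part after the trailing dot
            raw = label.encode("ascii")
            encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(name, query_id, record_type=1, recursion_desired=True):
    """Build a DNS query message; record type 1 is an A record."""
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", record_type, 1)  # class IN
    return header + question
```

Sending the result of `build_query("www.tgunkel.de.", 0x1234)` with a UDP socket to port 53 of a caching DNS server should produce an answer whose first two bytes repeat the query ID.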

HTTP

The Hypertext Transfer Protocol is used to request websites. Its communication looks like this:

Request

GET / HTTP/1.1
Host: www.example.com

Response

HTTP/1.1 200 OK
Age: 497225
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sun, 20 Sep 2020 11:11:20 GMT
Etag: "3147526947+ident"
Expires: Sun, 27 Sep 2020 11:11:20 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (dcb/7EEF)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

<!doctype html>
<html>
<head>
<title>Example Domain</title>
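Because HTTP/1.1 is plain text, the request can be built and the response taken apart with simple string operations. The sketch below is only meant to show the structure (request line, headers, empty line, body); the function names are made up, and chunked transfer encoding, repeated headers and other details are ignored.

```python
def build_get_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response(raw):
    """Split a response into status code, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # empty line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status_code, reason = status_line.split(" ", 2)

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # only the first colon separates
        headers[name.strip().lower()] = value.strip()
    return int(status_code), reason, headers, body
```

The bytes from `build_get_request("www.example.com")` are what would be written to the TCP connection, and `parse_response` would be applied to everything read back from it.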

Putting it together

There is a global pool of unique IP addresses. When you want to participate in the Internet you need to have at least one of those addresses, and the rest of the Internet needs to know how to reach you. Your Internet provider has a lot of those addresses. Your router gets one or more of those addresses assigned.

In your local network every node also needs an IP address. It requests one from the router via DHCP or SLAAC. If the router has only one public IP address, it uses private network addresses for the local nodes together with NAT.

Every node has not only an IP address but also routing tables that explain how to reach other IP addresses.

If the destination is in the local network, you use ARP to find its MAC address and send the packet directly. Otherwise you send it to the gateway that is configured in the routing table for the network the destination IP address is in, again using ARP to find the MAC address of the gateway. The gateway will then do the same.

Example:

# host -t A tgunkel.de

tgunkel.de has address 83.169.2.206

# route -n
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    0      0        0 eth0

So we need to send any packet for www.tgunkel.de to 192.168.0.1. This is the provider's router in my apartment. The router will do exactly the same, until eventually the packet ends up in the network where the destination computer is actually located.
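The decision "local network or gateway" can be written down as a lookup in the routing table. In this sketch the route for the local network 192.168.0.0/24 is an assumption (the route output shown here only lists the default route), and the function name and table layout are made up for illustration.

```python
import ipaddress


def next_hop(destination, routes):
    """Return the IP address whose MAC address has to be looked up next.

    routes is a list of (network, gateway) pairs; a gateway of None means
    the network is directly connected. The most specific match wins.
    """
    address = ipaddress.ip_address(destination)
    matching = []
    for network_text, gateway in routes:
        network = ipaddress.ip_network(network_text)
        if address in network:
            matching.append((network, gateway))
    if not matching:
        raise LookupError(f"no route to {destination}")

    _network, gateway = max(matching, key=lambda entry: entry[0].prefixlen)
    return destination if gateway is None else gateway


ROUTES = [
    ("192.168.0.0/24", None),      # assumed local network, directly connected
    ("0.0.0.0/0", "192.168.0.1"),  # default route via the provider's router
]
```

With this table, `next_hop("83.169.2.206", ROUTES)` gives the router 192.168.0.1, while an address such as 192.168.0.42 would be contacted directly.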

This shows the hosts on the way

# traceroute -n -w 10 tgunkel.de
traceroute to tgunkel.de (83.169.2.206), 30 hops max, 60 byte packets
 1  192.168.0.1  0.747 ms  0.776 ms  0.830 ms
 2  * * *
 3  84.116.190.37  14.655 ms  16.694 ms  21.488 ms
 4  84.116.197.245  48.253 ms  48.250 ms  48.204 ms
 5  84.116.134.217  21.221 ms  21.196 ms  21.132 ms
 6  62.115.42.170  21.219 ms  20.755 ms  20.628 ms
 7  62.115.144.9  22.621 ms  22.065 ms  16.730 ms
 8  87.230.112.3  20.142 ms  20.174 ms  18.458 ms
 9  * * *
10  83.169.2.206  28.251 ms  27.404 ms  23.944 ms

Some gateways on the way are shown as * * * because they do not send the ICMP replies that traceroute relies on.
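The way traceroute discovers the hops can be imitated with a toy model: it sends probes with a hop count of 1, 2, 3 and so on, and the router at which the count reaches zero reports back. The sketch below only simulates this on a list of addresses and sends no real packets; the function names are made up, and the second address in the usage note is a placeholder for the hop that stays silent.

```python
def probe(path, ttl, silent):
    """Simulate one probe; return (responder or None, destination_answered)."""
    for router in path[:-1]:
        ttl -= 1  # every router reduces the hop count by one
        if ttl == 0:
            # The probe dies here; the router may or may not report it.
            return (None if router in silent else router), False
    destination = path[-1]
    if destination in silent:
        return None, False
    return destination, True


def traceroute(path, silent=frozenset(), max_hops=30):
    """Return a list of (hop number, responder or '*')."""
    result = []
    for ttl in range(1, max_hops + 1):
        responder, done = probe(path, ttl, silent)
        result.append((ttl, responder or "*"))
        if done:
            break
    return result
```

Calling `traceroute(["192.168.0.1", "10.0.0.1", "84.116.190.37", "83.169.2.206"], silent={"10.0.0.1"})` lists four hops, with `*` in second place, similar to the real output.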

We use TCP to establish a connection to port 80 of the web server on tgunkel.de. Over this connection we use the HTTP protocol to request the website and all its extra content like images, fonts, CSS, ...

Internet Providers

Cable Internet

My Internet is currently provided by Unitymedia / Vodafone via cable Internet. The router is connected to the cable socket that provides radio, TV and Internet. All apartments in one house are connected to one box. On the street you see grey boxes that contain the so-called Kabelverzweiger (cable distribution cabinets). From the grey box on the street to the box in your apartment they use old coaxial cable; beyond the grey box on the street they use fibre cables. On a city level you have a hub that supplies all the grey boxes in the streets with data. On top of that are the so-called head stations. Here the DOCSIS protocol ends and the IP protocol starts.

Communication between providers

If you want to communicate with a server of your Internet provider, or with the computer of another customer of the same provider, the traffic can be routed inside your provider's network. This is called an access network. Usually, however, you want to reach computers outside of that network; this is where it stops being a more or less local network and you are on the Internet. The network between providers is called the backbone network (https://en.wikipedia.org/wiki/Backbone_network).

There are different types of providers. Tier 3 providers are rather small and only provide Internet access to end users. They exchange data with other Tier 3 providers for free (peering, https://en.wikipedia.org/wiki/Peering) and sell data exchange to end users (transit, https://en.wikipedia.org/wiki/Internet_transit).

Tier 2 providers are active in multiple regions, selling transit to Tier 3 providers and end users. They peer with other Tier 2 providers.

Tier 1 providers are the biggest providers. They do not need to buy transit; instead they sell transit to Tier 2 providers and peer with other Tier 1 providers.

Transit and peering can happen at an Internet exchange point (https://en.wikipedia.org/wiki/Internet_exchange_point).

The routing between different providers is controlled with the Border Gateway Protocol (https://de.wikipedia.org/wiki/Border_Gateway_Protocol).

Cross Country Internet Cables

There are high-capacity fibre optic cable systems, both terrestrial and submarine. Here is a map of sea cables.

Latency

Even at the speed of light it takes around 130 ms to travel once around the earth, so after about eight rounds you are above one second. Real signals are slower than that, so long-distance Internet communication has a very noticeable delay. This is why people try to place their servers close to their end users.
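The numbers are easy to verify. The sketch below uses the circumference of the earth at the equator and the speed of light in vacuum; the factor for glass fibre (about two thirds of that speed) is a rough assumption added here, and the delays added by routers on the way are not included.

```python
EARTH_CIRCUMFERENCE_KM = 40_075    # at the equator
SPEED_OF_LIGHT_KM_S = 299_792.458  # in vacuum
FIBRE_FACTOR = 2 / 3               # assumption: light in glass fibre is slower


def one_way_delay_ms(distance_km, speed_km_s=SPEED_OF_LIGHT_KM_S):
    """Time in milliseconds a signal needs for the given distance."""
    return distance_km / speed_km_s * 1000


if __name__ == "__main__":
    vacuum = one_way_delay_ms(EARTH_CIRCUMFERENCE_KM)
    fibre = one_way_delay_ms(EARTH_CIRCUMFERENCE_KM,
                             SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR)
    print(f"once around the earth: {vacuum:.1f} ms in vacuum, {fibre:.1f} ms in fibre")
```

A request and its answer have to cover the distance twice, so the delay a user notices is at least double the one-way value.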