How does my Internet work
I was wondering how my Internet actually works so I started this to collect information about it.
OSI Model
When you try to understand network traffic, the OSI model helps to make clear at which layer you are looking at your network.
Number | Layer name | What is the smallest unit transmitted (PDU) | Function | Examples |
---|---|---|---|---|
1 | Physical | Bit, Symbol | Transmission of raw bit streams over a physical medium | Cat 6 cable |
2 | Data link | Frame | Reliable transmission of data frames between two nodes connected by a physical layer | Ethernet |
3 | Network | Packet | Structuring and managing a multi-node network, including addressing, routing and traffic control | IP, ARP, ICMP, IGMP |
4 | Transport | Segment, Datagram | Reliable transmission of data segments between points on a network, including segmentation, acknowledgement and multiplexing | TCP, UDP |
5 | Session | Data | Managing communication sessions, i.e., continuous exchange of information in the form of multiple back-and-forth transmissions between two nodes | NetBIOS, RPC |
6 | Presentation | Data | Translation of data between a networking service and an application; including character encoding, data compression and encryption/decryption | MIME, TLS |
7 | Application | Data | High-level APIs, including resource sharing, remote file access | HTTP, DNS, FTP |
(source OSI Model)
My Home network
My home network (like many others) consists of lots of "computers" (I will use the term computer here, but this may be anything from a classic computer in a big box to a smart watch). Those computers are connected in some way to a router provided by my Internet provider, which provides the connectivity to the Internet.
Ethernet
I still connect some computers at home to the router of my Internet provider via a network cable. This follows the Ethernet standard. So how does this work?
On the physical layer (OSI 1) it makes sense to look at Fast Ethernet (100 Mbit/s) or anything faster. It defines a star topology, where every computer is connected by its own network cable to a central device in your network, unlike a bus network, where all computers are connected to the same cable.
Cables are at most 100 meters long and consist of 8 wires, twisted together in pairs. See for example category 5 cables for more information.
So how does Ethernet work? Every Ethernet network card has a unique ID, the MAC address, assigned by the manufacturer of the network card. The data that you send (and later receive) is bundled into data packages made of zeros and ones. They have a clear format with mostly fixed-size fields (see Ethernet frame).
Field | Length | Description |
---|---|---|
Preamble | 7 octets | Pattern of 56 bits with alternating 1 and 0, helps you to understand how long a value will be present on the medium before the next one comes (so this is part of OSI 1) |
Start of frame delimiter | 1 octet | This will be 10101011 and indicates that the data relevant for the data link layer (OSI 2) will start next |
MAC destination | 6 octets | Destination ID |
MAC source | 6 octets | Source ID |
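802.1Q tag | 4 octets | Optional tag with the VLAN ID and the priority of the package |
EtherType or length | 2 octets | Either the protocol of the payload (EtherType) or how long the payload will be |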
Payload | 46-1500 octets | The payload |
Checksum | 4 octets | Way to check for transmission errors |
Interpacket gap | 12 octets | Do not send anything here to give others the chance to also send (this is again OSI 1) |
For OSI 1 it is important that both nodes on the Ethernet cable can start to send data whenever they want (at least whenever it seems to them that nobody else is currently sending). The maximum length of cables, the bits per second you are sending and the minimum package size have been carefully designed so that both sides can detect collisions while still sending their package. So imagine A starts to send data. B starts to send data as well, because A's data has not arrived yet and B wrongly assumes it is safe to start sending. Then it is important that B's data reaches A before A has finished sending and wrongly assumes its data was sent without problems. Today this is less important, as you have star topologies with point-to-point network cables and exclusive wires for sending and receiving.
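To get a feeling for the numbers, here is a rough back-of-the-envelope sketch in Python. The minimum frame of 64 octets follows from the table above (6 + 6 + 2 + 46 + 4). The signal speed of 2e8 m/s in the cable and the function names are my own assumptions, and delays in repeaters are ignored, so the distances are only an upper bound.

```python
# Rough arithmetic for the collision window on a shared Ethernet medium.
# Assumptions: 64-octet minimum frame, signal speed of about 2e8 m/s
# in the cable, no delay added by repeaters or network cards.

MIN_FRAME_BITS = 64 * 8          # minimum frame size in bits
SIGNAL_SPEED_M_PER_S = 2e8       # assumed propagation speed in the cable


def min_frame_duration_s(bit_rate: float) -> float:
    """Time a sender needs to put a minimum-size frame on the wire."""
    return MIN_FRAME_BITS / bit_rate


def max_one_way_distance_m(bit_rate: float) -> float:
    """Largest distance between A and B so that a collision can travel
    back to A while A is still sending its minimum-size frame.

    The signal has to go from A to B and the collision back from B to A,
    so only half of the distance covered during the frame is usable.
    """
    return min_frame_duration_s(bit_rate) * SIGNAL_SPEED_M_PER_S / 2


if __name__ == "__main__":
    for name, bit_rate in (("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6)):
        print(name, max_one_way_distance_m(bit_rate), "m")
```

The faster you send, the shorter the minimum frame is on the wire, so the allowed distance shrinks with the bit rate.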
For OSI 2 the important fields are the sender MAC address and the receiver MAC address, plus the payload. If you see a package on the cable that is not for your MAC address, you can safely ignore it. You should, however, also accept packages sent to the MAC address that consists only of ones (broadcast).
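As a minimal sketch of this filtering, the following Python code reads the two MAC addresses and the 2-octet type/length field from a frame. It assumes the preamble and start of frame delimiter are already stripped and that there is no 802.1Q tag. The function names and the example MAC addresses are made up for illustration.

```python
import struct

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"  # the address that consists only of ones


def format_mac(raw: bytes) -> str:
    """Turn 6 raw octets into the usual aa:bb:cc:dd:ee:ff notation."""
    return ":".join(f"{octet:02x}" for octet in raw)


def parse_ethernet_header(frame: bytes) -> dict:
    """Decode the first 14 octets of an untagged Ethernet frame."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    destination, source, type_or_length = struct.unpack("!6s6sH", frame[:14])
    return {
        "destination": format_mac(destination),
        "source": format_mac(source),
        "type_or_length": type_or_length,
        "payload": frame[14:],
    }


def is_for_me(frame: bytes, my_mac: str) -> bool:
    """Accept frames addressed to my MAC address or to the broadcast address."""
    destination = parse_ethernet_header(frame)["destination"]
    return destination in (my_mac.lower(), BROADCAST_MAC)
```

A real network card does this check in hardware; the sketch only shows which fields are involved.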
In the centre of your star topology network is a switch. It takes the packages from the cable and resends them. This resets the timings for collision detection. The switch also buffers packages (several senders might send to one receiver at the same time, so the cable to the receiver may not be fast enough to deliver the data as fast as it comes in). The switch also learns the topology, i.e. which MAC addresses can be reached via which cable connection. This gets trickier if you connect several switches with each other. Imagine an office building where all computers on one floor are connected to one switch, and all the switches are connected to each other. The router your Internet provider gave you usually also acts as a switch for a small home network.
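How a switch can learn which MAC address sits behind which port can be sketched in a few lines of Python. This is a toy model under my own assumptions (class and method names are invented, there is no buffering, no ageing of entries and no handling of loops between several switches), not how a real device is implemented.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy switch that remembers on which port it has seen each sender."""

    def __init__(self, port_count: int) -> None:
        self.ports = list(range(port_count))
        self.mac_table = {}  # MAC address -> port number

    def handle_frame(self, in_port: int, source: str, destination: str) -> list:
        """Return the list of ports the frame is sent out on."""
        # Learn: the sender is reachable via the port the frame came in on.
        self.mac_table[source] = in_port

        # Known unicast destination: forward to exactly one port.
        if destination != BROADCAST_MAC and destination in self.mac_table:
            out_port = self.mac_table[destination]
            return [] if out_port == in_port else [out_port]

        # Broadcast or unknown destination: send to all other ports.
        return [port for port in self.ports if port != in_port]
```

For example, the first frame from a new computer is flooded to all other ports, but as soon as the other side has answered once, both directions use a single port each.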
IP
The Internet Protocol. Its packages may be the payload of Ethernet packages. The older version, which is slowly being replaced, is IPv4. Its headers look like this
Field | Description |
---|---|
Version | The number 4 |
IHL | The IHL field contains the size of the IPv4 header |
DSCP | Type of service |
ECN | Congestion Notification |
Total Length | The entire packet size in bytes, including header and data |
Identification | Identifying the group of fragments of a single IP datagram |
Flags | Control or identify fragments |
Fragment Offset | Specifies the offset of a particular fragment relative to the beginning of the original unfragmented IP datagram |
Time To Live | Hop count, every router reduces the value by one to avoid endless loops |
Protocol | Protocol of the payload, e.g. 6 for TCP or 17 for UDP |
Header Checksum | Checksum for the header, the payload needs to have its own checksums |
Source IP Address | Sender |
Destination IP Address | Receiver |
Options | Not often used |
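To make the table more concrete, here is a minimal Python sketch that decodes the fixed 20 octets of an IPv4 header and verifies the header checksum. The function names are my own, options are not decoded, and the sketch is meant for illustration only.

```python
import ipaddress
import struct


def internet_checksum(data: bytes) -> int:
    """Ones' complement sum over 16-bit words, as used for the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold the carry bits back into the lower 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed part of an IPv4 header (options are skipped)."""
    if len(packet) < 20:
        raise ValueError("packet too short for an IPv4 header")
    (version_ihl, dscp_ecn, total_length, identification, flags_fragment,
     ttl, protocol, checksum, source, destination) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    header_length = (version_ihl & 0x0F) * 4  # IHL counts 32-bit words
    return {
        "version": version_ihl >> 4,
        "header_length": header_length,
        "dscp": dscp_ecn >> 2,
        "ecn": dscp_ecn & 0x03,
        "total_length": total_length,
        "identification": identification,
        "flags": flags_fragment >> 13,
        "fragment_offset": flags_fragment & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        # Summing a valid header including its checksum field gives 0.
        "checksum_ok": internet_checksum(packet[:header_length]) == 0,
        "source": str(ipaddress.IPv4Address(source)),
        "destination": str(ipaddress.IPv4Address(destination)),
    }
```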
The current version is IPv6. Its headers look like this
Field | Description |
---|---|
Version | The number 6 |
Traffic Class | Classify packets and Explicit Congestion Notification |
Flow Label | ID for a group of packages that belong together |
Payload Length | Size of the payload in octets |
Next Header | This field usually specifies the transport layer protocol used by a packet's payload |
Hop Limit | Hop count, every router reduces the value by one to avoid endless loops |
Source Address | Sender |
Destination Address | Receiver |
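The IPv6 header has a fixed size of 40 octets, which makes it easy to decode. The following Python sketch does this for the fields in the table; the function name is my own and extension headers are not handled.

```python
import ipaddress
import struct


def parse_ipv6_header(packet: bytes) -> dict:
    """Decode the fixed 40-octet IPv6 header."""
    if len(packet) < 40:
        raise ValueError("packet too short for an IPv6 header")
    first_word, payload_length, next_header, hop_limit, source, destination = (
        struct.unpack("!IHBB16s16s", packet[:40])
    )
    return {
        # The first 32 bits hold version (4), traffic class (8), flow label (20).
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": str(ipaddress.IPv6Address(source)),
        "destination": str(ipaddress.IPv6Address(destination)),
    }
```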
UDP
With the IP protocol we can already send single packages from one node to another. Often you want a computer to offer multiple services to multiple clients in parallel. This is where ports come into play. A computer may offer a web server at port 80 and a mail server at port 25. So you send an IP package to the server; its payload is a UDP package that specifies which port you want to connect to, and also to which port on your own computer the answer should be sent.
Field | Description |
---|---|
Source port | The port to reply to if needed |
Destination port | The port to connect to |
Length | This field specifies the length in bytes of the UDP header and UDP data. The minimum length is 8 bytes (header only); over IPv4 the data is limited to 65507 bytes. Using IPv6 jumbograms you may have bigger packages |
Checksum | The checksum field may be used for error-checking of the header and data. This field is optional in IPv4, and mandatory in IPv6 |
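The UDP header is small enough to build by hand. The following Python sketch packs and unpacks the four fields from the table. The function names are my own, and the checksum is simply set to 0 ("no checksum"), which is only allowed over IPv4.

```python
import struct

# source port, destination port, length, checksum (2 octets each)
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(source_port: int, destination_port: int, payload: bytes) -> bytes:
    """Prefix the payload with a UDP header (checksum left at 0)."""
    length = UDP_HEADER.size + len(payload)
    if length > 0xFFFF:
        raise ValueError("payload too large for the 16-bit length field")
    return UDP_HEADER.pack(source_port, destination_port, length, 0) + payload


def parse_udp_datagram(datagram: bytes) -> dict:
    """Split a UDP datagram into its header fields and payload."""
    if len(datagram) < UDP_HEADER.size:
        raise ValueError("datagram too short for a UDP header")
    source_port, destination_port, length, checksum = UDP_HEADER.unpack(
        datagram[:UDP_HEADER.size])
    return {
        "source_port": source_port,
        "destination_port": destination_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }
```

In practice the operating system builds this header for you when you send data through a UDP socket.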
When you send or receive data via the IP protocol you will usually have to split it into lots of packages. The receiver then needs to put those packages back together in the right order. As the IP protocol may send each package to you via a completely different route, the packages might arrive in a very different order than they were sent in. Also, some packages might get lost on their way to you or even be duplicated.
The UDP protocol does not help you with this, but for some use cases you do not care too much. Think for example about a multiplayer game where you constantly send the current position of all players to all players. If you miss a package in between, just take the next one. Getting packages out of order would not be very dramatic either, and you can send a timestamp or a counter with each package to ignore older packages if you want.
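The counter idea from the game example could look like this in Python. It is only a sketch: the class name and the shape of the position data are invented, and the network part is left out completely.

```python
class LatestPositionReceiver:
    """Keeps only the newest position and ignores late or duplicated updates."""

    def __init__(self) -> None:
        self.last_counter = -1
        self.position = None

    def receive(self, counter: int, position: tuple) -> bool:
        """Apply an update; return False if it is older than what we have."""
        if counter <= self.last_counter:
            return False  # late or duplicated package, just drop it
        self.last_counter = counter
        self.position = position
        return True
```

Lost packages need no special handling here: the next update with a higher counter simply replaces the current position.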
TCP
In many use cases, however, getting all packages in the right order is very important. Think about downloading a huge file. The Transmission Control Protocol (TCP) helps you with this. You establish a connection, send your data, and the protocol ensures that the receiver gets all the packages in the right order. If packages are missing, they are sent again. The protocol is rather complex, but it is used for most things on the Internet.
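The core idea of putting packages back into the right order can be sketched with a small buffer in Python. This is a heavily simplified toy, not TCP itself: the class name is invented, there is no connection setup, no timers and no flow control, and it assumes that segments never partially overlap.

```python
class ReorderingReceiver:
    """Toy receiver that delivers data in order, keyed by byte offset."""

    def __init__(self) -> None:
        self.next_offset = 0          # first byte we have not delivered yet
        self.pending = {}             # offset -> data that arrived too early
        self.delivered = bytearray()  # data handed to the application

    def receive(self, offset: int, data: bytes) -> int:
        """Accept one segment and return the next offset we expect.

        The return value plays the role of an acknowledgement: the sender
        knows that everything before this offset has arrived.
        """
        if offset >= self.next_offset:
            # New data (segments before next_offset are duplicates).
            self.pending.setdefault(offset, data)
        # Deliver everything that is now contiguous.
        while self.next_offset in self.pending:
            chunk = self.pending.pop(self.next_offset)
            self.delivered += chunk
            self.next_offset += len(chunk)
        return self.next_offset
```

If the returned offset does not advance for a while, a sender would conclude that a segment got lost and send it again.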
DNS
The IP protocol uses IP addresses for its communication, like we use phone numbers to call people. But humans prefer names over numbers. This is why we have a phone book for calling and DNS for the IP protocol.
So how does it work? When you want to send an IP package to www.tgunkel.de (as you did to read this website), your computer first needs to know the root DNS servers. Those are for example
j.root-servers.net.
c.root-servers.net.
...
You ask one of the root DNS servers who is responsible for all the "de" domains. Those are for example
f.nic.de.
z.nic.de.
...
You ask one of them who is responsible for "tgunkel.de"
...
You ask them what the IP address for "www.tgunkel.de" is, only to find out that it is a CNAME (an alias), so you have to do all of this again for the alias name. Luckily all the entries have a time to live, so you do not need to repeat every step each time and can use the cached information you learned in the earlier steps instead. Also, most people do not talk directly to the root servers but rather to a caching DNS server that is close to them (e.g. one operated by your provider, or 1.1.1.1 or 8.8.8.8).
Usually DNS requests are sent via UDP. If the answer would not fit into one package, TCP is used instead.
You can try it out like this
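One way to experiment from Python, as a minimal sketch: the first two functions build the bytes of a DNS question as it would be sent in a UDP package to port 53, and the last one simply asks the resolver of your operating system. The function names and the query ID are my own choices, and only the standard library is used.

```python
import socket
import struct


def encode_name(name: str) -> bytes:
    """Encode a host name as length-prefixed labels, ending with a zero octet."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(name: str, query_id: int, record_type: int = 1) -> bytes:
    """Build a DNS query for one name (record type 1 is an A record)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 answers,
    # 0 authority records, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", record_type, 1)  # class IN
    return header + question


def resolve_with_system_resolver(name: str) -> list:
    """Let the operating system do the lookup (needs network access)."""
    results = socket.getaddrinfo(name, 80, proto=socket.IPPROTO_TCP)
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling `resolve_with_system_resolver("www.tgunkel.de")` goes through the caching DNS server configured on your computer, so you will not see the individual steps from the root servers downwards.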
HTTP
The Hypertext Transfer Protocol is used to request websites. Its communication looks like this:
Request
Host: www.example.com
Response
Age: 497225
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sun, 20 Sep 2020 11:11:20 GMT
Etag: "3147526947+ident"
Expires: Sun, 27 Sep 2020 11:11:20 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (dcb/7EEF)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
<!doctype html>
<html>
<head>
<title>Example Domain</title>
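Since HTTP/1.1 is a text protocol, a request can be built and a response can be taken apart with plain string handling. The following Python sketch shows this. The function names are my own, and the request line (GET / HTTP/1.1) and the `Connection: close` header are typical values I assume for illustration; they are not part of the listing above.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",   # empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw: bytes) -> tuple:
    """Split a raw response into status line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return status_line, headers, body
```

To talk to a real server you would send the request bytes over a TCP connection to port 80 and read until the server closes the connection.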
Putting it together
There is a global pool of unique IP addresses. When you want to participate in the Internet you need to have at least one of those addresses, and the rest of the Internet needs to know how to reach you. Your Internet provider has a lot of those addresses. Your router gets one or more of them assigned.
In your local network every node also needs an IP address. It requests one from the router via DHCP or SLAAC. If the router only has one public IP address, it hands out addresses from a private network range and uses NAT to translate between them and its public address.
Every node does not only have an IP address but also a routing table that explains how to reach other IP addresses.
If the destination is in the local network, you use ARP to find its MAC address and send the package directly. Otherwise you send it to the gateway that is configured in the routing table for the network the destination IP address is in, again using ARP to find the MAC address of the gateway. The gateway will then do the same.
Example:
tgunkel.de has address 83.169.2.206
Destination | Gateway | Genmask | Flags | Metric | Ref | Use | Iface |
---|---|---|---|---|---|---|---|
0.0.0.0 | 192.168.0.1 | 0.0.0.0 | UG | 0 | 0 | 0 | eth0 |
So any package for www.tgunkel.de has to be sent to 192.168.0.1. This is the provider's router in my apartment. The router will do exactly the same, until eventually the package ends up in the network where the destination computer is actually located.
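The lookup in the routing table can be sketched in Python with the `ipaddress` module. The default route is taken from the table above; the entry for the local network 192.168.0.0/24 and the function name are my assumptions for illustration.

```python
import ipaddress

# (destination network, gateway or None if directly reachable, interface)
ROUTES = [
    (ipaddress.ip_network("192.168.0.0/24"), None, "eth0"),      # assumed local network
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.0.1", "eth0"),  # default route
]


def next_hop(destination: str, routes=ROUTES) -> str:
    """Return the IP address the package is handed to next.

    Among all matching routes the most specific one (longest prefix) wins.
    If it has no gateway, the destination is in the local network and is
    addressed directly.
    """
    address = ipaddress.ip_address(destination)
    matching = [route for route in routes if address in route[0]]
    if not matching:
        raise LookupError(f"no route to {destination}")
    _network, gateway, _interface = max(matching, key=lambda route: route[0].prefixlen)
    return destination if gateway is None else gateway
```

The address returned here is the one you would then look up via ARP to get the MAC address for the Ethernet package.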
This shows the hosts on the way
traceroute to tgunkel.de (83.169.2.206), 30 hops max, 60 byte packets
1 192.168.0.1 0.747 ms 0.776 ms 0.830 ms
2 * * *
3 84.116.190.37 14.655 ms 16.694 ms 21.488 ms
4 84.116.197.245 48.253 ms 48.250 ms 48.204 ms
5 84.116.134.217 21.221 ms 21.196 ms 21.132 ms
6 62.115.42.170 21.219 ms 20.755 ms 20.628 ms
7 62.115.144.9 22.621 ms 22.065 ms 16.730 ms
8 87.230.112.3 20.142 ms 20.174 ms 18.458 ms
9 * * *
10 83.169.2.206 28.251 ms 27.404 ms 23.944 ms
Some gateways on the way are shown as * * * because they do not send back the ICMP replies that traceroute relies on.
We use TCP to establish a connection to port 80 of the web server on tgunkel.de. Over this connection we use the HTTP protocol to request the website and all its extra content like images, fonts, CSS, ...
Internet Providers
Cable Internet
My internet is currently provided by Unitymedia / Vodafone via cable internet. The router is connected to the cable socket that provides radio, TV and internet. All apartments in one house are connected to one box. On the street you see grey boxes that contain the so-called Kabelverzweiger (cable distribution cabinets). Between this grey box on the street and the box in your apartment they use old coaxial cable; beyond the grey box on the street they use fibre cables. On a city level there is a hub that provides all the grey boxes in the streets with data. Above that are the so-called headends. Here the DOCSIS protocol ends and the IP protocol starts.
Communication between providers
If you want to communicate with a server of your internet provider or the computer of another customer of your internet provider, this can be routed inside your provider's network. This is called an access network. Usually you want to access computers outside of that network. This is where the more or less local network ends and you are in the Internet. The network between providers is called a backbone network (https://en.wikipedia.org/wiki/Backbone_network).
There are different types of providers. Tier 3 providers are rather small and only provide internet access to end users. They exchange data with other Tier 3 providers for free (peering, https://en.wikipedia.org/wiki/Peering) and sell data exchange to end users (transit, https://en.wikipedia.org/wiki/Internet_transit).
Tier 2 providers are active in multiple regions, selling transit to Tier 3 providers and end users. They peer with other Tier 2 providers.
Tier 1 providers are the biggest ones. They do not need to buy transit, but rather sell transit to Tier 2 providers and peer with other Tier 1 providers.
Transit and peering can happen at an Internet exchange point (https://en.wikipedia.org/wiki/Internet_exchange_point).
The routing between different providers is controlled with the Border Gateway Protocol (https://de.wikipedia.org/wiki/Border_Gateway_Protocol).
Cross Country Internet Cables
There are high-capacity fibre optic cable systems, both terrestrial and undersea. Here is a map of sea cables.
Latency
Even at the speed of light a signal needs around 130 ms to travel once around the earth, so after a few rounds you are above one second. Therefore long-distance internet communication has a very noticeable delay. This is why people try to get their servers close to their end users.
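The arithmetic behind this is short enough to check in Python. The circumference of the earth (about 40075 km) and the speed of light in vacuum are well-known constants; the factor of 2/3 for light in a glass fibre is a common rule of thumb that I assume here, and delays in routers are ignored.

```python
EARTH_CIRCUMFERENCE_KM = 40_075
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBRE_SPEED_FACTOR = 2 / 3  # assumed: light in fibre is roughly a third slower


def travel_time_ms(distance_km: float, speed_factor: float = 1.0) -> float:
    """Time in milliseconds a signal needs for the given distance."""
    return distance_km / (SPEED_OF_LIGHT_KM_PER_S * speed_factor) * 1000


if __name__ == "__main__":
    print("vacuum:", round(travel_time_ms(EARTH_CIRCUMFERENCE_KM), 1), "ms")
    print("fibre: ", round(travel_time_ms(EARTH_CIRCUMFERENCE_KM, FIBRE_SPEED_FACTOR), 1), "ms")
```

So a request to a server on the other side of the world and the answer back already cost a noticeable fraction of a second before any processing has happened.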