How the Internet works

Now that we know what the internet is, we can look under the hood to see how it works.

At its most basic, the Internet is an interconnected network of networks. In other words, millions of computing devices speak to each other and share information.

Now, a virtual Linux server running in an Amazon Web Services (AWS) data centre is quite different to the Ring video door camera that you may have installed in your front door, but they still speak to each other. How?

Protocols (TCP/IP, HTTP)

Protocols are standardized sets of rules for data that allow devices to communicate with each other. TCP/IP (Transmission Control Protocol / Internet Protocol) is the protocol suite that outlines how devices should communicate on the Internet. In the case of our example, that would be the server and the door camera.
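To make this a little more concrete, here is a minimal sketch in Python of two endpoints exchanging data over TCP/IP. It assumes both ends run on the same machine (the loopback address 127.0.0.1) rather than on a real server and door camera, and the function name `loopback_exchange` is a hypothetical one chosen for illustration.

```python
import socket


def loopback_exchange(message: bytes) -> bytes:
    """Send message from a client socket to a server socket over TCP and
    return what the server received."""
    # Server side: listen on the loopback address; port 0 lets the OS pick a free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        host, port = server.getsockname()

        # Client side: open a TCP connection to the server's IP address and port.
        with socket.create_connection((host, port)) as client:
            conn, _client_address = server.accept()
            with conn:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)  # tell the server no more data is coming

                # Read until the client's side of the connection is closed.
                received = b""
                while chunk := conn.recv(1024):
                    received += chunk
                return received
```

Both ends only agree on the rules (TCP over IP), not on hardware or operating system, which is why very different devices can still talk to each other.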

Another very important protocol is HTTP, or HyperText Transfer Protocol, which outlines the rules for the web, defining how to load web pages containing hypertext links.

IP Address

We have millions of devices connected, forming networks of networks, and they speak to each other using protocols like TCP/IP. But how can we find an individual device within this virtual haystack?

With an Internet Protocol address, or IP address. Every device on the internet has a unique IP address that looks something like 192.158.1.38.
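As a small illustration, Python's standard `ipaddress` module can check whether a piece of text is a well-formed IP address and report a few facts about it. The helper name `describe` is hypothetical; this is a sketch rather than a complete validator.

```python
import ipaddress


def describe(address: str) -> dict:
    """Parse an IP address and return a few basic facts about it."""
    ip = ipaddress.ip_address(address)  # raises ValueError if the text is malformed
    return {
        "version": ip.version,        # 4 for IPv4, 6 for IPv6
        "is_private": ip.is_private,  # True for addresses reserved for private networks
    }


info = describe("192.158.1.38")
print(info["version"])  # 4
```

Passing something that is not a valid address, such as "999.1.1.1", raises a `ValueError`.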

Web addresses

When we open our web browser and type in google.com, our computer makes a request for information to a server located somewhere in the world.

Each device on the public-facing internet (including servers) has a unique IP address, but it would be impossible to remember the IP addresses of all of the different servers we want to access as we browse the web.

Domain Name System (DNS)

Thankfully we have the Domain Name System, or DNS, which maps human-friendly names (domain names) to IP addresses. Rather than typing 93.184.216.34 into our browser, we can type example.com, and a DNS server will look up the matching IP address for us.
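The lookup that the browser performs can be sketched with Python's standard `socket` module. The function name `resolve` is hypothetical, and the result for a real domain depends on the network and on the domain's current DNS records, so no particular output is assumed here.

```python
import socket


def resolve(hostname: str) -> str:
    """Ask the system's resolver (which uses DNS) for one IPv4 address for hostname."""
    return socket.gethostbyname(hostname)


# With network access, resolve("example.com") returns that domain's current
# IPv4 address as a string. If we pass an IP address instead of a name,
# no lookup is needed and the same address comes straight back.
```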

URL

This brings us to the Uniform Resource Locator (URL), also known as a web address. A URL is simply the address for a resource on the web, like https://www.nytimes.com/.

Let’s break down the elements of a URL using an example from a fictitious blog.

https://example-company.com/post/my-great-post?utm_source=twitter&utm_medium=tweet

  • https – The “Scheme” tells the browser which protocol it should use to request the resource, in this case the secure version of HTTP as we are requesting a web page
  • example-company.com – The “Authority”, which is usually the domain name for the web server but could also be an IP address. It contains a top-level domain (TLD) like .com or .org and a second-level domain (SLD), usually the business or brand name (in this case example-company)
  • /post/my-great-post – the “Path”, or address for the specific file we are requesting on the web server
  • ?utm_source=twitter&utm_medium=tweet – “Parameters” are optional pieces of additional information we can send to the web server. In this case, we are telling the server where the link originated from (it was a tweet shared on Twitter) to help with marketing attribution (tracking where users are coming from when they visit a site)
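The same breakdown can be reproduced in code. Here is a short sketch using Python's standard `urllib.parse` module on the example URL above; the variable names are just for illustration.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://example-company.com/post/my-great-post?utm_source=twitter&utm_medium=tweet"
parts = urlsplit(url)

scheme = parts.scheme           # "https"
authority = parts.netloc        # "example-company.com"
path = parts.path               # "/post/my-great-post"
params = parse_qs(parts.query)  # {"utm_source": ["twitter"], "utm_medium": ["tweet"]}

print(scheme, authority, path, params)
```

Note that `parse_qs` returns a list for each parameter, because the same parameter name may appear more than once in a URL.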

If our URL only has a domain name, like google.com, then we are asking to be taken to the “entrance” or homepage of the server that is located at that IP address. If our URL also contains a path, like mywebsite.com/blog, then we are asking the web server for a specific resource, in this case, the web page that lists blog posts.

Many websites will also use subdomains (e.g. https://subdomain.example.com) to help separate and organise different parts of the site.

Request and response

What else happens when we open our browser, enter google.com, or click a link?

Our browser first asks a DNS server for the IP address of the web server. It then sends a request to that web server. If we are browsing the web, this will be an HTTP request.

An HTTP request can do a few things, including:

  • Ask the server to send us a web page. This is known as a GET request.
  • Send some data to the server to process, for example, the content of a Tweet we want to post. This is known as a POST request.
  • Update some existing data on the server, known as a PUT request.
  • Delete some data from the server, known as a DELETE request.

Back to our example.

After the web server has processed our request, it will respond. The response will include a response code that tells our browser whether the request has been successful – for example, a response of 200 means the request has been successful, while a response of 404 means that the server could not find the requested resource.
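Python's standard library ships a table of these response codes, which makes for a quick way to look one up. The function name `describe_status` and the grouping labels are assumptions made for this sketch.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short human-readable description of an HTTP response code."""
    status = HTTPStatus(code)  # raises ValueError for codes that are not defined
    if 200 <= code < 300:
        outcome = "success"
    elif 400 <= code < 500:
        outcome = "client error"
    elif 500 <= code < 600:
        outcome = "server error"
    else:
        outcome = "other"
    return f"{code} {status.phrase} ({outcome})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (client error)
```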

A successful server response will also have a message body containing the requested resource. This will usually be an HTML (HyperText Markup Language) file. The web browser will read the HTML file and render it as a web page for us to view.

The HTML file will define the content and structure of the web page, but it doesn’t generally provide the browser with any information about styling or interactivity. On its own, the resulting page would look very plain and boring, like the websites in the early days of the web.

To prevent this, most HTML files contain references to additional resources like CSS files, JavaScript, fonts, images and videos. As the browser reads the HTML file and comes across one of these references, it will request that resource and then use the contents of the response to render a complete, styled web page.
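A rough sketch of that discovery step, using Python's standard `html.parser` module: it reads a tiny, made-up HTML page and collects the extra resources a browser would go on to request. The class name `ResourceFinder`, the page and its file names are all hypothetical, and a real browser handles many more cases than this.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the URLs of resources referenced by an HTML document."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Stylesheets are referenced with <link href="...">.
        if tag == "link" and attributes.get("href"):
            self.resources.append(attributes["href"])
        # Scripts, images and videos are referenced with a src attribute.
        elif tag in ("script", "img", "video") and attributes.get("src"):
            self.resources.append(attributes["src"])


PAGE = """<html>
<head>
<link rel="stylesheet" href="/styles.css">
<script src="/app.js"></script>
</head>
<body><img src="/logo.png" alt="Logo"></body>
</html>"""

finder = ResourceFinder()
finder.feed(PAGE)
print(finder.resources)  # ['/styles.css', '/app.js', '/logo.png']
```

Each entry in the list would become its own HTTP request, with its own response code and message body.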
