Oyugo Obonyo
7 min read · Apr 25, 2021


A journey through the internet: what happens when you type https://www.holbertonschool.com in your browser?

A photo by Denys Rodionenko on Unsplash

The concept of time and space travel in sci-fi movies has always fascinated me; the thought of something or someone suddenly appearing from seemingly nowhere is mind-blowing, to say the least. The internet behaves in a similar way: one moment you are staring at a blank page in your browser and the next, you are smashing the follow button on your favorite blogger’s Twitter page. Typing a website’s address into the browser’s address bar does not initiate a time or space travel pipeline; instead, it sets in motion a beautiful sequence of events that ends with the requested resource being served.

Do I know you from somewhere, sir?

Imagine a case where your name is not ‘Jane’ but ‘F-345737’ instead. Cool, right? Wrong. Memorizing your own name might be simple, but memorizing everyone else’s would be rather difficult. Every device on the internet has a unique number, called an IP address, that it uses to identify itself and communicate with other devices. An example of such a number is 216.58.223.68 (one of www.google.com’s IP addresses). Typing this number into your browser’s address bar would serve you Google Search’s homepage. Imagine having to know the unique numbers of all the websites you visit. Terrible, right? Yes, really terrible. Most humans are not good at cramming a complicated set of numbers for every website known to them, and we’d honestly rather access websites by their names. Enter the Domain Name System (DNS).

DNS maps domain names to IP addresses, and this lookup is the first step towards being served your requested web page. First, your browser checks whether it has an address for www.holbertonschool.com in its own cache. If it does not, it checks the operating system’s cache. If the address is still not found, the request goes to a resolver server (often run by your internet service provider), which also checks its own cache. The resolver always knows where to find the root servers, so on a cache miss it asks one of them. The root server does not know the IP address itself, but it refers the resolver to the Top Level Domain (TLD) servers responsible for .com. The TLD server in turn refers the resolver to the authoritative name servers for holbertonschool.com. The authoritative name server finally returns the IP address; the resolver passes it back to the browser, which caches it and then requests the page’s content from that address. If the authoritative server has no record matching your domain name, the name you requested does not exist and you will be served an error instead.

An illustration of the DNS Resolution process
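The lookup order described above (browser cache, operating system cache, resolver cache, then root, TLD and authoritative servers) can be sketched as a toy model in Python. This is only an illustration under loud assumptions: every server name and record below is made up, the address 192.0.2.10 comes from a range reserved for documentation, and a real resolver exchanges DNS messages over the network rather than reading dictionaries.

```python
# Toy model of DNS resolution. All server names and records are hypothetical.

# Root servers know which TLD server handles each top-level domain.
ROOT_REFERRALS = {"com": "tld-server-for-com"}

# Each TLD server knows which authoritative server handles each domain.
TLD_REFERRALS = {
    "tld-server-for-com": {"holbertonschool.com": "ns-for-holbertonschool"},
}

# Authoritative servers hold the actual name-to-address records.
AUTHORITATIVE_RECORDS = {
    "ns-for-holbertonschool": {"www.holbertonschool.com": "192.0.2.10"},
}


def resolve_from_root(hostname):
    """Follow the referrals root -> TLD -> authoritative; None if no record."""
    labels = hostname.split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])

    tld_server = ROOT_REFERRALS.get(tld)            # root refers us onward
    if tld_server is None:
        return None
    auth_server = TLD_REFERRALS[tld_server].get(domain)   # TLD refers us onward
    if auth_server is None:
        return None
    return AUTHORITATIVE_RECORDS[auth_server].get(hostname)  # final answer


def lookup(hostname, browser_cache, os_cache, resolver_cache):
    """Check each cache in order; fall back to a full resolution on a miss."""
    for cache in (browser_cache, os_cache, resolver_cache):
        if hostname in cache:
            return cache[hostname]
    address = resolve_from_root(hostname)
    if address is not None:
        # Remember the answer so the next lookup is served from cache.
        for cache in (browser_cache, os_cache, resolver_cache):
            cache[hostname] = address
    return address
```

Calling `lookup("www.holbertonschool.com", {}, {}, {})` walks the whole chain the first time; calling it again with the same cache dictionaries returns straight from the browser cache.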

TCP/IP

Now that the browser knows where to find the requested resource, what exactly governs your ongoing interaction with the site? Transmission Control Protocol/Internet Protocol (TCP/IP) is a suite of communication protocols used to interconnect devices on the internet or on a private network. TCP/IP determines how the client (you, in this case) connects to and communicates with the server (the machine hosting www.holbertonschool.com).

TCP determines not only how applications create communication channels across a network, but also how the data to be transmitted is broken down into smaller packets and then reassembled in the right order once it reaches its destination. IP, on the other hand, defines how to address (sort of ‘label’) each individual packet to make sure it reaches its intended destination. Protocols in the TCP/IP suite that build on these two include:

a) Hyper Text Transfer Protocol (HTTP) handles communication between a web server and web browser.

b) HTTP Secure (HTTPS) handles secure communication between a web server and web browser.

c) File Transfer Protocol (FTP) handles file transmission between computers.
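To make the idea of breaking data into packets and reassembling it in order more concrete, here is a minimal Python sketch. It is a simplification, not how TCP is actually implemented: the function names and the packet size are assumptions for illustration, and real TCP numbers bytes rather than whole packets and also handles loss and retransmission.

```python
import random


def split_into_packets(data, size):
    """Cut data into chunks and tag each chunk with a sequence number."""
    return [
        (sequence, data[start:start + size])
        for sequence, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets):
    """Sort packets by sequence number and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))


message = b"a request travelling across the network in small pieces"
packets = split_into_packets(message, size=8)
random.shuffle(packets)           # simulate packets arriving out of order
restored = reassemble(packets)    # the receiver still recovers the message
```

However the packets are shuffled, `restored` equals `message`, because the sequence numbers tell the receiver where each piece belongs.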

Someone please call security!

The ‘https://’ prefix on our link denotes that our connection ought to be secure. Web browsers often use visual cues, most commonly a padlock icon, to indicate that an HTTPS connection has been established. HTTPS uses the Secure Sockets Layer (SSL) protocol, or nowadays its successor Transport Layer Security (TLS), to encrypt the information exchanged between the client and server so that outsiders cannot read it.

When you request an HTTPS connection to a website (as in our case), your browser first contacts the site, which responds with its SSL certificate. The certificate contains the public key needed to secure the connection, and based on it the ‘SSL handshake’ proceeds. The exact steps vary with the protocol version and the encryption options both sides support, but the basic structure is the same: the client and server exchange unique codes and use them to agree on shared encryption keys before a secure connection is finally established.
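In Python, the standard library’s `ssl` module can perform this handshake for us. The sketch below is an assumption-laden illustration rather than what a browser does internally: the function names are made up for this post, and running `describe_connection` needs a working internet connection.

```python
import socket
import ssl


def open_secure_connection(hostname, port=443):
    """Connect over TCP, then run the handshake on top of that connection."""
    # The default context verifies the certificate and the host name.
    context = ssl.create_default_context()
    raw_socket = socket.create_connection((hostname, port), timeout=10)
    # wrap_socket performs the handshake and raises ssl.SSLError if it fails.
    return context.wrap_socket(raw_socket, server_hostname=hostname)


def describe_connection(hostname):
    """Return the negotiated protocol version and the certificate's expiry date."""
    with open_secure_connection(hostname) as secure_socket:
        certificate = secure_socket.getpeercert()
        return secure_socket.version(), certificate["notAfter"]
```

Calling `describe_connection("www.holbertonschool.com")` would return a pair such as the protocol version string and an expiry date; the exact values depend on the server at the time you run it.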

The HTTPS protocol might not seem so relevant in our case, but it is especially important in situations where sensitive data such as bank details and personal information is regularly shared between the client and server. For even better security, organizations can set up a firewall to block disallowed traffic from getting into their network. The firewall works by explicitly defining, based on filters such as IP addresses and protocols, which traffic may enter its network and which may not.
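A firewall’s filtering by IP address and protocol can be pictured as a list of rules checked in order. The following Python sketch is a toy under stated assumptions: the rules, the address ranges (taken from blocks reserved for documentation) and the first-match-wins policy are all hypothetical choices, not the behaviour of any particular firewall product.

```python
import ipaddress

# Hypothetical rules: (source network, protocol, action). First match wins.
RULES = [
    (ipaddress.ip_network("203.0.113.0/24"), "tcp", "deny"),
    (ipaddress.ip_network("0.0.0.0/0"), "tcp", "allow"),
]


def decide(source_ip, protocol):
    """Return 'allow' or 'deny' for traffic from source_ip using protocol."""
    address = ipaddress.ip_address(source_ip)
    for network, rule_protocol, action in RULES:
        if protocol == rule_protocol and address in network:
            return action
    return "deny"   # anything not explicitly allowed is blocked
```

With these rules, `decide("203.0.113.7", "tcp")` is blocked because the address falls in the denied range, while TCP traffic from other addresses is let through.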

Load-balancer

Voila! We are securely on the site we requested. Assuming millions of us are trying to access the same website at the same time, how exactly does the site stay up instead of crashing under the sheer force of our numbers? Websites that expect a lot of traffic put a load balancer in place.

Like the name suggests, a load balancer’s chief role is to balance the network’s load, that is, to spread incoming traffic across the network. Ideally, organizations expecting lots of traffic on their site do not host their codebase on just a single server. Instead, their content is hosted on several servers, and whenever a user makes a request to the site, the request is routed to one of those servers. Huh! How?

The load balancer is in charge of orchestrating this process. It not only distributes traffic efficiently across multiple servers but also maintains reliability and high availability by sending client requests only to servers that are available. The load balancer uses different approaches (technically read as ‘algorithms’) to determine where to send a request. Such approaches include:

a) Round-robin: Requests are distributed sequentially; one request is passed to a server, the next request to the next server in line, and so on, wrapping back around to the first server.

b) Least connection: A new request is passed on to the server currently handling the fewest active connections.

c) IP hash: A new request is forwarded to a server chosen by hashing the client’s IP address, so requests from the same address consistently reach the same server.

d) Random: Client requests are randomly pushed to any server.

A simple network infrastructure with a load balancer
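The four approaches are small enough to sketch in a few lines of Python each. Treat this as an illustration only: the server names are hypothetical, the connection counts are passed in by hand, and the IP hash variant shown here hashes the client address so that the same client always lands on the same server. A production load balancer would also track server health.

```python
import hashlib
import itertools
import random

SERVERS = ["server-a", "server-b", "server-c"]   # hypothetical backends

_rotation = itertools.cycle(SERVERS)   # endless a, b, c, a, b, c, ...


def round_robin():
    """Hand out servers one after another, wrapping around at the end."""
    return next(_rotation)


def least_connection(active_connections):
    """Pick the server with the fewest open connections.

    active_connections maps each server name to its current connection count.
    """
    return min(SERVERS, key=lambda server: active_connections[server])


def ip_hash(client_ip):
    """Map a client address to a server; the same address gives the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]


def pick_random():
    """Pick any server with equal probability."""
    return random.choice(SERVERS)
```

For example, `least_connection({"server-a": 5, "server-b": 2, "server-c": 9})` picks `server-b`, the least busy of the three.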

Blah, blah, Server, blah…

The word ‘server’ has come up repeatedly in this piece, but what really is it? A server is a computer, either physical or virtual, that provides some sort of functionality to other computers (generally referred to as ‘clients’). Servers are typically housed in data centers. A site’s servers generally host everything the site needs, including its web server, application server and database. Servers within a server? What!

1. Web servers

Distinguishing between a web server and a server might be confusing, but it really doesn’t have to be. Traditionally, it was quite simple. The obvious difference was that servers were actual physical computers while web servers were (and still are) software that delivers web pages. The rise of the cloud era (containers and virtual machines specifically) has drastically changed the definition of servers, and they no longer have to be physical computers.

The clear functional difference between the two has not changed though. Servers are computers, with computational components such as memory, storage and an operating system, that primarily provide data or services to other computers within a network, while web servers are software that specifically serves web pages. Web servers run on the servers and have access to the site’s codebase. Web servers are reached through the domain names of sites. When we key in a URL in our browser, the browser requests the file over HTTP; when the request reaches the web server, it accepts the request, finds the content and sends it back to the browser over HTTP. The content you see as soon as our www.holbertonschool.com page loads is served by a web server. Famous web servers include Nginx and Apache.
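The accept, find and send-back cycle can be shown with Python’s built-in `http.server` module. This is a bare-bones sketch and nowhere near what Nginx or Apache do: the `PAGES` dictionary stands in for files on disk, and the handler and function names are made up for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical static content, keyed by URL path.
PAGES = {"/": b"<h1>Welcome</h1>"}


class StaticHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)            # find the requested content
        if body is None:
            self.send_error(404, "Not Found")  # nothing stored at this path
            return
        self.send_response(200)                # send it back over HTTP
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=0):
    """Bind to localhost only; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), StaticHandler)
```

Running `make_server(8000).serve_forever()` and visiting http://127.0.0.1:8000/ in a browser would display the welcome heading, while any other path would return a 404 error.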

2. Application servers

Web servers mostly serve static content. What about the dynamic parts of the site, like what happens behind the shiny ‘sign up’ button or the application forms? Application servers serve dynamic content and perform the logical operations behind the site’s reactions and responses to user interactions. They also communicate with the database and manage the user’s information. A perfect example of our application server at work is when we click on the ‘Apply Now’ button and an application form pops up. What happens to the information we enter in the application form though? Where does it all go? Database!
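One way to see the difference from static content is a tiny application written against WSGI, the standard Python interface between web servers and applications. This is a sketch under assumptions: the greeting text and the `name` query parameter are invented for illustration, and real application servers do far more (sessions, database access, and so on).

```python
from urllib.parse import parse_qs


def application(environ, start_response):
    """WSGI app: the response is computed from the request, not read from a file."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["stranger"])[0]   # hypothetical 'name' parameter
    body = f"Hello, {name}! Your application form is ready.".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The standard library can host it for local experiments with `wsgiref.simple_server.make_server("127.0.0.1", 8000, application).serve_forever()`; requesting `/?name=Jane` then produces a greeting for Jane, while a request without the parameter greets a stranger.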

3. Database

A database is an organized collection of data. Typically, a (relational) database stores data in tables made up of rows and columns and enables one to create, read, update or delete this data.
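The four operations can be tried out with SQLite, which ships with Python. The example below is a minimal sketch: the `applicants` table, its columns and the sample values are hypothetical, and the database lives in memory so nothing is written to disk.

```python
import sqlite3

connection = sqlite3.connect(":memory:")   # throwaway in-memory database
connection.execute(
    "CREATE TABLE applicants (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)

# Create: add a row.
connection.execute(
    "INSERT INTO applicants (name, email) VALUES (?, ?)",
    ("Jane", "jane@example.com"),
)

# Read: fetch the row back.
row = connection.execute(
    "SELECT name, email FROM applicants WHERE name = ?", ("Jane",)
).fetchone()

# Update: change a column in the existing row.
connection.execute(
    "UPDATE applicants SET email = ? WHERE name = ?",
    ("jane.doe@example.com", "Jane"),
)
updated = connection.execute(
    "SELECT email FROM applicants WHERE name = ?", ("Jane",)
).fetchone()

# Delete: remove the row.
connection.execute("DELETE FROM applicants WHERE name = ?", ("Jane",))
remaining = connection.execute("SELECT COUNT(*) FROM applicants").fetchone()[0]
```

The `?` placeholders let the database driver insert the values safely instead of pasting user input directly into the SQL text.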

Keying a URL into your browser’s address bar and pressing the Enter key sets in motion a chain of events that happens so fast it is barely noticeable, which is utterly amazing. Albeit short, I hope our journey through the internet was fun and informative enough.
