Get started in web development with me — Part 2: Protocols, Servers and HTTP

Gabriel Cruz
Jan 11, 2019

In my last post I discussed the approach I’ll use in this series, talked a bit about my background, and started discussing some core concepts of the web: HTML and URLs.

Lost in the series? Here are Part 1, Part 2, Part 3, Part 4 and Part 5.

So where were we?

We saw that a URL can basically be divided into three parts: protocol, hostname, and fileinfo. In case you forgot, here’s the anatomy of a common URL:

protocol://hostname/fileinfo
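If you'd like to see that split happen in code, here's a minimal sketch using Python's standard `urllib.parse` module. Python calls the protocol part the "scheme" and the fileinfo part the "path"; the variable names below are just mine, chosen to match the anatomy above.

```python
from urllib.parse import urlsplit

url = "http://en.wikipedia.org/wiki/Herschel_Evans"
parts = urlsplit(url)

protocol = parts.scheme    # what comes before "://"
hostname = parts.hostname  # the machine we want to talk to
fileinfo = parts.path      # the path to the file on that machine

print(protocol)  # http
print(hostname)  # en.wikipedia.org
print(fileinfo)  # /wiki/Herschel_Evans
```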

We already know that fileinfo is just the path to a file, ok. But what about the protocol and the hostname?

Protocols

What the heck is a protocol anyways? Let’s see:

In telecommunication, a communication protocol is a system of rules that allow two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics and synchronization of communication and possible error recovery methods.

Okay, so here’s another way of thinking about it: suppose you received a letter but you know neither where it came from nor the language in which it was written (let’s say you don’t recognize the alphabet it uses):

เรียนท่านผู้อ่าน

ฉันดีใจที่คุณมาถึงที่นี่ แสดงความคิดเห็นคำกล้วยถ้าคุณอ่านนี้

Even if you could use Google Translate you wouldn’t be able to read that letter unless you tried every language (letting Google suggest the language to you doesn’t count, you cheater). Now suppose that the whole world agreed on a new rule: every letter has to have a field “language” in it, in which the sender writes, in English, the name of the language they’re using:

language: Tamil

ஓ, கடைசி நாளில் எழுதப்பட்ட எந்த மொழியில் நான் உங்களுக்கு சொல்லமாட்டேன். ஆனால் நீங்கள் இங்கு வந்தால் நல்லது.

Wow! Now you can easily look it up on any translator for Tamil. The analogy here is that the language itself is a protocol, a set of rules, syntax and all that crap, and the field language in the letter is the protocol field on a URL.
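To make the analogy a bit more concrete, here's a tiny sketch of a program that follows our made-up "letter rule". The function name `read_letter` and the blank line between the field and the text are assumptions I'm making for the example; the point is only that an agreed-upon field tells the reader how to interpret the rest.

```python
def read_letter(letter):
    """Split a letter into (language, body) using its 'language:' field."""
    # A blank line separates the field from the actual text of the letter.
    header, _, body = letter.partition("\n\n")
    name, _, value = header.partition(":")
    if name.strip().lower() != "language":
        raise ValueError("this letter doesn't say which language it uses")
    return value.strip(), body


language, body = read_letter("language: Tamil\n\n(the letter's text goes here)")
print(language)  # Tamil
```

A letter that breaks the rule (no `language` field) gets rejected, which is roughly what happens when two machines don't agree on a protocol.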

So when your browser sees the URL http://en.wikipedia.org/wiki/Herschel_Evans, it knows the ‘letter’ has to be written in HTTP, and the server receiving it can read and understand the message (in this case, “give me the wiki page for Herschel Evans”).

Other examples of protocols are TCP and IP, used for sending and receiving stuff over the internet. They’re very different from HTTP, but HTTP often relies on both of them. This doesn’t really matter right now to us. Take a look at the OSI Model if you’re interested.

“But wait, what’s a server?”

I’m glad you asked. A server is, you guessed it, a machine that serves something (a service) to clients. Clients are the machines a server works for. The clients are responsible for reaching out to the server, requesting and even sending things to it. Clients can be regular PCs, smart coffee machines, and even other servers! The thing to realize here is that clients and servers are roles and any machine can be either of those (maybe even both!) depending on the case.

So a server is basically a machine that works for others. It can serve web pages (such servers are called web servers), domain names (more on that later), etc. Every server works the same way: it receives a request (“could you give me the Wikipedia page for Herschel Evans please?” or “what time is it?”), it does something, and then it sends a response (“yep, here is Herschel Evans’ page!”, “no! I’ve exploded”, “it’s 9:30am”, “go to hell”,…).
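Here's a rough sketch of that receive, do something, respond cycle, using only Python's standard library. Everything in it is made up for illustration (the `TimeHandler` class, the `ask` helper, the one-line text "protocol"), and both roles run on the same machine, which also shows that client and server really are just roles.

```python
import socket
import socketserver
import threading
from datetime import datetime


class TimeHandler(socketserver.StreamRequestHandler):
    """Hypothetical toy service: answers one question per connection."""

    def handle(self):
        request = self.rfile.readline().decode().strip()  # 1. receive a request
        if request == "what time is it?":                  # 2. do something
            response = "it's " + datetime.now().strftime("%H:%M")
        else:
            response = "sorry, I only know the time"
        self.wfile.write((response + "\n").encode())      # 3. send a response


def ask(address, question):
    """Play the client role: connect, send a question, return the answer."""
    with socket.create_connection(address) as conn:
        conn.sendall((question + "\n").encode())
        with conn.makefile() as reader:
            return reader.readline().strip()


# Port 0 lets the operating system pick any free port on this machine.
server = socketserver.TCPServer(("127.0.0.1", 0), TimeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

time_answer = ask(server.server_address, "what time is it?")
other_answer = ask(server.server_address, "how are you?")
print(time_answer)
print(other_answer)

server.shutdown()
server.server_close()
```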

“Sooo you were talking about HTTP…?”

Oh! Right.

HTTP — HyperText Transfer Protocol

Remember Hypertext? It’s the kind of text we use to write content on the web, usually written in HTML, the HyperText Markup Language and all that sh*t.

So now we know what a protocol is and what it’s used for, sweet. HTTP transfers our HTML documents back and forth over the internet. But how does it work?

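Here’s a quick tutorial on the HTTP protocol (don’t worry about the CGI part). This tutorial is a bit old (HTTP 1.0 and 1.1), but the basics it covers (methods, headers, status codes) still apply in HTTP/2; what changed in version 2 is mostly how messages are packed and sent over the wire, not what they mean.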

So if web servers receive HTTP requests and send responses, this means we can build our own HTTP request. Let’s see how we can do it. Here’s an example of an HTTP request for Herschel Evans’ Wikipedia page:

GET /wiki/Herschel_Evans HTTP/1.0
Host: wikipedia.org
<empty line>

This tells the server we want the file /wiki/Herschel_Evans from the website wikipedia.org and that we’re using HTTP version 1.0.
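If you want to build that request in code, here's a minimal sketch. The helper name `build_request` is mine, not part of any library. One detail worth knowing: on the wire, each line of an HTTP request ends with a carriage return plus a newline (`\r\n`), and the empty line is what tells the server the headers are over.

```python
def build_request(path, host, version="HTTP/1.0"):
    """Return the raw bytes of a minimal GET request."""
    lines = [
        f"GET {path} {version}",  # request line: method, path, protocol version
        f"Host: {host}",          # header: which website we want
        "",                       # the empty line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_request("/wiki/Herschel_Evans", "wikipedia.org")
print(request)
# b'GET /wiki/Herschel_Evans HTTP/1.0\r\nHost: wikipedia.org\r\n\r\n'
```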

Great, let’s use netcat to make this request. With netcat the request ends up being a little bit different from the above because we first have to establish a connection and then make the request:

$ nc wikipedia.org 80 #don't worry about the '80' there for now
...
$

What the f***? It just shut down the connection? I don’t have any clue why this is happening, lol. Let’s try getting the index page from Google:

$ nc google.com 80 # it works
GET index.html HTTP/1.0
<empty line>
HTTP/1.0 404 Not Found
Content-Type: text/html; charset=UTF-8
Referrer-Policy: no-referrer
Content-Length: 1561
Date: Thu, 10 Jan 2019 17:34:42 GMT
<!DOCTYPE html>
...
# Bunch of uninteresting stuff

Yes! We did it!

“But the response says ‘404 not found’ dude”

Doesn’t matter, we just wanted to make a request and receive a response, so it’s all good. My best guess is that these sites require more sophisticated parameters on the request than we handed them, but this is a problem for later. One thing that does stand out: the path in a request line is supposed to start with a slash (/index.html), and ours didn’t. For now, we kind of know what an HTTP request looks like. We know how to send a request and how to read an HTTP response.
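For the curious, here's a rough Python version of what we just did by hand with netcat: open a connection, send the request, read whatever comes back, and split it into status code, headers and body. To keep it runnable without depending on any real website, it talks to a tiny local server; `HelloHandler`, `http_get` and `parse_response` are all names I made up for this sketch, and the parsing is deliberately naive.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real website so the sketch runs offline."""

    def do_GET(self):
        body = b"<!DOCTYPE html><p>hello</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def http_get(host, port, path):
    """Do what the netcat session does: connect, send a request, read the reply."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closed the connection: response is complete
                break
            chunks.append(chunk)
    return b"".join(chunks)


def parse_response(raw):
    """Split a raw response into (status code, headers dict, body bytes)."""
    # An empty line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    code = int(status_line.split(" ", 2)[1])  # e.g. "HTTP/1.0 404 Not Found" -> 404
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers, body


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

status, headers, body = parse_response(http_get(host, port, "/index.html"))
print(status, headers["content-type"])
print(body.decode())

server.shutdown()
server.server_close()
```

Pointing `http_get` at a real site on port 80 should behave much like the netcat session above, though what comes back depends entirely on that site.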

Note: We won’t get into HTTPS or other protocols right now because I think we need to understand a bit more of the basics before going into that. Deal?

That’s it for today

I don’t want to dive too deep too fast into concepts like HTTP and HTML. We will surely need to know more about them later, but right now we just want to have a grasp of what things look like on the internet and which thing is used to achieve which goal.

Feel free to comment on any mistakes you see, I mean it. If you know a lot about grammar and realized I should’ve used a comma somewhere that I didn’t or if I said something that’s not entirely true about HTTP pleaaaaase let me know.

Go right on to the next part, you.


Gabriel Cruz

Computer Science student at University of São Paulo. OSS/Linux enthusiast, trailing spaces serial killer, casual pentester