Photo by Bartek Wojtas from Pexels

Get started in web development with me — Part 3: building our first web site

Gabriel Cruz
10 min read · Jan 15, 2019

Hey you, how are you doing?

I’ll be honest with you, I’m in a very worky mood today, so let’s cut the crap and get right to the web stuff.

Lost in the series? Here are Part 1, Part 2, Part 3, Part 4 and Part 5.

Building

We’ve learned a lot about web so far, but it’s about time we build some sh*t.

“So what are we going to build?”

Well, as we’ve been talking about servers and protocols, how about we build our very first web server?

First of all, relax

Don’t be afraid, we won’t mess around with magic stuff just yet. Let’s take the time to build our knowledge and get comfortable with the web environment. Then we can gradually increase the complexity of what we’re doing.

We shouldn’t rush the learning process (at least that’s the way I see it). I hate the anxiety of trying to learn everything at once. Forget about it, it’s not going to work.

Herschel Evans

You must remember our beloved Herschel Evans from the last post. I’m going to create a web page about him. Basic stuff: raw HTML and a couple of links between pages. That’s it.

You can put whatever content you’d like on your page. I chose Herschel Evans because I really know nothing about him and, as I’ve used him as an example earlier, I believe it’s only fair I pay him a little tribute.

H(TML)ello World

Photo by Tyler Lastovich from Pexels

We’ve already looked at a bit of HTML earlier, so I’m guessing you remember absolutely nothing about it. Great, me neither :)

First page

What I want to build is a landing page with a quick introduction to the purpose of the page as well as Herschel’s life. Here’s the text I’m using:

Who’s Herschel Evans?

“Herschel Evans was an American tenor saxophonist who worked in the Count Basie Orchestra. He also worked with Lionel Hampton and Buck Clayton. He is also known for starting his cousin Joe McQueen’s interest in the saxophone.” — Wikipedia

Why does this page exist?

This is an example page for my web development series. I started writing the series on Medium, and now I’m creating my first web page example. The reason why I chose Herschel Evans is because I needed something to talk about, so I went to Wikipedia and hit the ‘Random page’ button. I landed on the Herschel Evans’ page, and here we are!

And here’s the HTML:

If you’re wondering what that weird <!-- index.html --> thing is, it’s just an HTML comment. It isn’t really a tag (self-closing or otherwise), and the browser ignores it when rendering the page.
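To see how a comment differs from a tag, here is a small Python sketch that runs a page through the standard library's html.parser. The PAGE string is a hypothetical stand-in I made up for illustration (it is not the actual markup of the landing page), and the Outline class and outline function are my own names.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the landing page, made up for this sketch.
PAGE = """<!-- index.html -->
<html>
  <body>
    <h1>Who's Herschel Evans?</h1>
    <p>An example page for my web development series.</p>
  </body>
</html>
"""


class Outline(HTMLParser):
    """Record the comments and the opening tags found in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.tags = []

    def handle_comment(self, data):
        # Called for <!-- ... -->; `data` is the text between the markers.
        self.comments.append(data.strip())

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag such as <html> or <p>.
        self.tags.append(tag)


def outline(html_text):
    parser = Outline()
    parser.feed(html_text)
    parser.close()
    return parser.comments, parser.tags


if __name__ == "__main__":
    comments, tags = outline(PAGE)
    print(comments)  # ['index.html']
    print(tags)      # ['html', 'body', 'h1', 'p']
```

The comment shows up in its own list and never in the list of tags, which is why the browser does not display it.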

Me: That page is not good enough.

You: Why?

Me: Because it has no links to other pages!

Let’s put some links (using anchor tags) to some remote web pages.

Okay, now what if we wanted to pay tribute to Herschel’s cousin Joe McQueen too? Let’s make an HTML page for Joe as well.

Yeah I wasn’t very inspired to write that last one.

Okay great, we now have two HTML pages. Since they will be part of the same website, it’d be nice if they were linked. So far we’ve used URLs to reference pages, so let’s try the same thing with our local pages. Do you remember what URLs to local files look like?

file://host/path/to/file

Let’s change our files a little bit to include these paths:

<a href="file://localhost/home/gabriel/html/website_joe_v1.html">
Herschel's cousin!
</a>

And:

<a href="file://localhost/(...)/website_herschel_v2.html">
Joe's cousin!
</a>

Let’s check out all the code one more time, I’ve just appended the paths to the end of the files:

It works!

One more thing: I think we don’t need to use absolute path names for our files. That’d be great, because if we use relative paths we can move our files to a new directory and, as long as the relative paths stay the same, the links will still work. Check out the “What the hell are relative and absolute paths?” section in the Appendix.

So let’s try using relative paths:

<a href="file://website_herschel_v2.html"> Herschel's page! </a>

Crap, that doesn’t work. Let’s see…

An absolute url includes the parts before the “path” part; in other words, it includes the scheme (the http in http://foo/bar/baz) and the hostname (the foo in http://foo/bar/baz) (and optionally port and userinfo).

Relative urls start with a path.

Absolute urls are, well, absolute: the location of the resource can be resolved looking only at the url itself. A relative url is in a sense incomplete: to resolve it, you need the scheme and hostname, and these are typically taken from the current context. For example, in a web page at

http://myhost/mypath/myresource1.html

you could put a link like so

<a href="pages/page1">click me</a>
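To make the resolution step concrete, here is a minimal sketch using urljoin from Python's standard library. BASE is the page address from the example above; the resolve helper is just a name I picked for illustration.

```python
from urllib.parse import urljoin

# The page that contains the link (taken from the example above).
BASE = "http://myhost/mypath/myresource1.html"


def resolve(base, link):
    """Resolve `link` the way a browser would when it appears on page `base`."""
    return urljoin(base, link)


if __name__ == "__main__":
    # Relative url: scheme, hostname and directory come from the current page.
    print(resolve(BASE, "pages/page1"))         # http://myhost/mypath/pages/page1
    # A leading slash keeps the scheme and hostname but restarts the path.
    print(resolve(BASE, "/pages/page1"))        # http://myhost/pages/page1
    # Absolute url: nothing is borrowed from the current page.
    print(resolve(BASE, "http://foo/bar/baz"))  # http://foo/bar/baz
```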

I did some deeper digging, and it looks like a URL is a type of URI. I strongly suggest you read this Wikipedia page if you’re interested in URLs and this kind of stuff.

Let’s try <a href="website_herschel_v2.html"> Joe's cousin! </a>

It works! And the best part is we can move our html directory wherever we’d like and it still works!
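If you are curious why the file:// attempt failed while the bare file name worked, this Python sketch shows how the two are parsed. It uses urlsplit and urljoin from the standard library; host_and_path is a helper name I made up, and the /home/html location is only an example of a moved directory.

```python
from urllib.parse import urljoin, urlsplit

# Joe's page, where the link to Herschel's page lives.
JOE_PAGE = "file://localhost/home/gabriel/html/website_joe_v1.html"


def host_and_path(url):
    """Split a URL and return its host part and its path part."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


if __name__ == "__main__":
    # After "file://" comes the host, so the file name is read as a host name
    # and the path ends up empty.
    print(host_and_path("file://website_herschel_v2.html"))
    # ('website_herschel_v2.html', '')

    # A bare file name is a relative URL: it is resolved against the current page.
    print(urljoin(JOE_PAGE, "website_herschel_v2.html"))
    # file://localhost/home/gabriel/html/website_herschel_v2.html

    # Move the whole directory and the same relative link still resolves.
    moved = "file://localhost/home/html/website_joe_v1.html"
    print(urljoin(moved, "website_herschel_v2.html"))
    # file://localhost/home/html/website_herschel_v2.html
```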

Sweet! Now we have our HTML pages linked to each other and we can access them from our computer. But we don’t build web pages for ourselves, we build them for others.

HTTP Servers: letting others access our pages

Photo by Helena Lopes from Pexels

I’m sure you remember everything about HTTP and servers, so let’s… — wait what!? You don’t remember? Ok I’ll be honest with you, neither did I, lol. Let’s do some quick review:

A server is a machine that provides any kind of service to other machines (the clients). HTTP is the protocol we use to transfer content through the web.

When we’re creating a web site we need a web server in order to serve our HTML content (and maybe a bunch of other sh*t, but for now we just want to serve raw HTML). The server we need right now is one that receives an HTTP request for a specific HTML page, website_herschel_v2.html for instance, and sends this page through HTTP (in other words, it sends an HTTP response with the content).
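To make that request/response exchange concrete, here is a sketch that uses the ready-made http.server module from Python's standard library (so we are not writing a server ourselves, just borrowing one for a moment). It serves a temporary directory on a free local port, requests one page, and returns the status code and body. The names QuietHandler and fetch_page and the one-line page content are made up for this illustration; this is not the server we will use in this post.

```python
import functools
import http.client
import http.server
import tempfile
import threading
from pathlib import Path


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """The stock file-serving handler, minus the per-request log lines."""

    def log_message(self, format, *args):
        pass


def fetch_page(directory, filename):
    """Serve `directory` over HTTP on a free local port and request one file."""
    handler = functools.partial(QuietHandler, directory=str(directory))
    # Port 0 asks the operating system for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), handler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client side: send "GET /<filename>" and read the response.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/" + filename)
        response = connection.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "website_herschel_v2.html"
        page.write_text("<h1>Who's Herschel Evans?</h1>", encoding="utf-8")
        status, body = fetch_page(tmp, page.name)
        print(status)  # 200
        print(body)    # <h1>Who's Herschel Evans?</h1>
```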

“So let’s start building our HTTP server!”

Me: Whoa whoa. Hold your horses, we’re not building one of these right now.

You: Why not?

Me: Because we would need to know how to code in some language (C or Python for example) as well as network sockets and other network stuff. Also, as far as I know almost nobody codes their own web servers.

You: Wait, so how are we going to do this?

Me: A-ha! Great question, grasshopper. We’re going to use an existing, popular web server such as Apache or Nginx.

Basically what we’re going to do with Nginx is start it up and tell it where our HTML files are. I’ve chosen Nginx because it seems to be very popular and it’s simpler than Apache, but go ahead and use whichever you like.

Starting up our web server

Let’s download and install Nginx (check this link if you’re on Linux). Usually this is as easy as typing something like:

$ sudo apt-get install nginx

Great, we have our web server installed. Before we continue we need to understand a bit about network ports, so check out the Appendix for that.

Let’s try to run the placeholder that Nginx has (it has some presentation pages):

Go run $ sudo nginx in your terminal. Nginx will try to start the server on port 80, but if a web server is already running, port 80 is busy and the startup will fail:

nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
...nginx: [emerg] still could not bind()

If this happens you probably already have Nginx (or another web server) running, so let’s try to access it:

Open your browser and type the address of your machine on port 80 (localhost:80). A nice Nginx welcome page should appear. This is pretty standard, so Apache and other web servers should have similar placeholder pages.
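If you would rather check from a script whether something is already listening on port 80, here is a minimal sketch with Python's socket module. The port_in_use helper is a name I made up; it simply tries to open a TCP connection and reports whether that worked.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # True means a server (Nginx or something else) is already listening on port 80.
    print(port_in_use(80))
```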

Serving our pages

Okay, we now have HTML pages and a web server, we just need to tell the server to serve the pages we’ve built. To do this we need to change the configuration files to point to our pages. I strongly suggest you read the first three sections of Nginx’s Beginner’s Guide so that you don’t get lost in the next part.

Nginx’s configuration files are in /etc/nginx/. The main configuration file is /etc/nginx/nginx.conf; other files are in /etc/nginx/conf.d/. Let’s first make a copy of the config file so we have a backup in case something explodes:

$ sudo cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.orig

Sweet, now we can mess with nginx.conf. Thankfully Nginx has a great tutorial for exactly what we want to do. Here’s what I have after following the tutorial:

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
    # multi_accept on;
}

http {
    server {
        location / {
            root /home/gabriel/html;
        }
    }
}

Okay, let’s reload the configuration file:

$ sudo nginx -s reload

And now let’s access http://localhost/ using the browser…

I get a “403 Forbidden” error message, crap. What did we mess up? Let’s go back to the tutorial:

Add the following location block to the server block:

location / {root /data/www;}

This location block specifies the “/” prefix compared with the URI from the request. For matching requests, the URI will be added to the path specified in the root directive, that is, to /data/www, to form the path to the requested file on the local file system.

This means that if we try to access http://localhost/ it will try to go to /home/gabriel/html/ (because our root is /home/gabriel/html). Now what does 403 mean? According to Wikipedia:

HTTP 403 is a standard HTTP status code communicated to clients by an HTTP server to indicate that the server understood the request, but will not fulfill it.

So why did we get this? Well, we didn’t actually request any file right? I mean, we don’t have any default HTML file (index.html) and we tried to access a directory! Let’s specify a file:

http://localhost/website_herschel_v2.html

It works! But how do we make it access a default page? My guess is we rename some file to be index.html, let’s see:

$ mv /home/gabriel/html/website_herschel_v2.html /home/gabriel/html/index.html

Now access the website again:

http://localhost/

YES! It serves us Herschel’s page!
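Here is a rough Python sketch of the mapping we just worked out: the request URI is appended to the root directory, and a request for a directory falls back to index.html. It only mimics the behavior described above and is not how Nginx is actually implemented; resolve_request is a name I made up.

```python
import posixpath


def resolve_request(root, uri, index="index.html"):
    """Roughly mimic how a `root` directive maps a request URI to a file path."""
    # The request URI is appended to the root directory.
    path = root.rstrip("/") + uri
    # A URI ending in "/" names a directory, so the index file is tried there.
    if uri.endswith("/"):
        path = posixpath.join(path, index)
    return path


if __name__ == "__main__":
    root = "/home/gabriel/html"
    print(resolve_request(root, "/website_joe_v1.html"))
    # /home/gabriel/html/website_joe_v1.html
    print(resolve_request(root, "/"))
    # /home/gabriel/html/index.html
```

When that index file does not exist and directory listings are off, there is nothing to serve for the directory, which matches the 403 we got before renaming the file.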

Note: I had some trouble with the links on this one, so be sure to check your links when you change file names (any link that still points to website_herschel_v2.html now has to point to index.html).

Accessing the website from other machines

We can access our website from other devices on our local network. First we need to figure out our machine’s private IP address. Then we open a browser on the other device and type that IP instead of localhost, for example:

http://192.168.123.21/

Boom! Our website can be seen by other people!

Note: For now only the people on our local network can access our website. We’ll take a look at external access later.
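If you are not sure whether the address you found is really a private one, here is a small sketch using Python's ipaddress module. The is_private_ipv4 helper and the sample addresses are mine, picked only for illustration.

```python
import ipaddress


def is_private_ipv4(text):
    """Return True for a valid private IPv4 address, False otherwise."""
    try:
        address = ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        # Not a valid IPv4 address at all (for example an octet above 255).
        return False
    return address.is_private


if __name__ == "__main__":
    print(is_private_ipv4("192.168.1.10"))   # True: typical home-network address
    print(is_private_ipv4("8.8.8.8"))        # False: a public address
    print(is_private_ipv4("192.168.1.999"))  # False: 999 is not a valid octet
```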

Later

That’s it for today. I’m very tired. Hope you enjoyed it.

Thanks for reading :D

Appendix

“What the hell are relative and absolute paths?”

Let’s check Wikipedia:

An absolute or full path points to the same location in a file system, regardless of the current working directory. To do that, it must include the root directory.

So an absolute path looks like this:

/home/gabriel/html/

By contrast, a relative path starts from some given working directory, avoiding the need to provide the full absolute path. A filename can be considered as a relative path based at the current working directory.

So a relative path to /home/gabriel/html/website_herschel_v2.html from /home/gabriel/html/website_joe_v1.html would be just website_herschel_v2.html.

Suppose we had another file /home/gabriel/html/stupid_dir/stupid_file.txt. Then the relative path to it from /home/gabriel/html/website_joe_v1.html would be stupid_dir/stupid_file.txt.
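You can check these relative paths with a few lines of Python. This is a minimal sketch using posixpath.relpath from the standard library; relative_link is a helper name I made up, and it treats the directory that contains the page as the starting point.

```python
import posixpath


def relative_link(target, page):
    """Path to `target` relative to the directory that contains `page`."""
    return posixpath.relpath(target, start=posixpath.dirname(page))


if __name__ == "__main__":
    joe = "/home/gabriel/html/website_joe_v1.html"
    print(relative_link("/home/gabriel/html/website_herschel_v2.html", joe))
    # website_herschel_v2.html
    print(relative_link("/home/gabriel/html/stupid_dir/stupid_file.txt", joe))
    # stupid_dir/stupid_file.txt
```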

“Why the hell are we talking about this again?”

Let’s say we want to move our html directory out of gabriel. The new paths for our HTML files would be /home/html/website_joe_v1.html and /home/html/website_herschel_v2.html.

You see the problem with this?

We used absolute paths in the links we created to our HTML files, and now these absolute paths have changed! In order to fix this we need to manually change the URLs in each of the files. This is awful! Every time we want to change directories the links would stop working.

If we use relative paths this wouldn’t happen because relative paths don’t necessarily change when absolute paths do.

Network Ports

As always, let’s take a look at the Wikipedia page for network ports

In computer networking, a port is an endpoint of communication. Physical as well as wireless connections are terminated at ports of hardware devices. At the software level, within an operating system, a port is a logical construct that identifies a specific process or a type of network service.

The software port is always associated with an IP address of a host and the protocol type of the communication. It completes the destination or origination network address of a message. Ports are identified for each protocol and address combination by 16-bit unsigned numbers, commonly known as the port number.

So what is a network port anyway? Well, the way I like to think about it is this: imagine you live in a house with a bunch of other people and someone wants to mail you a letter. They would need to write both your house’s address and your name on it.

An IP address is something that identifies your computer on a network (your house’s address), and the port identifies the program the message is passed to (your name).

Me: Okay, so when you access a web server you have to specify a port as well. The port is, like everything else, specified in the URL.

You: But wait, I’ve never written the port in any URL.

Me: That’s because there’s a default port for web servers, port 80, which is used when none is specified (just like index.html is the default file). In URLs, the port comes right after the host, separated by a :. Go ahead and access Google at www.google.com:80.
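Here is a small Python sketch of that rule using urlsplit from the standard library. The effective_port helper and the DEFAULT_PORTS table are names I made up; 80 for http and 443 for https are the standard defaults.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the port a URL points at, falling back to the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        # The port was written explicitly after the host, separated by ":".
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


if __name__ == "__main__":
    print(effective_port("http://www.google.com:80/"))  # 80 (written out)
    print(effective_port("http://www.google.com/"))     # 80 (default for http)
    print(effective_port("http://localhost:8080/"))     # 8080
```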

Gabriel Cruz

Computer Science student at University of São Paulo. OSS/Linux enthusiast, trailing spaces serial killer, casual pentester