Building a Web server in Bash, part III - Login

2022-08-01

In the previous posts we've seen how to manipulate sockets using netcat. Moreover, we learned how to start reading and parsing an HTTP message using ShellScript.

Time to definitely improve our app.

App requirements

Since our application is a Web application, we should respond in a more appropriate way. Let's define some requirements for our application:

GET /login should render an HTML form containing just a text field called "Name" to perform the login
GET / should render a home page containing the text Hello, {{name}}. Where does the {{name}} come from? We'll see soon, but we should read the HTTP header Cookie.
POST /login should request with a body containing the name coming from the HTML form. Afterwards, the server responds an HTTP 301 with a header Set-Cookie, so the client could use this cookie to perform future requests such as GET /
POST /logout should request with a header Cookie containing the name. Afterwards, the server responds an HTTP 301 with a header Set-Cookie using an expiration date in the past. This way, a client HTTP will permanently remove the Cookie.

Web fundamentals matter

If you need, read the requirements again.

Make no mistake, this is exactly how a simple web app used to work prior to the blast of frontend frameworks circa 2015.

Software abstractions are good and save us a lot of time, but unfortunately they also keep developers far from the basics. It's very important for us as developers, to understand how servers keep sessions across HTTP requests.

If you acknowledge that HTTP requests are stateless (the server usually closes the TCP connection at every request), a wide range of capabilities will open for debugging and troubleshooting.

Let's build a login system, and as soon as we understand that HTTP requests are stateless, we realize the logins do not exist.

After all, originally, the web mostly uses HTTP header Cookies to store user data in the HTTP client-side, sending those cookies across future requests to the same domain.

Separate HTTP responses in files

A good practice is to keep our HTTP responses in separate files, thus leading to a more readable and extensible code:

cat login.html

HTTP/1.1 200 OK
Content-Type: text/html

<form method="POST" action="/login">
  <input type="text" name="name" />
  <input type="submit" value="Login" />
</form>

cat 404.html

HTTP/1.1 404 NotFound
Content-Type: text/html

<h1>Sorry, not found</h1>

Okay, with HTML files in place, let's see how our function handleRequest looks like so far:

function handleRequest() {
  ## Read request, parse each line and breaks until empty line
  while read line; do
    echo $line
    trline=`echo $line | tr -d '[\r\n]'`

    [ -z "$trline" ] && break

    HEADLINE_REGEX='(.*?)\s(.*?)\sHTTP.*?'
    [[ "$trline" =~ $HEADLINE_REGEX ]] &&
      REQUEST=$(echo $trline | sed -E "s/$HEADLINE_REGEX/\1 \2/")
  done
   
  ## Route to the response handler based on the REQUEST match
  case "$REQUEST" in
    "GET /login") RESPONSE=$(cat login.html) ;;
               *) RESPONSE=$(cat 404.html) ;;
  esac

  echo -e "$RESPONSE" > response
}

Yes, much more shorter and readable.

Keep the server running forever

Not sure if you noticed, but the server is being closed at every client connection, which is making us to run the server over and over again, at every single time we want to perform a new HTTP request.

We can keep the server in an infinite loop. By doing this, we no longer need to run the server for every request:

while true; do
  cat response | nc -lN 3000 | processRequest
done

How cool is that?

Okay. Open localhost:3000/login in your web browser, type a name in the form and submit it.

I'm sure you got the response "Sorry, not found", right?

No problem, now go to the server STDOUT and check the message coming from the HTTP client:

POST /login HTTP/1.1
Host: localhost:3000
Content-Length: 12
....
....

What does it mean?

Following the HTTP standard, web browsers perform a POST on a form submit, like we have in form action=POST. Upon submission, every field is then added to the HTTP request body.

Please also note that the HTTP message contains a lot of HTTP headers (I'm using Google Chrome), but the HTTP body is nowhere to be found. Why's that?

We should keep reading the request body after the empty line, remember? However, we should first acknowledge the "length" of the body, otherwise, when should we stop reading the remaining bytes?

Thankfully, the HTTP standard covers that. The web browser has already sent an important header called Content-Length.

Our work is then basically parsing this header and use it to read the remaining HTTP request message bytes, the body.

Reading the HTTP body message

Similar to the headline parsing, inside the reading loop, let's place:

CONTENT_LENGTH_REGEX='Content-Length:\s(.*?)'

[[ "$trline" =~ $CONTENT_LENGTH_REGEX ]] &&
      CONTENT_LENGTH=`echo $trline | sed -E "s/$CONTENT_LENGTH_REGEX/\1/"`

Now, after the reading loop, we should read the remaining bytes in case there's a Content-Length header (not always clients send the Content-Length header).

## Check if Content-Length is not empty
if [ ! -z "$CONTENT_LENGTH" ]; then
  BODY_REGEX='(.*?)=(.*?)'

  ## Read the remaining request body
  while read -n$CONTENT_LENGTH -t1 body; do
    echo $body

    INPUT_NAME=$(echo $body | sed -E "s/$BODY_REGEX/\1/")
    INPUT_VALUE=$(echo $body | sed -E "s/$BODY_REGEX/\2/")
  done
fi

Nice. Go and perform a POST /login, you should see the body being displayed in the server STDOUT!

Explaining the read:

-n {N}: reads an arbitrary number of bytes from a file or STDIN
-t {T}: timeout to read the entire message from the STDIN
inside the body reading loop, we parse each line trying to find a match to field=value coming from the POST HTTP request

Fantastic!

Handle the message

Reading the body message is no way close to the end. We have to handle it in the switch-case, remember?

How about refactoring to something like this?

## Route to the response handlers (functions)
case "$REQUEST" in
  "GET /login")   handle_GET_login ;;
  "GET /")        handle_GET_home ;;
  "POST /login")  handle_POST_login ;;
  *)              handle_not_found ;;
esac

All we have to do is implementing the above functions. Let's start the handle_POST_login:

function handle_POST_login() {
  RESPONSE=$(cat post-login.http | \
    sed "s/{{cookie_name}}/$INPUT_NAME/" | \
    sed "s/{{cookie_value}}/$INPUT_VALUE/")
}

reads the post-login.http file
search and replace for a pattern using sed Now, let's check the post-login.http file:

HTTP/1.1 301
Location: http://localhost:3000/
Set-Cookie: {{cookie_name}}={{cookie_value}}; path=/; HttpOnly

Time to perform the test. Check yourself a dead-simple login written in ShellScript!

Finally, the complete implementation

finally

Handle request

First we should:

setup the FIFO
put netcat in a loop
handle requests

#!/bin/bash

## Create the response FIFO
rm -f response
mkfifo response

function handleRequest() {
  ## Read request
  while read line; do
    echo $line
    trline=$(echo $line | tr -d '[\r\n]')

    [ -z "$trline" ] && break

    HEADLINE_REGEX='(.*?)\s(.*?)\sHTTP.*?'
    [[ "$trline" =~ $HEADLINE_REGEX ]] &&
      REQUEST=$(echo $trline | sed -E "s/$HEADLINE_REGEX/\1 \2/")

    CONTENT_LENGTH_REGEX='Content-Length:\s(.*?)'
    [[ "$trline" =~ $CONTENT_LENGTH_REGEX ]] &&
      CONTENT_LENGTH=$(echo $trline | sed -E "s/$CONTENT_LENGTH_REGEX/\1/")

    COOKIE_REGEX='Cookie:\s(.*?)\=(.*?).*?'
    [[ "$trline" =~ $COOKIE_REGEX ]] &&
      read COOKIE_NAME COOKIE_VALUE <<< $(echo $trline | sed -E "s/$COOKIE_REGEX/\1 \2/")
  done

  ## Read body
  if [ ! -z "$CONTENT_LENGTH" ]; then
    BODY_REGEX='(.*?)=(.*?)'

    while read -n$CONTENT_LENGTH -t1 line; do
      echo $line
      trline=`echo $line | tr -d '[\r\n]'`

      [ -z "$trline" ] && break

      read INPUT_NAME INPUT_VALUE <<< $(echo $trline | sed -E "s/$BODY_REGEX/\1 \2/")
    done
  fi

  ## Route to the response handlers
  case "$REQUEST" in
    "GET /login")   handle_GET_login ;;
    "GET /")        handle_GET_home ;;
    "POST /login")  handle_POST_login ;;
    "POST /logout") handle_POST_logout ;;
    *)              handle_not_found ;;
  esac

  echo -e "$RESPONSE" > response
}

echo 'Listening on 3000...'

## Keep server running forever
while true; do
  ## 1. wait for FIFO
  ## 2. creates a socket and listens to the port 3000
  ## 3. as soon as a request message arrives to the socket, pipes it to the handleRequest function
  ## 4. the handleRequest function processes the request message and routes it to the response handler, which writes to the FIFO
  ## 5. as soon as the FIFO receives a message, it's sent to the socket
  ## 6. closes the connection (`-N`), closes the socket and repeat the loop
  cat response | nc -lN 3000 | handleRequest
done

In short, the server:

waits for FIFO
creates a socket and listens to the port 3000
as soon as a request message arrives to the socket, pipes it to the handleRequest function
the handleRequest function processes the request message and routes it to the response handler, which writes to the FIFO
as soon as the FIFO receives a message, it's sent to the socket
closes the connection (-N), closes the socket and repeat the loop

Implement the handle response functions

Look at the switch-case. It's pretty simple but the functions are yet to be implemented.

Place them before the handleRequest function, because ShellScript expects functions to be declared prior to calling.

function handle_GET_home() {
  RESPONSE=$(cat home.html | \
    sed "s/{{$COOKIE_NAME}}/$COOKIE_VALUE/")
}

function handle_GET_login() {
  RESPONSE=$(cat login.html)
}

function handle_POST_login() {
  RESPONSE=$(cat post-login.http | \
    sed "s/{{cookie_name}}/$INPUT_NAME/" | \
    sed "s/{{cookie_value}}/$INPUT_VALUE/")
}

function handle_POST_logout() {
  RESPONSE=$(cat post-logout.http | \
    sed "s/{{cookie_name}}/$COOKIE_NAME/" | \
    sed "s/{{cookie_value}}/$COOKIE_VALUE/")
}

function handle_not_found() {
  RESPONSE=$(cat 404.html)
}

Place the files accordingly

Now that we have the handle response functions, note that all of them expect HTTP/HTML files to exist. Let's create them. login.html

HTTP/1.1 200 OK
Content-Type: text/html

<form method="POST" action="/login">
  <input type="text" name="name" />
  <input type="submit" value="Login" />
</form>

home.html

HTTP/1.1 200 OK
Content-Type: text/html

<p>Hello, {{name}}</p>

<form method="POST" action="/logout">
  <input type="submit" value="Logout" />
</form>

404.html

HTTP/1.1 404 NotFound
Content-Type: text/html

<h1>Sorry, not found</h1>

post-login.http

HTTP/1.1 301
Location: http://localhost:3000/
Set-Cookie: {{cookie_name}}={{cookie_value}}; path=/; HttpOnly

post-logout.http

HTTP/1.1 301
Location: http://localhost:3000/login
Set-Cookie: {{cookie_name}}={{cookie_value}}; path=/; HttpOnly; Expires=Thu, 01 Jan 1970 00:00:00 GMT

So much YAY!

Wrapping up

Today we walked through refactoring our Web server.

We added more capabilities, making it a complete web server delivering a login, home page and logout features.

In the next steps we'll explore the triad of Web: HTML, CSS and JS.