# Quicksort, a Ruby implementation using Test-driven development

## Implementing Quicksort using TDD

Quicksort is a sorting algorithm designed to have a *reasonable time complexity*.

In average, most implementations of this algorithm are capable of coming to *O(n log(n))*. It is implemented by many programming languages and platforms, being one of the most important algorithms for sorting.

## Implementation

Its implementation relies on a divide-and-conquer algorithm, where it works by selecting a "pivot" element from the list, then partitioning the list into two sub-arrays: the *elements that are smaller than pivot* (smaller ones); and the *elements that are larger than pivot* (larger ones), which are then sorted recursively).

### Understanding the algorithm

Although it's especially important to understand this algorithm, its implementation is not that *hard* to realize.

Observe the following pseudo-code:

```
list = [3, 1, 2]
# it's arbitrary to choose the pivot, however the choice may
# impact on the final time complexity. In this guide, we'll
# choose the first element for didactic purposes
pivot = 3
remaining = [1, 2]
smaller, larger = partition(pivot, remaining)
-> repeat the process for the smaller list
-> repeat the process for the larger list
```

The above code only cares about partitioning the list into (un)sorted smaller and larger elements than *pivot*. If we repeat the process for each sub-list and **place the pivot in the middle**, we may end up having the following solution:

```
sort(smaller) + pivot + sort(larger)
```

Consider the above explanation just a Quickstart (pun intended) to understand the overall algorithm.

For many, it's not straightforward to understand recursion, which is why this guide will try to dissecate the internals of Quicksort using Test-driven development, a great technique and good practice that I use on my daily-basis, mainly when I'm trying to experiment and understand the *unknown*.

## Baby steps to the rescue

When doing TDD, we are encouraged to work in small cycles, or *baby-steps*, in order to fullfil the **red-green-refactor** iteration.

Such technique can help us to understand the Quicksort algorithm. Due to its nature of *divide-and-conquer*, it's a perfect fit for breaking down the steps through the TDD cycle.

First of all, we write a file named `quicksort_test.rb`

, which will have our bootstrap code:

```
require 'test/unit'
class QuicksortTest < Test::Unit::Case
def test_sorting_empty_list
end
end
```

This dummy code will start by sorting an empty array. Run `ruby quicksort_test.rb`

. The output should be as follows:

```
Finished in 0.0004595 seconds.
-------------------------------------------------------------------------------
1 tests, 0 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
-------------------------------------------------------------------------------
2176.28 tests/s, 0.00 assertions/s
```

For now, we will write the *sort* method in the same file for the sake of simplicity.

```
require 'test/unit'
def sort(list)
[]
end
class QuicksortTest < Test::Unit::TestCase
def test_sorting_empty_list
assert_equal [], sort([])
end
end
```

Run the test again and...YAY!

### Test dummy (but important!) scenarios

Yes, dummy scenarios are important. Don't underestimate them.

```
def test_sorting_one_number_only
assert_equal [42], sort([42])
end
```

```
def sort(list)
[42]
end
```

Pretty cool, uh? And dummy, I know...

Let's sort two numbers:

```
def test_sorting_two_numbers
assert_equal [8, 42], sort([42, 8])
end
```

```
def sort(list)
return [] if list.size == 0 # when the list is empty
return list if list.size == 1 # when the list has one number
[8, 42] # OMG!
end
```

Please, STAHP the *naiveness*!

Ok, let's start to write a more sophisticated test:

```
def test_sorting_multiple numbers
list = [5, 1, 4, 3, 2]
assert_equal [1, 2, 3, 4, 5], sort(list)
end
```

### The initial implementation

Suppose our input is an unsorted list:

```
list = [5, 1, 4, 3, 2]
```

We can start the implementation by choosing the pivot. Already mentioned that, in this guide, we will choose the first element.

But there are reasons where choosing the first element as the pivot is not the best performant scenario. This is for learning purposes only, as you can later adapt the algorithm to use another approach for the pivot.

```
pivot = list[0] # 5
```

Then, we should get the remaining list (all elements but the pivot):

```
remaining = list[1..-1] # [1, 4, 3, 2]
```

### Partition recursively

Next step, think about the *partition*, which takes

- the pivot
- the remaining list
- an array of smaller elements than pivot (starts empty)
- an array of larger elements than pivot (starts empty)
`partition(pivot, remaining, [], [])`

Why are we starting the smaller/larger arrays as empty? Because we will run this method *recursively*.

#### A bit of recursion

Ok, but what's recursion? In short:

```
def greet(name)
puts "Hello #{name}"
greet(name)
end
```

The above code will run forever, taking all of the memory's process, resulting in a stack overflow.

We MUST place a stop condition before the recursive call, otherwise we would never stop it:

```
def greet(name, times = 10)
puts "Hello #{name}"
return if times == 0
# decreasing the variable times on the next iteration
# next, will be 9
# then 8, 7, 6...successively until 0 meets our stop condition
greet(name, times -= 1)
end
```

Now, the following example is how we WANT to call the partition. Hold on, I know the method doesn't yet exist, but we are doing baby-steps. A bit of pseudo-code:

```
unsorted_list = [3, 1]
pivot = 3
remaining = [1]
smaller, larger = partition(pivot, remaining, [], [])
smaller #=> [1]
larger #=> []
```

Let's write the first test for the partition in which there's no remaining numbers. A scenario where all the elements were already partitioned (not yet sorted).

Why we are writing this scenario first? Because we know that, when there's no more elements to partition, we can **stop the recursion** and return the accumulated *smaller* and *larger* sub-lists.

```
def test_partition_no_remaining
pivot = 5
remaining = []
smaller = [3, 1]
larger = [8, 6, 7]
assert_equal [[3, 1], [8, 6, 7]], partition(pivot, remaining, smaller, larger)
end
```

```
def partition(pivot, remaining, smaller, larger)
return [smaller, larger] if remaining == 0
end
```

This first implementation means that, when the remaining list is empty, we can safely return the sub-arrays and stop the recursion.

Partition of two numbers:

```
def test_partition_two_numbers
pivot = 1
remaining = [5]
assert_equal [[], [5]], partition(pivot, remaining, [], [])
end
```

Here, the remaining list is **not** empty, so we should think on "what is a partition".

We can think of a partition being a method/function that takes a pivot, a list and accumulated smaller/larger arrays related to pivot.

How can we do this recursively?

- reducing the list
- accumulating the smaller/larger arrays

It's not an easy task to write a test that detach the logic of a recursive piece of code. But we can **isolate** the logic that is responsible to reduce and accumulate, and place it to another method.

So in the `partition`

method, we'd have a separate call to the `next_partition`

before the recursive call:

```
def partition(pivot, list, smaller = [], larger = [])
return [smaller, larger] if list.size == 0
# Here, the method `next_partition` will return the reduced
# list (tail|remaining), the "next" smaller (accumulated) and the "next" larger (accumulated)
tail, next_smaller, next_larger = next_partition(pivot, list, smaller, larger)
# Now, as we already have the stop condition in the first
# line, we can safely call this method recursively
partition(pivot, tail, next_smaller, next_larger)
end
```

Time to dissecate the `next_partition`

before running the test.

Let's then write a test scenario for the `next_partition`

method:

```
#=> next_partition(pivot, tail, smaller, larger)
list = [3, 5, 1, 4, 2]
pivot = 3
remaining = [5, 1, 4, 2]
smaller = []
larger = []
tail, next_smaller, next_larger = next_partition(pivot, remaining, smaller, larger)
assert_equal [1, 4, 2], tail
assert_equal [], next_smaller
assert_equal [5], next_larger
```

It may be confused at first but we can interpret this test as being:

- the unsorted list is [3, 5, 1, 4, 2]
- we chose the pivot as being 3, the first element
- excluding the pivot, the remaining numbers are [5, 1, 4, 2]
- smaller starts empty
- larger starts empty

So, for the next iteration, we have the following data:

- reduced list (tail of remaining) is [1, 4, 2]
- accumulated (next) smaller ones than pivot (3) is []
- accumulated (next) larger ones than pivot (3) is [5]

#### Golden tip

Let's think: the `next_iteration`

**should compare** the pivot with the first element of remaining list, and *decide if* the element should be added to the accumulated smaller or larger arrays!

Let's keep on our iterations, one-by-one (manual recursion):

```
next_tail = [1, 4, 2]
pivot = 3 # still our first pivot
next_smaller = []
next_larger = [5]
next_tail, next_smaller, next_larger = next_partition(pivot, next_tail, next_smaller, next_larger)
assert_equal [4, 2], next_tail
assert_equal [1], next_smaller
assert_equal [5], next_larger
## Next (final) iteration
next_tail = [4]
pivot = 3 # still
next_smaller = [1, 2]
next_larger = [5]
next_tail, next_smaller, next_larger = next_partition(pivot, next_tail, next_smaller, next_larger)
assert_equal [], next_tail
assert_equal [1, 2], next_smaller
assert_equal [4, 5], next_larger
```

Our recursion would stop here because there's no "next tail". All of the initial list was reduced.

In case we had unsorted smaller/larger arrays, we would repeat the process all over again for each sub-array.

Now, we could concatenate the results according to our initial sort logic:

```
# Partition result
* smaller: [1, 2]
* pivot: 3
* larger: [4, 5]
# Sorted = sort(smaller) + [pivot] + sort(larger)
[1, 2] + [3] + [4, 5]
```

YAY!

```
[1, 2, 3, 4, 5]
```

But we don't have the code yet. Looking at this pseudo-code examples, now it's easy to think on the `next_partition`

logic:

```
def next_partition(pivot, list, smaller, larger)
head = list[0]
tail = list[1..-1]
if head <= pivot
[tail, [head] + smaller, larger]
else
[tail, smaller, [head] + larger]
end
end
```

We can test more partition scenarios:

```
def test_partition_three_numbers
assert_equal [[], [6, 8]], partition(3, [8, 6])
end
```

And, finally, sorting multiple numbers

```
def test_sorting_multiple_numbers
assert_equal [1, 2, 3, 4, 5], sort([5, 3, 1, 2, 4])
assert_equal [1, 2, 3, 4, 5], sort([3, 5, 1, 4, 2])
assert_equal [1, 1, 3, 5, 8], sort([3, 1, 8, 1, 5])
end
```

### Important note about performance

The examples above do not apply the “in-place” rule of Quicksort.

Performance may be degraded in time complexity, but the main point here is to demonstrate the overall logic of Quicksort.

## Conclusion

The final code can be seen in this Gist.

In this guide we covered the fundamentals of the Quicksort algorithm by using test-driven development. We also did learn a bit about recursion and how to test recursion (manually or not) by isolating the methof/function responsible for reducing/accumulating for the next iteration.

I hope it could be helpful somehow. Feel free to drop a message or comment.

Follow me @leandronsp in dev.to, twitter and Linkedin.