Sorting data is one of the most
frequently performed operations in computer science. Sorting a
sequence of words into alphabetical order, or a sequence of numbers into
increasing order, are just basic examples that illustrate how many purposes
sorting can serve.
We have seen several algorithms that
implement this process successfully, so let's discuss the differences
between them.
When we are sorting, the first question we
should ask ourselves is: what is the most efficient algorithm? In other words,
which algorithm will return the sorted data structure by executing
the fewest possible steps?
Every function can be bounded above by
another function, and Big-Oh notation names such an upper bound. It gives a bound on the
worst-case running time of a given function. For example, an algorithm that
performs 3n^2 + 10n steps is O(n^2).
First of all, we will talk about the
sorting algorithms that typically run the least efficiently: these are
bubble sort, selection sort and insertion sort, all bounded above by O(n^2). I will assume
that if you are reading this entry, you probably have a sense of how these
algorithms are implemented. So let's talk about why they are usually the
slowest implementations of a sorting operation (a sketch of all three follows the list below).
- Bubble sort
makes n-1 comparisons during the first pass over a list containing n elements,
and each subsequent pass makes one fewer comparison. This is why it's such an
inefficient algorithm (it makes close to n-1 comparisons on each of roughly n
passes): the elements are not placed in their final location when they are
swapped, except for the largest one in the unsorted part of the list, so the
passes keep taking place until the list is finally sorted. However, we should
point out one advantage: bubble sort can detect an already-sorted list very
cheaply, since it won't swap any item on the first pass, and at that point we
know the structure is sorted (which is O(n)!).
- Selection sort
is also a very inefficient algorithm. We need to traverse the unsorted part of
the list n-1 times, each time finding the largest (or smallest) item and putting
it in its final location. Again, we can see intuitively that this is O(n^2).
- Insertion sort also takes n-1 passes, but it
only needs to compare the element being evaluated against the sorted part of the
list (which has fewer than n elements). This is also O(n^2). But note that in
the best case (an already-sorted list), each item only needs one comparison, so we
get O(n)!
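To make the discussion concrete, here is a minimal sketch of all three quadratic sorts in Python. The function names and the example data are my own, chosen just for illustration:

```python
def bubble_sort(items):
    """Repeatedly swap adjacent out-of-order pairs; stop early if a pass swaps nothing."""
    items = list(items)  # work on a copy
    n = len(items)
    for pass_num in range(n - 1):
        swapped = False
        # after each pass the largest unsorted element is in its final spot,
        # so the inner loop makes one fewer comparison each time
        for i in range(n - 1 - pass_num):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # no swaps: already sorted, so this is the O(n) best case
            break
    return items


def selection_sort(items):
    """On each of the n-1 passes, find the smallest remaining item and place it."""
    items = list(items)
    n = len(items)
    for start in range(n - 1):
        smallest = start
        for i in range(start + 1, n):
            if items[i] < items[smallest]:
                smallest = i
        items[start], items[smallest] = items[smallest], items[start]
    return items


def insertion_sort(items):
    """Grow a sorted prefix; each new item is compared only against that prefix."""
    items = list(items)
    for i in range(1, len(items)):
        current = items[i]
        j = i
        # shift larger items in the sorted prefix one slot to the right
        while j > 0 and items[j - 1] > current:
            items[j] = items[j - 1]
            j -= 1
        items[j] = current  # only one comparison per item on sorted input: O(n)
    return items


if __name__ == "__main__":
    data = [54, 26, 93, 17, 77, 31]
    print(bubble_sort(data))     # [17, 26, 31, 54, 77, 93]
    print(selection_sort(data))  # same result
    print(insertion_sort(data))  # same result
```

Note how `bubble_sort` breaks out early when a pass makes no swaps, and how `insertion_sort` does only one comparison per item on already-sorted input; those are the two O(n) best cases mentioned above.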
Now we will talk about the algorithms using
a divide and conquer strategy. We say "divide and conquer" because we divide
the list being passed into smaller sublists in order to improve the performance
of the program. At the end, we "conquer", or bring these pieces back together
(picture 2 in the bibliography shows how the process works).
- The merge sort
algorithm recursively splits the given list in half. Then we repeatedly place the smallest
remaining item of each sorted sublist into the right location of a bigger list, and at the end we
get the sorted list. The "dividing" step of this algorithm contributes O(log n)
levels of recursion, because the sublists halve in size each time the recursive step is applied. The
"conquering" or "merging" step evaluates each item once per level, so it's O(n). Putting
these pieces together, we can see that merge sort is O(n log n),
which is considerably faster than the algorithms we talked about previously.
- Quick sort is usually faster than merge
sort. However, it's remarkably slower in its worst-case running time. If the
designated pivot value in each pass is always either the biggest or smallest
element in the list, then the function executes one whole pass just to place a
single element in its right location each time. That is O(n^2): we compare
roughly n-1 items n times. (Both algorithms are sketched below.)
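Here is a minimal sketch of both divide-and-conquer sorts in Python. Again the names are mine, and `quick_sort` deliberately uses the first element as its pivot so the worst case described above is easy to see:

```python
def merge_sort(items):
    """Split in half recursively (O(log n) levels), then merge each level in O(n)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # merge: repeatedly take the smaller front item of the two sorted halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # one of these two is already empty
    merged.extend(right[j:])
    return merged


def quick_sort(items):
    """Partition around a pivot; a consistently bad pivot degrades this to O(n^2)."""
    if len(items) <= 1:
        return list(items)
    pivot = items[0]  # first element as pivot: simple, but worst case on sorted input
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)


if __name__ == "__main__":
    data = [54, 26, 93, 17, 77, 31]
    print(merge_sort(data))  # [17, 26, 31, 54, 77, 93]
    print(quick_sort(data))  # same result
```

Calling `quick_sort` on an already-sorted list recurses once per element, so a long sorted input exhibits exactly the degenerate O(n^2) behaviour (and can even hit Python's recursion limit); real implementations pick pivots more carefully, e.g. at random.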
We should also mention
Timsort, Python's built-in sorting algorithm. It's typically O(n log n) and performs faster than any of
the algorithms previously stated (as we've seen in lab 10). This video will
clarify how this algorithm operates: https://www.youtube.com/watch?v=jVXsjswWo44
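In practice this means we rarely write the sorts above by hand; we just call Python's built-ins, which use Timsort under the hood:

```python
data = [54, 26, 93, 17, 77, 31]

print(sorted(data))      # returns a new sorted list: [17, 26, 31, 54, 77, 93]

data.sort(reverse=True)  # sorts the list in place, in decreasing order
print(data)              # [93, 77, 54, 31, 26, 17]

# the key parameter handles cases like case-insensitive alphabetical order
words = ["banana", "Apple", "cherry"]
print(sorted(words, key=str.lower))  # ['Apple', 'banana', 'cherry']
```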
Summarizing, we've seen which sorting
algorithms are typically the fastest and which ones are the slowest. But again,
we should always consider what exactly we want our
program to do and what input we are going to pass to it, since sometimes
an algorithm that is usually slow could be really efficient in a specific
context and vice versa.
Bibliography:
picture 1: http://www.wikihow.com/images/2/26/Sort-a-Deck-of-Cards-Step-1.jpg
picture 2: http://www.cise.ufl.edu/research/ParallelPatterns/PatternLanguage/AlgorithmStructure/Figures/d-and-c.gif
video: research task by Mark Lee.