Skip to content

Instantly share code, notes, and snippets.

@staylorx
Created October 3, 2024 17:46
Show Gist options
  • Select an option

  • Save staylorx/f53ef2331fc6a61af068c96ca61ed9ce to your computer and use it in GitHub Desktop.

Select an option

Save staylorx/f53ef2331fc6a61af068c96ca61ed9ce to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Binary Search Algorithm\n",
"\n",
"The purpose of this notebook is to explain the binary search algorithm in \n",
"detail, showing how to prove correctness, termination and deriving the running time. These are meant to be a companion to the lecture that explains the main idea behind the algorithm.\n",
"\n",
"\n",
"Given a _sorted_ list (assume ascending order sorted) of $n$ elements, we wish to find out if a given element `elt` belongs to the list. \n",
"\n",
"Binary search algorithm repeatedly narrows down the possible location of the element by comparing the middle element in the search range to the one we are searching for. We are hoping to find at each step using the fact that the array is sorted.\n",
"\n",
"**TODO** Look at the lecture on binary search before you proceed further.\n",
"\n",
"\n",
"We wish to implement a function `binarySearchHelper(lst, elt, left, right)` wherein:\n",
" - `lst` is a non empty list with at least 2 or more elements.\n",
" - `elt` is the element whose index we are searching for.\n",
" - `left` and `right` represent the \"bounds\" in terms of indices of the list.\n",
" - Note that indices in python start at 0 and go until `len(lst)`. \n",
" - Let us use `n` to denote the length of the list.\n",
" - We will always ensure that 0 <= `left` <= `right` <= n-1. \n",
" \n",
"The expected output is a number `index` or the python value `None`.\n",
" - If a number `index` is returned, it is a valid index of the list between left and right AND `lst[index] == elt` must hold.\n",
" - Otherwise, `None` is returned if and only if the list `lst` does not contain `elt`.\n",
" \n",
"Here is the implementation of `binarySearchHelper`."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"def binarySearchHelper(lst, elt, left, right):\n",
" n = len(lst)\n",
" if (left > right):\n",
" return None # Search region is empty -- let us bail.\n",
" else: \n",
" # If elt exists in the list, it must be between left and right indices.\n",
" mid = (left + right)//2 # Note that // is integer division \n",
" if lst[mid] == elt: \n",
" return mid # BINGO -- we found it. Return its index and that we found it\n",
" elif lst[mid] < elt: \n",
" return binarySearchHelper(lst, elt, mid+1, right)\n",
" else: # lst[mid] > elt\n",
" return binarySearchHelper(lst, elt, left, mid-1)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"def binarySearch(lst, elt):\n",
" n = len(lst)\n",
" if (elt < lst[0] or elt > lst[n-1]):\n",
" return None\n",
" else: # Note: we will only get here if\n",
" # lst[0] <= elt <= lst[n-1]\n",
" return binarySearchHelper(lst, elt, 0, n-1)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Searching for 9 in list [0,2,3,4,6,9,12]\n",
"5\n",
"Searching for 8 in list [1, 3, 4, 6, 8, 9,10, 11, 12, 15]\n",
"4\n",
"Searching for 5 in list [1, 3, 4, 6, 8, 9,10, 11, 12, 15]\n",
"None\n",
"Searching for 0 in list [0,2]\n",
"0\n",
"Searching for 1 in list [0,2]\n",
"None\n",
"Searching for 2 in list [0,2]\n",
"1\n",
"Searching for 1 in list [1]\n",
"0\n",
"Searching for 2 in list [1]\n",
"None\n"
]
}
],
"source": [
"print(\"Searching for 9 in list [0,2,3,4,6,9,12]\")\n",
"print(binarySearch([0,2,3,4,6,9,12], 9))\n",
"\n",
"print(\"Searching for 8 in list [1, 3, 4, 6, 8, 9,10, 11, 12, 15]\")\n",
"print(binarySearch([1, 3, 4, 6, 8, 9,10, 11, 12, 15], 8))\n",
"\n",
"print(\"Searching for 5 in list [1, 3, 4, 6, 8, 9,10, 11, 12, 15]\")\n",
"print(binarySearch([1, 3, 4, 6, 8, 9,10, 11, 12, 15], 5))\n",
"\n",
"print(\"Searching for 0 in list [0,2]\")\n",
"print(binarySearch([0,2], 0))\n",
"\n",
"print(\"Searching for 1 in list [0,2]\")\n",
"print(binarySearch([0,2], 1))\n",
"\n",
"print(\"Searching for 2 in list [0,2]\")\n",
"print(binarySearch([0,2], 2))\n",
"\n",
"print(\"Searching for 1 in list [1]\")\n",
"print(binarySearch([1], 1))\n",
"\n",
"print(\"Searching for 2 in list [1]\")\n",
"print(binarySearch([1], 2))\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Implementing Using Loops"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"def binSearch(lst, elt):\n",
" n = len(lst)\n",
" if (elt < lst[0] or elt > lst[n-1]):\n",
" return None\n",
" else:\n",
" left = 0\n",
" right = n - 1\n",
" while (left <= right):\n",
" # Exact same logic as the recursion.\n",
" mid = (left + right)//2 # Note that in python3 and above // is integer division \n",
" if lst[mid] == elt: \n",
" return mid # BINGO -- we found it. Return its index and that we found it\n",
" elif lst[mid] < elt: \n",
" left = mid + 1\n",
" else: # lst[mid] > elt\n",
" right = mid - 1 \n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Searching for 9 in list [0,2,3,4,6,9,12]\n",
"5\n",
"Searching for 8 in list [1, 3, 4, 6, 8, 9,10, 11, 12, 15]\n",
"4\n",
"Searching for 5 in list [1, 3, 4, 6, 8, 9,10, 11, 12, 15]\n",
"None\n",
"Searching for 0 in list [0,2]\n",
"0\n",
"Searching for 1 in list [0,2]\n",
"None\n",
"Searching for 2 in list [0,2]\n",
"1\n"
]
}
],
"source": [
"print(\"Searching for 9 in list [0,2,3,4,6,9,12]\")\n",
"print(binSearch([0,2,3,4,6,9,12], 9))\n",
"\n",
"print(\"Searching for 8 in list [1, 3, 4, 6, 8, 9,10, 11, 12, 15]\")\n",
"print(binSearch([1, 3, 4, 6, 8, 9,10, 11, 12, 15], 8))\n",
"\n",
"print(\"Searching for 5 in list [1, 3, 4, 6, 8, 9,10, 11, 12, 15]\")\n",
"print(binSearch([1, 3, 4, 6, 8, 9,10, 11, 12, 15], 5))\n",
"\n",
"print(\"Searching for 0 in list [0,2]\")\n",
"print(binSearch([0,2], 0))\n",
"\n",
"print(\"Searching for 1 in list [0,2]\")\n",
"print(binSearch([0,2], 1))\n",
"\n",
"print(\"Searching for 2 in list [0,2]\")\n",
"print(binSearch([0,2], 2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Why does binary search work?\n",
"\n",
"We will provide proof that:\n",
" - If `elt` belongs to `lst` at index `j`, then binary search will return `j`\n",
" - If `elt` does not belong to the `lst`, then binary search will either return `None`.\n",
"\n",
"We will also provide proof that the search terminates.\n",
"\n",
"For convenience, we will reason about the recursive version.\n",
"\n",
"#### Claim 1: For any call to the function `binarySearchHelper(lst, elt, left, right)` if `elt` belongs to the list at index `j`, then `left <= j <= right`.\n",
" \n",
"**Proof:** The proof is by induction on calls to `binarySearchHelper` function.\n",
"\n",
"**Base Case:** The very first call to `binarySearchHelper` made from the function `binarySearch` satisfies these properties. Since at the very beginning, we have `left = 0` andd `right = n-1`. The statement is trivially true: if `elt` belongs to the list, then its index must be between `0` and `n-1`.\n",
"\n",
"**Induction:** If a given call to `binarySearchHelper` satisfies these facts then the subsequent call must also satisfy these properties.\n",
"\n",
"To prove this let us take a careful look at the body of the function in question.\n",
"```python\n",
"mid = (left + right)//2 # Note that // is integer division \n",
"if lst[mid] == elt: \n",
" return mid \n",
"elif lst[mid] < elt: \n",
" return binarySearchHelper(lst, elt, mid+1, right) ## CALL 1\n",
"else: # lst[mid] > elt\n",
" return binarySearchHelper(lst, elt, left, mid-1) ## CALL 2\n",
"```\n",
"\n",
"Note that there are two calls back to `binarySearchHelper` labeled `CALL 1` andd `CALL 2` in the body.\n",
" - For CALL 1, note that it can happen only if `lst[mid] < elt`. Since `lst` is sorted, if `elt` were to be found in the list, it can only be found in the range of indices `[mid+1, right]`. Therefore, the property continues to hold for CALL 1.\n",
" - For CALL 2, note that it can happen only if `lst[mid] > elt`. Therefore, if the `elt` were to be found in the list, it can only be found in the range of indices `[left, mid-1]`. Therefore, the property continues to hold for CALL 2."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Definition** the \"size\" of the search region for binary search as given by `(right - left + 1)`.\n",
"\n",
"**Claim 2:** For any call to the function `binarySearchHelper(lst, elt, left, right)`, either we (a) terminate by finding `elt`, (b) conclude that `elt` does not exit in the `lst` or (c) make a call back to `binarySearchHelper` with a strictly smaller search region.\n",
"\n",
"**Proof** The proof of this claim is straight forward from the code.\n",
"```python\n",
"mid = (left + right)//2 # Note that // is integer division \n",
"if lst[mid] == elt: \n",
" return mid \n",
"elif lst[mid] < elt: \n",
" return binarySearchHelper(lst, elt, mid+1, right) ## CALL 1\n",
"else: # lst[mid] > elt\n",
" return binarySearchHelper(lst, elt, left, mid-1) ## CALL 2\n",
"```\n",
"\n",
"\n",
"Note that for CALL 1, the new search region size is \n",
"`right - (mid+1) +1` which is the same as `right - mid`. Note,\n",
"`mid >= left` or in other words, `mid > left - 1`.\n",
"Therefore, the new search region size is `right - mid < right - (left -1) = right -left + 1`. Thus, the new call's search region is strictly smaller than the original call's search region.\n",
"\n",
"\n",
"Note that for CALL 2, the new search region size is `mid - left`. Once again since `mid <= right`, we note that `mid < right + 1`. Therefore, `mid - left < right + 1 - left`. Thus, the new call's search region is strictly smaller than the original call's search region.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Overall Correctness Argument\n",
"\n",
"- Whenever a call to `binarySearchHelper(lst, elt, left, right)` is made, we conclude that if `elt` were to be found in `lst` then it would belong to the range `[left, right]`. This is **claim 1**.\n",
"- Thus, if `left > right`, the range of indices is empty, and we can conclude that the `elt` cannot belong to the list.\n",
"- Next, the search region for any successive call is strictly smaller than the previous search region.\n",
"- Therefore, eventually, we have to terminate either by finding the `elt` or reaching the condition `left > right`. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Termination\n",
"\n",
"Claim 2 directly proves termination."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Worst Case Execution Time\n",
"\n",
"**Claim 3:** Let us consider a call to `binarySearchHelper(lst, elt, l, r)` and a subsequent call `binarySearchHelper(lst, elt, l1, r1)`. The new search region `r1 - l1 + 1` is at most half the size of the previous search region `r - l + 1`. Formally, \n",
" $$ r1 - l1 + 1 \\leq \\frac{(r - l + 1)}{2} $$\n",
"\n",
"\n",
"**Proof ** \n",
"\n",
"Consider the code for `binarySearchHelper(lst, elt, l, r)` (Note that we write `l`, `r` instead of `left` and `right`. Similarly, we write `m` instead of `mid`).\n",
"```python\n",
"m = (left + right)//2 # Note that // is integer division \n",
"if lst[m] == elt: \n",
" return mid \n",
"elif lst[m] < elt: \n",
" return binarySearchHelper(lst, elt, m+1, r) ## CALL 1\n",
"else: # lst[mid] > elt\n",
" return binarySearchHelper(lst, elt, l, m-1) ## CALL 2\n",
"```\n",
"\n",
"Any subsequent call is either CALL 1 or CALL 2.\n",
"\n",
" - For CALL 1, the new search region size is \n",
" $$ r- (m+1) +1 = r - m = r - \\left\\lfloor \\frac{(l + r)}{2} \\right\\rfloor \\leq r - \\frac{(l+r-1)}{2} \\leq \\frac{(r - l + 1)}{2} $$ \n",
" - For CALL 2, the new search region size is \n",
" $$ (m -1) - l + 1 = m - l = \\left\\lfloor \\frac{(l + r)}{2} \\right\\rfloor - l \\leq \\frac{(l+r)}{2} - l \\leq \\frac{(r - l)}{2} \\leq \\frac{(r-l + 1)}{2} $$.\n",
" \n",
"In both cases, we derive the relation that new search region is less than or equal to half the size of the old search region."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Complexity Analysis**\n",
"\n",
"Thus, the initial search region size is $n$, the size of the list. At each subsequent call, the search region shrinks by half of the previous search region.\n",
"\n",
"Therefore, if we made $k$ iterations of `binarySearchHelper`, the search region would be at most $ \\frac{n}{2^k}$. \n",
"\n",
"When the search region size is less than $1$, we have to stop since we would reach the condition `left < right`.\n",
"\n",
"In the worst case therefore, binary search can run for `k` steps as long as $ \\frac{n}{2^k} \\geq 1 $.\n",
"\n",
"On other words, $2^k \\leq n$, i.e, $k \\leq \\log_2(n)$.\n",
"\n",
"Each recursive call involves constant number of operations. Thus, we concluded that the running time is bounded from above by $O(\\log(n))$.\n",
"\n",
"A similar analysis shows that for every $n$, we can produce a list of size $n$ and a missing element such that the algorithm must take time proportional to $\\log_2(n)$ to run. This lets us conclude that the running time must be $\\Omega(\\log(n))$ in the worst case.\n",
"\n",
"Combining, we get that the running time is $\\Theta(\\log(n))$."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment