Indian Computing Olympiad

Training Material


Range queries

Segment tree

Another useful data structure for range queries with point updates is the segment tree. This can also be used for more generalized situations, such as when values are updated in an interval rather than at just one point.

Given an array A, we store the values in the leaves of a tree. At each level we combine two nodes at time and store their sum in the parent. This builds up a binary tree, as shown in the following example with an array of size 8.

                    ----27-----
                  /             \
                10               17
               /   \            /   \
             5       5       10       7
           /  \     /  \     / \     / \
A[i] :    2    3   3    2   6   4   5   2

  i  :    1    2   3    4   5   6   7   8
      

Now, suppose we update A[7] from 5 to 8. To update the tree, we start from the leaf A[7] and walk up a path of length log N to the root, updating all values on the way. In the figure below, the updated values are marked with a *.

                    ----30*----
                  /             \
                10               27*
               /   \            /   \
             5       5       10      10*
           /  \     /  \     / \     / \
A[i] :    2    3   3    2   6   4   8*  2

  i  :    1    2   3    4   5   6   7   8
      

Given this data structure, how do we find the prefix sum P[i]?

At each node, keep track of what segment it stores.

                         -----30----
                       /    [1,8]    \
                     /                 \
                   10                    27
                 [1,4]                  [5,8]
               /       \              /       \
             5           5          10          10
           [1,2]       [3,4]       [5,6]       [7,8]
           /   \       /   \       /   \       /  \
A[i] :    2     3     3     2     6     4     8     2
        [1,1] [2,2] [3,3] [4,4] [5,5] [6,6] [7,7] [8,8]

  i  :    1     2     3     4     5     6     7     8
      

Suppose we want the value P[6]. We start at the root and observe that [1-6] is contained in [1-8]. Going left we find [1-4] that is contained in [1-6] but not all of it. What is left is [5-6] for which we go right. Seeing [5-8], we go left and find [5-6] and stop.

What if we want P[7]? Again, we pick up [1-4] and go to [5-8]. We then pick up [5-6] and go right to pick up [7-7].

In general, when computing P[j], we pick up the value of the left child of j and, if any part of the segment [1-j] is still remaining, we go right. At each step we pick up the entire left child and go right. This can take at most log N steps.

How do we store this tree? We can number the elements in the tree level by level, starting from the root, as we do in a heap. If there are N leaves, there are 2N-1 elements in the array, with the leaves in the last N positions. Given a position i, we can compute the positions parent(i), leftchild(i) and rightchild(i) as in a heap.

To set up the array, do something similar to constructing a heap. The array elements are stored in the last N elements. Working backwards, update A[i] as A[2i]+A[2i+1] (sum of its children).

As we have seen before, given the prefix sums, we can compute sum(i,j) for any segment A[i]..A[j] by the expression P[j]-P[i].

Suppose we want, instead, to compute the minimum value for arbitrary segments.

Again we build the tree by keeping the minimum of each interval.

                    -----2-----
                  /             \
                 2                2
               /   \            /   \
             2       2        4       2
           /  \     /  \     / \     / \
A[i] :    2    3   3    2   6   4   5   2
  i  :    1    2   3    4   5   6   7   8
      

If we update a value, we propagate the update up to the root as before. For example, suppose we reset A[5] to 1.

                    -----1*----
                  /             \
                 2                1*
               /   \            /   \
             2       2        1*      2
           /  \     /  \     / \     / \
A[i] :    2    3   3    2   1*  4   5   2
  i  :    1    2   3    4   5   6   7   8

How do we find min(i,j) for an arbitray interval A[i]...A[j]?

We start at the root and split (i,j) to the part that goes left and right.

For instance, to compute min(2,5), we split the interval [2..5] as [2..4] and [5..5] and send the first search left and the second one right. On the left, we further split [2..4] as [2..2] and [3..4] etc.

Why does this work in log n time?

We have an overall interval [1..n], say [1..8] and an interval [i..j] to decompose. If [i..j] is entirely contained in [1..n/2] or [n/2+1..n], we don't have to break it up at the root.

The interesting case is when [i..j] spans the midpoint. Then, the interval splits as [i..n/2] and [n/2+1..j]. Notice that now we have an interval in each subtree such that one endpoint is the end of the overall interval in that subtree (the interval we are search for "sticks" to the right end or the left end of the overall interval).

Now, if we have a node with an interval [k..l] and we are looking for an interval [k..m] sticking to its left, we either do not branch at all (if m < midpoint) or we split, but the left path stops at the next level.

For instance, initially [2,6] splits at [2,4] and [5,6]. 4 is the right end point of [2,4] and 5 is the left endpoint of [5,6]. When [2,4] next splits, it becomes [2,2] and [3,4]. Here, [3,4] is a complete interval and need not be further decomposed: just pick up the value at the internal node [3,4].

Hence, computing min(i,j) takes only O(log N) time.