### Online Programming Contest, 1-2 January 2005

Solution to Problem 2: What is the Rank?

```You are given a list of integers a_1, ... a_N, and your aim is
to determine the rank of a_i among a_1,...,a_i.

Let us dispose off the direct algorithm. For each i, simply scan
from a_i down to a_1 and determine the rank of a_i by counting
the number of elements below (or above) a_i.  This takes time
proportional to i. Thus, summing up over all i, this algorithm
takes roughly N*N steps.  This will get you 30% of the marks.

On the face of it, this problems seems similar to the "End of
Corruption" problem, where we had a stream of numbers coming in
and we had to insert the numbers into a list. We were also
required to find the current maximum element in the list and
delete it.

Recall, from the discussion in that problem, that we could
maintain the list either in sorted order or in the order in which
the numbers arrive.  Maintaining the list in sorted order would
require us to spend time proportional to the length of the list
in order to insert an element into it. On the other hand, if we
choose to maintain the numbers in the order in which they arrive,
then, each insertion cost us only one instruction.

If the list is maintained in arbitrary order, then finding the
rank of an element will take take proportional to the length of
the list. The naive algorthim described above uses this idea and
cannot be improved.

On the other hand if the list is maintained in sorted order, then
we can find the rank of an element in time proportional to log(n)
where n is the number of elements in the list. Here is how:

Suppose a_1 > a_2 > a_3 > ... a_n and the next element is x.
(Recall from the statment of the problem the x is different from
a_1 ... a_n.)

If x > a_1 (or x < a_n) then its rank is 1 (is n+1).  Otherwise,
compare x with the "middle element" of the list.  The middle
element is a_m where m = n/2 (/ denotes integer division).

1) If x > a_m then it suffices to find the rank of x in the
list a_1 ... a_{m-1}.

2) If x < a_m and x > a_{m+1} then rank of x is m+1.

3) If x < a_{m+1} then it suffices to find the rank of x in
the list a_{m+1} ... a_n and add m to it.

Notice that after making the necessary comparisons required by 1,
2 and 3, we are left with a list whose length is < n/2.  In
finding the rank of x in this smaller list we will apply the same
ideas above, i.e. compare it with the middle element and then
restrict the search to a list whose length < (n/2)/2 = n/4 and so
on.  This repeated halving can go on for at most log(n)
steps. The details are as follows:

We now formulate a recursive algorithm motivated by the above
idea.

----------------------------------------------------------------

/*
The numbers are available in the array A.
The number of elements in the array is n.
The array A is in descending order
*/

/*
findrank takes 3 arguments, x,l and r
where 1 ≤ l ≤ r ≤ n and returns the rank
of x in the list A[l], A[l+1], ... A[r]
*/

findrank(x,l,r) {

if (x > A[l]) return(1)
if (x < A[r]) return(r-l+2)

mid = (l + r)/2

if (x > A[mid]) return(findrank(x,l,mid-1))  /* 1 above
else if (x > A[mid+1]) return(mid-l+1)       /* 2 above
else return(mid+findrank(x,mid+1,r))    /* 3 above

}

print(findrank(x,1,n))

-----------------------------------------------------------

As mentioned earlier if there are n elements in the list
(i.e. between l and r) when findrank is called, in any recursive
call there are at most n/2 elements. Thus, there are at most
log(n) (i.e. the smallest k such that 2^k is more than n)
recursive calls before the list is empty. Each call, carries out
a fixed number, say k instructions and thus the total number of
instructions executed is log(n)*k which is roughly log(n).

Exercises:

1) Write down a non-recursive version of this algorithm.

2) If we are only interested in the rank of x in the entire array
(and do not require that findrank(x,l,r) return the rank of x
among the elements A[l],..., A[r], then, we can simply the
above function considerably. How?

So what do we have so far:

1) Maintain the list in the order in which the numbers arrive.
Each insert takes only a contant number of steps but finding
the rank takes time proportional to the length of the
list. Thus, if there are N elements in all we would time
proportional to N*1 + N*N (the first term is contributed by
the inserts and the second one is the time taken in finding
the ranks). Thus the time taken is roughly N*N.

2) Maintain the array in sorted order. Each insert takes time
proportional to the length of the list.  Thus inserting N
element takes time proportional to N*N. Finding the rank of an
element takes time proportional to log(n) where n is the
current length of the list. So the overall time time taken to
find the rank of the N elements (as they are inserted) is not
more than N*log(N). N*log(N) + N*N is roughly N*N (since
log(N) is much smaller than N).

So, there is no real difference between the two. But, if we
learnt our lessons (from the "End of Corruption" problem) right,
we would like to balance the efficiency of the inserts in the
first algorithm with the efficiency of the rank finding in the
second algorithm. And this is how you do it:

We use Sqrt(N) lists, each containing at most Sqrt(N) numbers,
and each of these lists is maintained in sorted order.

To insert an element, we look for the first list with space (that
will take time proportional to Sqrt(N)) and then insert it into
the list at the appropriate place (to maintain the sorted order)
and this take time proportional to Sqrt(N). Thus each insert
takes time proportional to Sqrt(N).

To find the rank of an element, we find its rank in each of the
Sqrt(N) lists.  That takes time proportional to Sqrt(N) *
log(Sqrt(N)).  Suppose, the rank in the Sqrt(N) lists are
r_1,r_2,...,r_Sqrt(N). Then, the overall rank is (r_1 - 1) + (r_2
- 1) + ... + (r_Sqrt(N) - 1) + 1. (Since there are (r_i - 1)
elements in list i bigger than the given number.)  Thus the rank
can be found in time proportional to Sqrt(N)*log(Sqrt(N)).

So the overall time of inserting and finding the current rank of
N elements is proportional to N*(Sqrt(N) + log(Sqrt(N))). Note
that log(N) = 2 * log(Sqrt(N)). Thus, this is roughly
N*Sqrt(N)*log(N). Since log(N) is much smaller than Sqrt(N) (for
example if N is 40000, Sqrt(N) is 200 and log(N) is 16) this
algorithm performs significantly better than one that takes time
proportional to N*N. This algorithm would get you 100% marks.

Here is the full algorithm. We use a 2-dimensional array of size
Sqrt(N) X Sqrt(N). Each row stores a list. We also keep an array
Next[] of size Sqrt(N) to keep track of the first vacant position
in each of the Sqrt(N) lists.

---------------------------------------------------------------------

integer array  A[Sqrt(N)][Sqrt(N)]
integer array  Next[Sqrt(N)]

for i=1 to Sqrt(N) {
Next[i] = 1
}

for i=1 to N  {
y =  Rank(x)
Insert(x)
print(y)
}

Insert(x) {

Find the first i such that Next[i] ≤ Sqrt(N). /* Find a list with
space */

Find the first j such that A[i][j] ≤ x.  /* Have to insert x at
(If no such j is found return Next[i])              position j */

for (k=Next[i] down to j+1)           /* create space to insert x */
A[i][k]=A[i][k-1]

A[i][j] = x                        /* insert x */
Next[i] = Next[i] + 1

}

/*
A function that finds the rank of the element x among the
elements between positions l and r in row i of the Array A
(The same as the findrank above except that instead of working
over a one dimensional array A it works over the row i of
a two dimensional array A)
*/

findrowrank(i,x,l,r) {

if (x > A[i][l]) return(1)
if (x < A[i][r]) return(r-l+2)

mid = (l + r)/2

if (x > A[i][mid]) return(findrowrank(i,x,l,mid-1))
else if (x > A[i][mid+1]) return(mid-l+1)
else return(mid+findrowrank(i,x,mid+1,r))
}

Rank(x) {
rank = 0

for (i=1 to Sqrt(N))    { /* add up the number of elements below
x in all the rows */
rank = rank + findrowrank(i,x,1,Next[i]-1) - 1

}
/* rank has the number of elements below x in the entire
two dimensional array now. So the Rank of x is rank+1 */

return(rank+1)

}

--------------------------------------------------------------------

It turns out that one do a lot better than what is described here
using a very specialised data structure. One can do the N inserts
and find the ranks in overall time proportional to
N*log(N). Thus, for N = 40000, that algorithm would be about 200
(Sqrt(40000)) times faster!! We postpone a discussion on this
data structure for the moment.

We would like to add now that the algorithm for finding the rank
described above can be modified to find out whether x is a
already present in sorted array of size N in time proportional to
log(N).  This technique is called "binary search" and we describe
that algorithm below for the sake of completeness (you are
encouraged to work it out for yourselves and not read on!).

Binary Search:
--------------

Let A be a array with N elements sorted in ASCENDING order.
Given x, we would like to know if x appears in A.   (The array
could contain duplicates and of course x could appear in A.)

The naive algorithm would scan the array  and would take time
proportional to the length of the array.

The binary search algorithm works very much like the "findrank"
described above.

-----------------------------------------------------------------

/*
The algorithm bsearch takes 3 inputs, x, l and r.
It returns 0 if x does not appear among A[l] ... A[r].
Otherwise, it returns an index j, l ≤ j ≤ r such that
A[j] = x
*/

int bsearch(int x, int l, int r) {

int mid;
mid = (l+r)/2;

/* x is smaller than the smallest element in A */
if (x < A[l]) return(0);

/* x is bigger than the largest element in A */
if (x > A[r]) return(0);

/* x is smaller than A[mid]. So it cannot appear among
A[mid],A[mid+1],...A[r] */
if (x < A[mid]) return(bsearch(x,l,mid-1));

/* x is bigger than A[mid] so it cannot appear among
A,A,... A[mid] */
if (x > A[mid]) {
/* if x is also less than A[mid+1] then x does not appear
in A */
if (x < A[mid+1]) return(0);

/* Otherwise, search among A[m+1], ... A[r] */
return(bsearch(x,mid+1,r));
}

/* x equals A[mid] */

return(mid);

}

-----------------------------------------------------------------------

As in the case of findrank, noting that each recursive call
reduces the distance between l and r by half, guarantees that
there are at most Log(N) recursive calls to bsearch and since
each call takes only a constant number of operations the overall
algorithm takes roughly Log(n) steps.

Exercise: Can we use the binary search algorithm to solve the
Average problem (the other problem in the January problem set) in
N*N*log(N) steps (needless to add, without using the ideas