Most efficient way to continuously find the median of a stream of numbers (in Python)?

I’m trying to solve a problem which reads as follows:

A queue of eager single-digits (your input) are waiting to enter an empty room.

I allow one digit (from the left) to enter the room each minute.

Each time a new digit enters the room I chalk up the median of all the digits currently in the room on the chalkboard. [The median is the middle digit when the digits are arranged in ascending order.]. If there are two median numbers (i.e. two middles) then rather than using the average, I chalk up the lower one of the two.

I chalk new digits to the right of existing digits so my chalkboard number keeps getting longer.

What number ends up on your chalkboard once all the digits are in the room?

Consider the example input: 21423814127333

  • 2 (the leftmost) is allowed into the room where it is the only digit so I write 2 on the chalkboard.
  • 1 is then allowed into the room to join 2. The smaller one of these two is used as the median so I chalk up 1 to the right of 2 on the chalkboard (my number is now 21)
  • 4 now enters the room. The median of 1, 2 and 4 is 2 so I add 2 to my chalkboard (my number is now 212)
  • …this process continues until the final 3 enters the room … all the numbers are in the room now which, when sorted, are 1,1,1,2,2,2,3,3,3,3,4,7,8,8. There are two median digits but they are both 3 so I add 3 to my chalkboard and my final number is 21222222222233

My current solution:

num = input()
new = str(num[0])
whole = [num[0]]

for i in range(1, len(num)):
    new += whole[i//2]


The problem is that it takes too long – so it passes 6/10 (hidden) test cases but exceeds the time limit for the other 4. Any help would be greatly appreciated.

>Solution :

You are repeatedly sorting,
with key comparison,
so total cost is O(N * N log N),
that is, it is at least quadratic.

single-digits (your input) are waiting to enter

The key to this problem is the range limit on input.
We know that each input x is in this range:

0 <= x < 10

Use counters.
We can easily allocate ten of them.

Keep a running count of total number of digits that have been
admitted to the room.
Each time you have to report a median, compute
of ordered counters, stopping when you get
to half the total count.

max_val = 10
c = {i: 0  for i in range(max_val)}
assert 0 <= input_val < max_val

c[input_val] += 1

cum_sum = 0
for i in range(max_val):
    cum_sum += c[i]

Since median is a robust statistic,
typically there will be some stability
in the median you report, e.g. "2, 1, 2, 2, 2, 2".
You can use that to speed the computation
still further, by incrementally computing the
cumulative sum.
It won’t change the big-Oh complexity, though,
since there’s a constant number of counters.
We’re still looking at O(N), since we have to
examine each of the N digits admitted to the room.

Leave a Reply