Recently, one interesting problem in CF has caught my attention. Main idea is that given {-1, 1}-array it is required to compute variety of all possible sums on subarrays in linear time. Is that possible? And how Kadane’s algorithm is involved here? In this article we’ll try to find out.
Problem formulation
Actual formulation is a bit more complicated.
Given array a consisting of 1 and -1. Then index k and integer number M are chosen and in array we modify a[k]=M. It’s required to find all distinct subarray sums in linear time.
How much distinct sum can we get
Let’s start from the first question: why it should be O(n) possible sums among n(n+1)/2 subarrays?
It may be hard to say why first, let’s make one step back and simplify the question. Let’s not modify anything, how much sums we can get in {-1, 1}-array without any other values in?
Such question seems already solvable. We know, that in {-1, 1}-array x there’s minimal subarray sum greater or equal #(number of -1) and maximal is less or equal #(number of 1). So, there couldn’t be more then x.length + 1 (including 0) distinct sums in x.
How to compute all possible sums
Notice also that because there’s only -1 or 1 in this array, subarray sums change very slightly (-1 or +1 exactly) if we move border index on one position. Since there exists step by step transformation between subarrays with minimal sum s and maximal sum t, there’s a path between s and t with +1 and -1 on every “edge”. On this path we can’t miss any intermediate number. Now, it is clear that list all of possible sums should include minimal and maximal sums and any value in between for any {-1, 1}-array.
The problem is we can’t compute minimal and maximal sums in O(n), yet.
Kadane’s algorithm
… You can find this version anywhere (e.g. in wikipedia):
def max_subarray_kadane(numbers):
best_sum = float('-inf')
current_sum = 0
for x in numbers:
current_sum = max(x, current_sum + x)
best_sum = max(best_sum, current_sum)
return best_sum
Variable current_sum contains best sum ending on current position. Obviously, the answer is maximum taken by all possible endings. Let’s design the general version of the algorithm, both for minimum and maximum (also we want to return position of best subarray):
def opt_subarray_kadane(numbers, mode='max'):
current_sum, sgn = 0, {'max': 1, 'min': -1}
l, r, best_l, best_r = -1, -1, -1, -1
best_sum = sgn[mode] * float('-inf')
for i in range(0, len(numbers)):
x, r = numbers[i], i
if sgn[mode] * (x - (current_sum + x)) >= 0:
current_sum, l = x, i
else:
current_sum = current_sum + x
if sgn[mode] * (best_sum - current_sum) < 0:
best_sum = current_sum
best_l, best_r = l, r
return best_sum, best_l, best_r
Decomposition
Now, when we can solve simplified version of the problem we are ready to solve original version.
Last thing to note: there’s actually two cases for every subarray: it includes index k or not.
In the second case, we are ready to write down the answer: Index k divides array on two parts, for each of them we can run Kadane’s algorithm and get full list of sums as it was described in previous chapter.
How to get possible sums, in case when subarray includes k? It seems that there’s still O(n^2) of such subarrays, however it can be proved that there’s O(n) distinct values among them. Indeed, let’s remove element a[k] from each subarray a[i:j], i <= k < j. This way, now we have a set of subarrays of {-1, 1}-array (a without k-th element). There are O(n) distinct sums for this case as we proved before.
Now we need to learn how to maximize subarrays with one index fixed inside.
Idea how to fix index
Every subarray of a which contains a[k] consists of three parts: left part a[i:k], element a[k] itself and right part a[k+1:j].
Suppose we want to find maximal sum, then we can do the following. Compute prefix sums for a for O(n). Then let’s find a[i:k] with maximal sum and a[k+1, j] with maximal sum also for O(n).
The answer is the sum of max(sum(a[i:k])) + a[k] + max(sum(a[k+1, j])), where maximums were taken by left border i and right border j.
By the way, if we iterate over each index as fixed, we get yet another O(n^2) algorithm for maximal/minimal subarray, still one order better than bruteforce solution.
And that’s it, following the same logic for minimum we can find interval of possible for all subarrays. Don’t forget to add intervals for left and right intervals, which can be computed with simple kadane’s algorithm.
Instead of conclusion
You can find sketch solution here.