
The sqrt thesis of mixing strategies

· 20 min read
Xinyu Ma
Research Scientist @ Meta

When solving a problem, it is common that we have two different strategies that fit different cases. For example, one algorithm may have a better time complexity but use more memory than the other. Or, one is fast when there are only a few large objects, but the other works better when there are many small objects. However, I found that there usually exists a solution which is a naive mixture of the two strategies, and its performance is roughly the square root of the product of the two.

Introduction

It is common that we have two different strategies for solving a problem.

One example is that we want a data structure which supports 2 operations, and we have two candidates:

| Operation | Candidate 1 | Candidate 2 |
| --- | --- | --- |
| Operation 1 | Slow | Fast |
| Operation 2 | Fast | Slow |

And we need both operations 1 and 2, so we want a data structure that is not too slow in either case.

Another example is that the input scale has two parameters: $M$ and $N$. One algorithm is fast when $M$ is large, like $O(MN^3)$; the other is the reverse, suitable for the case when $N$ is large, like $O(M^3N)$. But we don't know the values of $M$ and $N$ before the algorithm runs, so we want a balanced algorithm.

Sometimes we want to balance two different indicators, like time complexity and memory complexity.

A naive way to get a balanced algorithm is to mix the two candidates together. In my undergraduate years, I observed that there always exists such a mixture, and its complexity is roughly the square root of the product of the two. I call it the sqrt thesis. Since its statement is too vague (it lacks definitions of "strategies" and "mixture"), I have no way to prove it. Neither do I know whether someone else has already discovered it. But I can share the examples I observed where this thesis works.

Examples

The problems listed here are somewhat famous among competitive programmers. No one knows to whom we should give credit, so I think it is safe to omit the authors.

Unrolled Linked List

There are so many "versions" or "names" of this simple data structure: unrolled linked list, chunked array, STL rope, etc. The problem is trivial: we want to maintain a linear list which supports insertion and random query. More formally,

  • Data: Objects (= empty interface)
  • Operation: Random Query
    • input: an integer $i$
    • output: the object at position $i$
  • Operation: Insertion
    • input: an integer $i$ and an object $A$
    • output: a list where the $i$-th object is $A$. All objects before position $i$ remain untouched; objects after it are shifted right by one place.
  • Restrictions: size of list = number of operations = $N$.

It is obvious that array and linked list are two candidates. Complexities are:

| Operation | Array | Linked List | Unrolled Linked List |
| --- | --- | --- | --- |
| Random Query | $O(1)$ | $O(N)$ | $O(\sqrt{N})$ |
| Insertion | $O(N)$ | $O(1)$ | $O(\sqrt{N})$ |

Observe that an array is easy to index but hard to shift, while a linked list is the opposite. Thus, a naive mixture is to use a linked list to store small pieces of arrays:

  • Each element of the linked list is a "block", i.e. an array of length $B$, s.t. $\frac{1}{2}\sqrt{N}\leq B \leq\sqrt{N}$.
  • The length of the linked list is $L$, where $\sqrt{N}\leq L \leq 2\sqrt{N}$.
  • When querying, go over one block at a time, and jump to the element once we meet the block containing the $i$-th element.
  • When inserting, find the block containing position $i$ and insert $A$ into it.
    • If the block oversizes, split it into two.
  • It is easy to show that both operations have complexity $O(\sqrt{N})$.

The unrolled linked list works better in general cases. For example, when half of the operations are queries and the other half are insertions, both the array and the linked list degenerate to $O(N^2)$ in total, but the unrolled linked list works in $O(N\sqrt{N})$.
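
To make this concrete, here is a minimal Python sketch of such a structure (the class name UnrolledList and the exact split threshold are my own choices; merging undersized blocks is omitted):

import math

class UnrolledList:
    """A list of blocks; each block is a plain Python list of bounded size."""

    def __init__(self, items):
        n = max(1, len(items))
        b = max(1, int(math.isqrt(n)))      # target block size, about sqrt(N)
        self.limit = 2 * b                  # split a block when it exceeds this
        self.blocks = [list(items[i:i + b]) for i in range(0, len(items), b)] or [[]]

    def query(self, i):
        # Walk block by block (O(sqrt(N)) blocks), then index inside the block (O(1)).
        for block in self.blocks:
            if i < len(block):
                return block[i]
            i -= len(block)
        raise IndexError(i)

    def insert(self, i, value):
        # Find the block containing position i and insert into it (one block copy).
        for bi, block in enumerate(self.blocks):
            if i <= len(block):
                block.insert(i, value)
                if len(block) > self.limit:          # oversized: split into two halves
                    mid = len(block) // 2
                    self.blocks[bi:bi + 1] = [block[:mid], block[mid:]]
                return
            i -= len(block)
        raise IndexError(i)

Both operations touch $O(\sqrt{N})$ blocks plus the contents of a single block, matching the table above.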

Three Range Problems

Given a fixed list of integers, each query asks for one of the three averages (mean, median, or mode) of a given range. It is easy to see that the mean problem reduces to the range sum and the median reduces to the range $K$-th smallest.

  • Data: Integers. (Use $x_i$ to denote the $i$-th integer.)
  • Operation: Query
    • input: two indices $L$, $R$ s.t. $0\leq L < R < N$; (and a number $K<R-L$ for the $K$-th smallest)
    • output: a number, which is the sum ($K$-th smallest or mode) of all numbers $x_i$ where $L\leq i\leq R$.
  • Operation: Modification
    • input: one index $i$ and a number $v$
    • output: a modified list, where the $i$-th element is changed to $v$, i.e. $x_i=v$.
  • Restrictions: size of list = $N$; number of queries = $Q$.

Sum

Two naive methods are array and prefix sum:

  • Array stores every element individually.
  • Prefix sum stores all cumulative sums $\sum_{k=0}^{i}{x_k}$. We can get every range sum by a subtraction, but we need to change $O(N)$ elements each time we modify one element.
| Operation | Array | Prefix Sum | Chunked Array |
| --- | --- | --- | --- |
| Range Query | $O(N)$ | $O(1)$ | $O(\sqrt{N})$ |
| Modification | $O(1)$ | $O(N)$ | $O(\sqrt{N})$ |

So, the naive mixture applies the idea of the unrolled linked list:

  • Store the sum of every block: $\sum_{k=js}^{(j+1)s-1}{x_k}$ for block $j$, with block size $s=\sqrt{N}$.
  • When querying, collect the sums of the blocks that are covered entirely (at most $\sqrt{N}$ of them), and add up the elements in the blocks that are only partially covered (at most $2\sqrt{N}$ elements).
  • When modifying, change the element as well as the sum of the block it belongs to.

So we get an algorithm that works in $O(1)$ for modification and $O(\sqrt{N})$ for query. A nice bonus is that it even works for ranged addition, i.e. adding $k$ to every number $x_i$ between $L$ and $R$: this can be handled by assigning a lazy label to every block, recording the number added to the entire block.
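
As a concrete illustration, here is a minimal Python sketch of this block decomposition for range sums (the class name BlockSum is my own; the lazy label for ranged addition is omitted):

import math

class BlockSum:
    def __init__(self, xs):
        self.xs = list(xs)
        self.s = max(1, int(math.isqrt(len(self.xs))))    # block size, about sqrt(N)
        nblocks = (len(self.xs) + self.s - 1) // self.s
        self.block_sum = [0] * nblocks
        for i, v in enumerate(self.xs):
            self.block_sum[i // self.s] += v

    def modify(self, i, v):
        # O(1): update the element and the sum of the block it belongs to
        self.block_sum[i // self.s] += v - self.xs[i]
        self.xs[i] = v

    def query(self, l, r):
        # Sum of xs[l..r] (inclusive) in O(sqrt(N))
        total = 0
        while l <= r and l % self.s != 0:     # left partial block
            total += self.xs[l]
            l += 1
        while l + self.s - 1 <= r:            # fully covered blocks
            total += self.block_sum[l // self.s]
            l += self.s
        while l <= r:                         # right partial block
            total += self.xs[l]
            l += 1
        return total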

A better algorithm is the segment tree, which works in $O(\log{N})$ for both operations, but it is not a naive mixture.

Mode

It would be very difficult, though solvable, if we considered modifications, so let's ignore them.

The first strategy is always the brute force one: scan the range and count occurrences, which takes $O(N)$ per query.

The second strategy is to preprocess all numbers' occurrences in all ranges. Similar to the idea of the prefix sum, it is enough to maintain an array $f$ containing the prefix frequencies of each number:

$$f_{i,j} = \sum_{k=0}^{i}{\mathbb{1}_{=}(x_k, x_j)}.$$

Then, we can get the mode of every range by DP: let $m_{i,j}$ denote the mode among $x_i,\ldots,x_j$; then $m_{i,j+1}$ is either $m_{i,j}$ or $x_{j+1}$, which can be decided in $O(1)$ using $f$. After that, we can handle each query in $O(1)$. However, this needs $O(N^2)$ time and memory for the two-step initialization.
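
A short sketch of this $O(N^2)$ precomputation; instead of storing $f$, it simply maintains a running count array while extending $j$, which is an equivalent way of deciding between $m_{i,j}$ and $x_{j+1}$ in $O(1)$:

from collections import defaultdict

def build_mode_table(x):
    """mode[i][j] = a mode of x[i..j] (inclusive); O(N^2) time and memory."""
    n = len(x)
    mode = [[None] * n for _ in range(n)]
    for i in range(n):
        cnt = defaultdict(int)
        best, best_cnt = None, 0
        for j in range(i, n):
            cnt[x[j]] += 1
            if cnt[x[j]] > best_cnt:          # the new element may overtake the current mode
                best, best_cnt = x[j], cnt[x[j]]
            mode[i][j] = best
    return mode                               # afterwards every query is a table lookup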

Now it's time for the chunked array. Let $f'_{i,j}$ denote the prefix frequencies of each number at a coarse granularity, where $i$ indicates a block. Let $m'_{i,j}$ denote the mode from the $i$-th to the $j$-th block. Now, given a query, we have some fully covered blocks (fewer than $\sqrt{N}$) and some uncovered elements (fewer than $2\sqrt{N}$). The result can be either the mode of the covered blocks, in which case we need to get its correct frequency by also going over the uncovered elements, or an uncovered element, in which case we need to enumerate each such number and count its occurrences.

This may be a little difficult to describe verbally. The pseudo-code looks like this:

from itertools import chain   # chain concatenates two ranges

def query_mode(l, r):
    # lblock: start of the first fully covered block; rblock: position right after the last one
    lblock, rblock = start_pos_of_block(l, r)
    uncovered = list(chain(range(l, lblock), range(rblock, r)))

    block_mode = m[lblock, rblock]          # mode of the fully covered blocks
    candidates = set(x[i] for i in uncovered)
    candidates.add(block_mode)
    for v in candidates:
        cnt[v] = 0

    # count occurrences among the uncovered elements
    for i in uncovered:
        cnt[x[i]] += 1
    # add occurrences inside the covered blocks via the block prefix frequencies
    for v in candidates:
        # actually we may need +1/-1 on the indices, depending on how f is defined
        cnt[v] += f[rblock, v] - f[lblock, v]

    # the answer is the candidate with the maximum count
    return max(candidates, key=lambda v: cnt[v])

Hence, we can handle each query in $O(\sqrt{N})$. The initialization seems to be $O(N^2)$, but it can be $O(N\sqrt{N})$ if we implement it with care.

| Operation | Array | Prefix Freq | Chunked Array |
| --- | --- | --- | --- |
| Range Query | $O(N)$ | $O(1)$ | $O(\sqrt{N})$ |
| Initialization | $O(1)$ | $O(N^2)$ | $O(N\sqrt{N})$ |
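
As for the careful $O(N\sqrt{N})$ initialization, here is one hedged way to do it (the layout below is my own choice and differs from the pseudo-code above by an index shift; f[b][v] counts value v inside blocks 0..b):

import math

def build_chunked_mode(x):
    """Precompute block prefix frequencies f and block-range modes m in O(N*sqrt(N))."""
    n = len(x)
    s = max(1, int(math.isqrt(n)))                 # block size
    nblocks = (n + s - 1) // s
    values = sorted(set(x))
    vid = {v: k for k, v in enumerate(values)}     # compress values to 0..V-1

    # f[b][v] = occurrences of value v in blocks 0..b (inclusive)
    f = [[0] * len(values) for _ in range(nblocks)]
    for b in range(nblocks):
        if b > 0:
            f[b] = f[b - 1][:]                     # copy the previous row: O(V) per block
        for i in range(b * s, min((b + 1) * s, n)):
            f[b][vid[x[i]]] += 1

    # m[b1][b2] = a mode of blocks b1..b2; one O(N) sweep per starting block
    m = [[None] * nblocks for _ in range(nblocks)]
    for b1 in range(nblocks):
        cnt = [0] * len(values)
        best, best_cnt = None, 0
        for b2 in range(b1, nblocks):
            for i in range(b2 * s, min((b2 + 1) * s, n)):
                cnt[vid[x[i]]] += 1
                if cnt[vid[x[i]]] > best_cnt:
                    best, best_cnt = x[i], cnt[vid[x[i]]]
            m[b1][b2] = best
    return f, m, s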

Since the mode is not mergeable across subranges, a segment tree does not work here. Also, the best complexity considering modifications seems to be $O(N^{\frac{5}{3}})$. (I'm not sure.)

K-th smallest

I'm not sure if there is a sqrt algorithm for this one. Leader Mo's Algorithm brings an extra $O(\log N)$ factor. The "standard" solutions are the persistent segment tree, tree splitting and merging, and divide and conquer, which are all $O(\log^2{N})$ algorithms.

Leader Mo's Algorithm

If we are allowed to handle queries offline, then a lot of problems have a sqrt solution. The original idea, proposed by Leader Tao Mo, comes from the fact that we can calculate the minimum spanning tree under Manhattan distance in $O(N\log{N})$, and the total weight of this tree is bounded by $O(N\sqrt{N})$. Here I want to provide a different path towards this algorithm.

Let's use the range mode question as an example. Consider the two cases:

  • All ranges are small: $R_i-L_i < T$. Then, the complexity of the brute force algorithm is $O(QT)$, which may be acceptable if $T$ is really small.
  • All ranges are large: $R_i-L_i > T$, with $T$ near $N$. Then, the distances between ranges, $|L_j-L_i|+|R_j-R_i|$, are small. Thus, we can move from one range to another, fixing the counts of numbers along the way. The complexity of this algorithm is the sum of all distances, $O(N+Q(N-T))$.

Looking at $T$ versus $N-T$, you might think the mixing strategy would land at something like $\frac{1}{2}N$ per query. However, the good news is that we can still get $O(\sqrt{N})$ amortized per query. Just like how we grouped elements into blocks in the previous examples, this time we group queries:

  • Queries within a group are near each other (i.e. the pairwise distances are short).
  • The number of groups is small (so we do not need to count from scratch many times).

Assume we have $G$ groups and the distance within a group is $D$; then

  • For each group, we need to calculate the count of each number from scratch once, which is $O(GN)$ in total.
  • We wander from one query to another within a group, which takes $O(QD)$.

Assuming $Q$ has the same order as $N$, the ideal case would be $D=G=\sqrt{N}$. But how can we achieve this? Note that we can sort the queries in any order, for example, in ascending order of $R$. Then the right boundary will be monotonically increasing, and the real distance that we have to walk back and forth is made up of the $L$ part only. Therefore, we can do as follows:

  1. Group queries by the left boundary $L$ into $\sqrt{N}$ groups, so that group $i$ contains the queries with $i\sqrt{N} \leq L < (i+1)\sqrt{N}$.
  2. Within a group, sort the queries in ascending order of the right boundary $R$.
  3. For each group $i$, start from $l=r=(i+1)\sqrt{N}$, the right end of the group. Initialize an array containing the count of every number; all counts are 0 at this time.
  4. Take the next query $[L_j, R_j]$ in the group. Note that we always have $l=(i+1)\sqrt{N}$ at this step. First move $r$ to $R_j$ and count all numbers on the path.
  5. Move $l$ to $L_j$ and count all numbers on the path. Now we have the result for $[L_j, R_j]$.
  6. Move $l$ back to the starting position of the group, $(i+1)\sqrt{N}$, undoing the counts on the path.
  7. If there are more queries in group $i$, go back to step 4.
  8. If there are more groups, go back to step 3.

We need to move the right endpoint $r$ up to $N$ positions for each group, and the left endpoint up to $\sqrt{N}$ positions for each query. Thus, the total complexity is $O((N+Q)\sqrt{N})$.
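
Below is a hedged Python sketch of these steps for the range mode query (the name range_mode_offline and the half-open $[l, r)$ convention are my own choices; queries that fit entirely inside one group's block are brute-forced, a detail the steps above gloss over):

import math
from collections import Counter

def range_mode_offline(x, queries):
    """queries: list of (l, r) half-open ranges with l < r; returns the mode of each range."""
    n = len(x)
    block = max(1, int(math.isqrt(n)))
    order = sorted(range(len(queries)), key=lambda q: (queries[q][0] // block, queries[q][1]))
    ans = [None] * len(queries)

    cnt = Counter()
    cur_group, cur_r = -1, 0
    best_val, best_cnt = None, 0

    def add(v, state):
        # count one more occurrence of v; state = [mode value, mode count]
        cnt[v] += 1
        if cnt[v] > state[1]:
            state[0], state[1] = v, cnt[v]

    for qi in order:
        l, r = queries[qi]
        g = l // block
        boundary = min((g + 1) * block, n)        # starting position of this group (step 3)
        if g != cur_group:                        # new group: reset the counts
            cnt.clear()
            cur_group, cur_r = g, boundary
            best_val, best_cnt = None, 0
        if r <= boundary:                         # short query inside one block: brute force
            c = Counter(x[l:r])
            ans[qi] = max(c, key=c.get)
            continue
        state = [best_val, best_cnt]
        while cur_r < r:                          # step 4: extend the right endpoint
            add(x[cur_r], state)
            cur_r += 1
        best_val, best_cnt = state                # the right part is kept for later queries
        state = [best_val, best_cnt]
        for i in range(boundary - 1, l - 1, -1):  # step 5: extend the left endpoint
            add(x[i], state)
        ans[qi] = state[0]
        for i in range(l, boundary):              # step 6: undo the left extension
            cnt[x[i]] -= 1
    return ans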

Leader Mo's Algorithm is really a brilliant idea and can be applied to lots of scenarios. However, the two major constraints are:

  • Queries are offline.
  • Ability to not only count new elements, but also undo this operation. (We need to move $l$ back to the starting position at step 6.)

Triangle Counting

Given an undirected graph, we want to count how many triangles there are in this graph.

  • Input: an undirected graph $G=(V,E)$ of $|V| = N$ nodes and $|E| = M$ edges.
  • Output: the number of triangles in this graph.
    • A triangle is a tuple of three nodes $(x,y,z)\in V^3$ s.t. there is an edge between any two of them: $\{\{x,y\},\{y,z\},\{x,z\}\}\subseteq E$.
  • Restrictions: $N$ and $M$ are of the same order.

By the way, before we start looking at it, I want to correct a common misunderstanding that "we need $O(N^2)$ to enumerate pairs of adjacent nodes". The fact is that when edges are given in the form of adjacency lists, enumerating pairs of adjacent nodes is equivalent to enumerating edges. A similar (and more widespread) misunderstanding happens when we apply this to a kind of convolutional DP on a rooted tree:

# Sorry, I forget the exact problem ...
def dfs(self):
    # self.dp[k]: presumably the number of ways using k nodes in the merged part of the subtree
    self.dp = [0 for _ in range(N)]
    self.dp[0] = 1
    for c in self.children:
        c.dfs()
        tmp = [0 for _ in range(N)]
        for i in range(N):
            for j in range(i + 1):
                tmp[i] += self.dp[j] * c.dp[i - j]
                if self.dp[j] == 0:
                    # dp[j] vanishes beyond the size merged so far, so we can stop early
                    break
        self.dp = tmp

This seems to be $O(N^4)$, but it is actually $O(N^3)$: thanks to the early break, we only go over the $i$ loop, which is $O(N)$, once for each ancestor-descendant pair.

Let's go back to the problem, and forget the fact that the answer is the trace of the cubed adjacency matrix divided by six. A naive solution, where we enumerate an edge $\{x, y\}$ and then enumerate a third node $z$ adjacent to $y$, takes $O(MN)$, since for each edge we may scan all neighbours of $y$. (If you wonder how we can decide whether $z$ is adjacent to $x$ in $O(1)$, try to implement it yourself.)

Observe that

  1. We meet each triangle 3 times.
  2. The time used depends on how many nodes $y$ is adjacent to.
  3. However, the total number of edges is $M$, of the same order as $N$ (i.e. the graph is sparse). This fact indicates that there is a chance to optimize the latter $N$ in the time complexity.
  4. Can we solve 2 by taking the node with the lower degree as $y$?

Let's analyze where the $N$ comes from. Some nodes have a large degree, $> D$; the others are small. The number of nodes with a large degree is at most $L = \frac{2M}{D}$. We have at most $\min(\frac{1}{2}L^2, M)$ edges among those "large" nodes. Thus, we need $O(L^3)$ or $O(ML)$ to enumerate the triangles among those nodes.

For those triangles which have a "small" node, we assume the "small" node is $y$. Then, the complexity is $O(MD)$.

Finally, taking $D = \sqrt{M}$, so that $L = \frac{2M}{D} = 2\sqrt{M}$, we have $O(ML+MD) = O(M\sqrt{M})$, which shows that our strategy in observation 4, always starting from the lower-degree endpoint of an edge, actually works.
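
A hedged sketch of this strategy in Python (the function name count_triangles is my own; each triangle is met exactly 3 times, hence the final division):

def count_triangles(n, edges):
    """edges: list of (u, v) pairs, 0 <= u, v < n, no self-loops or duplicate edges."""
    deg = [0] * n
    adj = [[] for _ in range(n)]
    edge_set = set()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adj[u].append(v)
        adj[v].append(u)
        edge_set.add((u, v))
        edge_set.add((v, u))

    count = 0
    for u, v in edges:
        if (deg[u], u) > (deg[v], v):   # always scan the lower-degree endpoint (observation 4)
            u, v = v, u
        for z in adj[u]:
            if (v, z) in edge_set:      # O(1) adjacency test with a hash set
                count += 1
    return count // 3                   # each triangle was counted once per edge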

In this example, we implicitly mixed up two cases, which meet at a sqrt point. In the following example, we have to do this explicitly.

Rectangle Counting

Given a set of points with integer coordinates on a plane, we want to know how many axis-aligned rectangles they can form.

  • Input: a set of $N$ points whose coordinates $(x_i, y_i)$ are integers. No two points coincide.
  • Output: the number of axis-aligned rectangles.
    • An axis-aligned rectangle is made up of four points whose coordinates are of the form $\{(x, y), (x+a, y), (x, y+b), (x+a,y+b)\}$.

First, replace the coordinates with their ranks (coordinate compression), which lets us operate on the range of values, for example to allocate arrays subscripted by coordinates. Then, sort the points in lexicographic order, so we get several columns: sets of points with the same x-coordinate.

Method 1: Enumerate Column Pairs

We can enumerate pairs of columns and count the rectangles made up of points in each pair of columns. Assume columns $i$ and $j$ overlap in $O_{i,j}$ rows; then the number of rectangles they form is $\binom{O_{i,j}}{2}$. There are many ways to calculate $O_{i,j}$:

  • Slide two indices over the two (sorted) columns, as when merging sorted lists, and count how many times the y-coordinates coincide.
  • Use a boolean array $mark$ subscripted by y-coordinate. For each column $i$, set $mark[y]$ to true iff the point $(i, y)$ exists. Then, for each other column $j$, enumerate its points $(j, y')$ and count how many times $mark[y']$ hits.

If we have $C$ columns, the complexity is $O(NC)$.
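
A minimal sketch of the mark-array variant (the names are my own; coordinates are assumed to be non-negative, e.g. already rank-compressed):

from collections import defaultdict

def count_rectangles_by_columns(points):
    """points: list of distinct (x, y) pairs with non-negative integer coordinates; O(N*C)."""
    cols = defaultdict(list)
    for x, y in points:
        cols[x].append(y)
    xs = sorted(cols)
    mark = [False] * (max(y for _, y in points) + 1)

    total = 0
    for i in range(len(xs)):
        for y in cols[xs[i]]:
            mark[y] = True
        for j in range(i + 1, len(xs)):
            overlap = sum(1 for y in cols[xs[j]] if mark[y])   # O_{i,j}
            total += overlap * (overlap - 1) // 2
        for y in cols[xs[i]]:            # clean up before the next column i
            mark[y] = False
    return total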

Method 2: Enumerate Points per Row

Another way is to forget about combinations and list points one by one, keeping an eye on how many rectangles a point can form with points examined before.

  1. Iterate over the points in lexicographic order. Use $p_i = (x_i, y_i)$ to denote the current one.
  2. Iterate over the points before $p_i$ in the same row: $p_j = (x_j, y_i)$, $x_j < x_i$.
  3. Count how many points below $p_i$ in the same column hit column $x_j$. That is, how many $p_k = (x_i, y_k)$ with $y_k < y_i$ exist such that $(x_j, y_k)$ also exists. This can be done in $O(1)$ if we use an array $cnt$ to keep these counts.
  4. Since $p_i$ hits column $x_j$, we increase $cnt[x_j]$ by 1.
  5. Every time we finish going over a whole column in step 1, we clear $cnt$.

The complexity depends on the size of a row. Let $S = \frac{N}{R}$ denote the average row size, where $R$ is the number of rows; we end up with an $O(NS)$ algorithm.
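
Here is a sketch of this counting in Python, following steps 1-5 above (the name count_rectangles_by_rows is my own):

from collections import defaultdict

def count_rectangles_by_rows(points):
    """points: list of distinct (x, y) integer pairs; runs in O(N * max row size)."""
    cols = defaultdict(list)       # x -> list of y
    rows = defaultdict(list)       # y -> sorted list of x
    for x, y in points:
        cols[x].append(y)
        rows[y].append(x)
    for y in rows:
        rows[y].sort()

    total = 0
    cnt = defaultdict(int)         # cnt[xj]: rows shared so far between column xj and the current column
    for x in sorted(cols):         # step 1: one column at a time, left to right
        cnt.clear()                # step 5: reset when we start a new column
        for y in sorted(cols[x]):
            for xj in rows[y]:     # step 2: points (xj, y) to the left of the current point
                if xj >= x:
                    break
                total += cnt[xj]   # step 3: each shared row below closes a rectangle
                cnt[xj] += 1       # step 4: register the current row for column xj
    return total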

Mix them up

$S$ is essentially the inverse of $R$, which looks unrelated to $C$. However, we can swap the two coordinates before running method 2. Then, the two complexities are $O(NC)$ versus $O(N\frac{N}{C})$. Now, if the sqrt thesis applies here, we should get an algorithm of $O(\sqrt{NC\cdot N\frac{N}{C}})=O(N\sqrt{N})$.

The idea is actually quite straightforward. Similar to the last problem, observe that:

  • The large columns, those of size $S' > \sqrt{N}$, are few: there are at most $C' < \sqrt{N}$ of them.
  • All the other columns are small, of size $S'' \leq \sqrt{N}$, although there can be many of them.

Thus, we can just separate large and small columns and apply the two algorithms respectively:

  • Large-large column pairs: apply method 1.
  • Small-small column pairs: apply method 2.
  • Large-small column pairs: we can apply either; method 1 is easier. (If in doubt, try it yourself and you will see why.)

Two Index Weighted Ranking

This is a problem I proposed for a programming contest at my undergraduate university. This time, we do have a sqrt algorithm, but it does not perfectly follow the formula.

Suppose our boss asks us to write an algorithm for a mini search engine. In the frontend, the user input is decomposed into keywords with weights, which are the inputs to our algorithm. In the database, we already have a set of webpages; each of them is assigned some keywords and a likeliness score for each keyword. The output of the algorithm is the search result: a list of webpages sorted by the weighted average of the likeliness scores.

He wants the algorithm to be super fast, but he only gives us a single-core machine, which makes it impossible. However, we know his unit tests are very weak: all queries have two keywords, and queries on the same dataset share the same weights. More specifically,

  • There are $N$ webpages and $K$ keywords.
  • The likeliness score of website $i$ for keyword $j$ is $L_{i,j}$, a real number between $0$ and $1$.
    • The likeliness relation is sparse, so only $C=O(N)$ of these numbers are non-zero.
  • There are $Q$ queries.
    • Each query has exactly 2 keywords $a$ and $b$, with weights $x$ and $y$ respectively, and an index $K$.
    • We should output the $K$-th website in the list sorted by $W_i = x L_{i,a} + y L_{i,b}$.
    • For each dataset, all queries share $x$ and $y$.
    • Queries are given offline.
  • Restrictions: $N,K<10^4$, $Q<10^5$, $C<3\times 10^4$.

Now, can we hack here, passing all test cases in time without sorting those pages for every query?

Analysis

$x$ and $y$ do not make the game more difficult than two weights of $0.5$, since they are always the same. Also, $C < 3N$ is not so different from every website having exactly 3 keywords.

Method 1: nth-element

If we do sort for every query, the complexity is $O(QN\log N)$. Using nth-element (quickselect) does not save much compared with sorting: we still have $O(QN)$ in the worst case, which won't pass.

Method 2: sort every pair of keywords

We need $O(KC\log N)$ to sort the candidate list for every pair of keywords, but after that we can answer each query in $O(1)$, i.e. $O(Q)$ in total.

Mix

Look at the $KC$ part of the complexity. We have $K^2$ pairs of keywords, and at most $N$ webpages for each keyword; here, $KN$ becomes a $C$ because the likeliness relation is sparse. However, if we sort for every query, which is equivalent to only processing the keyword pairs that actually occur in queries, then we get $QN$. We cannot reduce this $QN$ to something smaller in the same way, because not every keyword has few pages.

Recall the idea we used before and apply it here:

  • Some keywords are large, some are small.
  • If we handle them differently, we can have $\sqrt{KC\cdot QN}$ by the sqrt thesis.
    • An extra $\log N$ factor is needed for something like sorting or a BST.
  • If we are lucky, the $KN$ in the former formula can be reduced to another $C$, and we get $\sqrt{C\cdot CQ}=C\sqrt{Q}$.

Well, though the story does not go exactly as we planned, we can still get a sqrt algorithm. It works like this:

  • Build a BST of pages for each keyword.
  • Sort all queries by $(a, b)$.
  • When we meet a new pair $(a, b)$, insert the pages of the smaller keyword into the bigger keyword's BST.
  • After all queries with this $(a, b)$, remove those inserted elements.

The question is how many webpage-keyword relations we need to insert. The worst case seems to be $CQ$; however, we only have $K$ keywords and $C=O(K)$ relations. Thus, there are at most $\sqrt{C}$ "big" keywords, and every "small" keyword has at most $\sqrt{C}$ pages. Thus, the number of insertions is about $\sqrt{C}Q$.

Variant

One variant is to allow modifications: change the likeliness score $L_{i,j}$ to a given number. This forces us to use an online algorithm.

This variant can be solved as follows:

  • Let $G=\sqrt{C}$. A keyword is called big if it contains more than $G$ pages.
  • Build BSTs not only for single keywords but also for pairs of big keywords, in $O(CG\log{N})$.
  • When we deal with a query:
    • If it is between two big keywords, look it up in $O(\log{N})$.
    • Otherwise, insert the small keyword's pages into the big keyword's BST, in $O(G\log{N})$.
  • When we deal with a modification, we need to update at most $G$ BSTs.
  • Thus, the total time complexity is $O((C+Q)G\log{N})$.

Conclusion

Whether or not the thesis is precise, the trick of balancing a trade-off at a sqrt point works in a wide range of areas.

References?

  • Alon, N., Yuster, R. & Zwick, U. Finding and counting given length cycles. Algorithmica 17, 209–223 (1997). link.
    • Someone said the origin of the triangle enumerating algorithm is this paper. I haven't read it.
  • Anonymous. Stars in Your Window II. Peking University Programming Contest #15, Problem J (2015). link. solution.
    • The rectangle counting problem.
    • My solution code does not follow this post exactly and thus is unnecessarily complicated and slow.