
How Fast Do Algorithms Improve?

By YASH SHERRY, MIT Computer Science & Artificial Intelligence Laboratory, Cambridge, MA 02139 USA

NEIL C. THOMPSON, MIT Computer Science & Artificial Intelligence Laboratory, Cambridge, MA 02139 USA; MIT Initiative on the Digital Economy, Cambridge, MA 02142 USA

Algorithms determine which calculations computers use to solve problems and are one of the central pillars of computer science. As algorithms improve, they enable scientists to tackle larger problems and explore new domains and new scientific techniques [1], [2]. Bold claims have been made about the pace of algorithmic progress. For example, the President's Council of Advisors on Science and Technology (PCAST), a body of senior scientists that advise the U.S. President, wrote in 2010 that "in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed" [3]. However, this conclusion was based on data from progress in linear solvers [4], which is just a single example. With no guarantee that linear solvers are representative of algorithms in general, it is unclear how broadly conclusions such as PCAST's should be interpreted. Is progress faster in most algorithms? Just some? How much on average?

This article has supplementary downloadable material available at https://doi.org/10.1109/JPROC.2021.3107219, provided by the authors.

Digital Object Identifier 10.1109/JPROC.2021.3107219

A variety of research has quantified progress for particular algorithms, including for maximum flow [5], Boolean satisfiability and factoring [6], and (many times) for linear solvers [4], [6], [7]. Others in academia [6], [8]-[10] and the private sector [11], [12] have looked at progress on benchmarks, such as computer chess ratings or weather prediction, but these are not strictly comparable to algorithms since they lack either mathematically defined problem statements or verifiably optimal answers. Thus, despite substantial interest in the question, existing research provides only a limited, fragmentary view of algorithm progress.

In this article, we provide the first comprehensive analysis of algorithm progress ever assembled. This allows us to look systematically at when algorithms were discovered, how they have improved, and how the scale of these improvements compares to other sources of innovation. Analyzing data from 57 textbooks and more than 1137 research papers reveals enormous variation. Around half of all algorithm families experience little or no improvement. At the other extreme, 14% experience transformative improvements, radically changing how and where they can be used. Overall, we find that, for moderate-sized problems, 30%-43% of algorithm families had improvements comparable to or greater than those that users experienced from Moore's Law and other hardware advances. Thus, this article presents the first systematic, quantitative evidence that algorithms are one of the most important sources of improvement in computing.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. RESULTS

In the following analysis, we focus on exact algorithms with exact solutions. That is, cases where a problem statement can be met exactly (e.g., find the shortest path between two nodes on a graph) and where there is a guarantee that the optimal solution will be found (e.g., that the shortest path has been identified).

We categorize algorithms into algorithm families, by which we mean that they solve the same underlying problem. For example, Merge Sort and Bubble Sort are two of the 18 algorithms in the "Comparison Sorting" family. In theory, an infinite number of such families could be created, for example, by subdividing existing domains so that special cases can be addressed separately (e.g., matrix multiplication with a sparseness guarantee). In general, we exclude such special cases from our analysis since they do not represent an asymptotic improvement for the full problem.1

To focus on consequential algorithms, we limit our consideration to those families where the authors of a textbook, one of 57 that we examined, considered that family important enough to discuss. Based on these inclusion criteria, there are 113 algorithm families. On average, there are eight algorithms per family. We consider an algorithm as an improvement if it reduces the worst-case asymptotic time complexity of its algorithm family. Based on this criterion, there are 276 initial algorithms and subsequent improvements, an average of 1.44 improvements after the initial algorithm in each algorithm family.

1 The only exception that we make to this rule is if the authors of our source textbooks deemed these special cases important enough to discuss separately, e.g., the distinction between comparison and noncomparison sorting. In these cases, we consider each as its own algorithm family.

A. Creating New Algorithms

Fig. 1 summarizes algorithm discovery and improvement over time. Fig. 1(a) shows the timing for when the first algorithm in each family appeared, often as a brute-force implementation (straightforward, but computationally inefficient), and Fig. 1(b) shows the share of algorithm families in each decade whose asymptotic time complexity improved. For example, in the 1970s, 23 new algorithm families were discovered, and 34% of all the previously discovered algorithm families were improved upon. In later decades, these rates of discovery and improvement fell, indicating a slowdown in progress on these types of algorithms. It is unclear exactly what caused this. One possibility is that some algorithms were already theoretically optimal, so further progress was impossible. Another possibility is that there are decreasing marginal returns to algorithmic innovation [5] because the easy-to-catch innovations have already been "fished out" [13] and what remains is more difficult to find or provides smaller gains. The increased importance of approximate algorithms may also be an explanation if approximate algorithms have drawn away researcher attention (although the causality could also run in the opposite direction, with slower algorithmic progress pushing researchers into approximation) [14].

Fig. 1(c) and (d), respectively, show the distribution of "time complexity classes" for algorithms when they were first discovered and the probabilities that algorithms in one class transition into another because of an algorithmic improvement. Time complexity classes, as defined in algorithm theory, categorize algorithms by the number of operations that they require (typically expressed as a function of input size) [15]. For example, a time complexity of O(n^2) indicates that, as the size of the input n grows, there exists a function Cn^2 (for some value of C) that upper bounds the number of operations required.2 Asymptotic time is a useful shorthand for discussing algorithms because, for a sufficiently large value of n, an algorithm with a higher asymptotic complexity will always require more steps to run. Later in this article, we show that, in general, little information is lost by our simplification to using asymptotic complexity.

Fig. 1(c) shows that, at discovery, 31% of algorithm families belong to the exponential complexity category (denoted n! | c^n), meaning that they take at least exponentially more operations as input size grows. For these algorithms, including the famous "Traveling Salesman" problem, the amount of computation grows so fast that it is often infeasible (even on a modern computer) to compute problems of size n = 100. Another 50% of algorithm families begin with polynomial time that is quadratic or higher, while 19% have asymptotic complexities of n log n or better.
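As an illustration (not from the original article), the following minimal Python sketch tabulates rough operation counts for a few common complexity classes at n = 100; the one-operation-per-nanosecond rate is an assumption chosen only to give a sense of scale, and it shows why exponential-time families are infeasible even at this modest input size.

```python
import math

# Rough operation counts for common complexity classes (leading constants ignored).
def operations(complexity, n):
    return {
        "n log n": n * math.log2(n),
        "n^2": float(n) ** 2,
        "n^3": float(n) ** 3,
        "2^n": 2.0 ** n,
    }[complexity]

N = 100
for c in ["n log n", "n^2", "n^3", "2^n"]:
    ops = operations(c, N)
    # Assume one operation per nanosecond (1e9 operations per second).
    print(f"{c:>8} at n = {N}: {ops:.2e} operations, ~{ops / 1e9:.2e} seconds")
```

At this rate, 2^100 operations would take on the order of 10^21 seconds, far beyond any feasible computation.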

Fig. 1(d) shows that there is considerable movement of algorithms between complexity classes as algorithm designers find more efficient ways of implementing them. For example, on average from 1940 to 2019, algorithms with complexity O(n^2) transitioned to complexity O(n) with a probability of 0.5% per year, as calculated using (3).

2 For example, the number of operations needed to alphabetically sort a list of 1000 filenames in a computer directory might be 0.5(n^2 + n), where n is the number of filenames. For simplicity, algorithm designers typically drop the leading constant and any smaller terms to write this as O(n^2).


Fig. 1. Algorithm discovery and improvement. (a) Number of new algorithm families discovered each decade. (b) Share of known algorithm families improved each decade. (c) Asymptotic time complexity class of algorithm families at first discovery. (d) Average yearly probability that an algorithm in one time complexity class transitions to another (average family complexity improvement). In (c) and (d), ">n^3" includes time complexities that are superpolynomial but subexponential.

Of particular note in Fig. 1(d) are the transitions from factorial or exponential time (n! | c^n) to polynomial time. These improvements can have profound effects, making algorithms that were previously infeasible for any significant-sized problem usable for large datasets.

One algorithm family that has undergone transformative improvement is generating optimal binary search trees. Naively, this problem takes exponential time, but, in 1971, Knuth [16] introduced a dynamic programming solution using the properties of weighted edges to bring the time complexity to cubic. Hu and Tucker [17], in the same year, improved on this performance with a quasi-linear [O(n log n)] time solution using minimum weighted path length, which remains the best asymptotic time complexity achieved for this family.3

3 There are faster algorithms for solving this problem, for example, Levcopoulos's linear solution [18] in 1989 and another linear solution by Klawe and Mumey [19], but these are approximate and do not guarantee that they will deliver the exact right answer and, thus, are not included in this analysis.

B. Measuring Algorithm Improvement

Over time, the performance of an algorithm family improves as new algorithms are discovered that solve the same problem with fewer operations. To measure progress, we focus on discoveries that improve asymptotic complexity, for example, moving from O(n^2) to O(n log n), or from O(n^2.9) to O(n^2.8).

Fig. 2(a) shows the progress over time for four different algorithm families, each shown in one color. In each case, performance is normalized to 1 for the first algorithm in that family. Whenever an algorithm is discovered with better asymptotic complexity, it is represented by a vertical step up. Inspired by Leiserson et al. [5], the height of each step is calculated using (1), representing the number of problems that the new algorithm could solve in the same amount of time as the first algorithm took to solve a single problem (in this case, for a problem of size n = 1 million).4 For example, Grenander's algorithm for the maximum subarray problem, which is used in genetics (and elsewhere), is an improvement of one million times over the brute-force algorithm.

To provide a reference point for the magnitude of these rates, the figure also shows the SPECInt benchmark progress time series compiled in [20], which encapsulates the effects that Moore's law, Dennard scaling, and other hardware improvements had on chip performance. Throughout this article, we use this measure as the hardware progress baseline. Fig. 2(a) shows that, for problem sizes of n = 1 million, some algorithms, such as maximum subarray, have improved much more rapidly than hardware/Moore's law, while others, such as self-balancing tree creation, have not. The orders of magnitude of variation shown in just these four of our 113 families make it clear why overall algorithm improvement estimates based on small numbers of case studies are unlikely to be representative of the field as a whole.

An important contrast between algorithm and hardware improvement comes in the predictability of improvements. While Moore's law led to hardware improvements happening smoothly over time, Fig. 2 shows that algorithms experience large, but infrequent, improvements (as discussed in more detail in [5]).

The asymptotic performance of an algorithm is a function of input size for the problem. As the input grows, so does the scale of improvement from moving from one complexity class to the next. For example, for a problem with n = 4, an algorithmic change from O(n) to O(log n) only represents an improvement of 2 (= 4/2), whereas, for n = 16, it is an improvement of 4 (= 16/4). That is, algorithmic improvement is more valuable for larger data. Fig. 2(b) demonstrates this effect for the "nearest-neighbor search" family, showing that improvement size varies from 15× to ≈4 million× when the input size grows from 10^2 to 10^8.
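As an illustration of this input-size dependence (a sketch in the spirit of (1), not code from the article), the following Python snippet computes improvement ratios for two hypothetical complexity transitions at several input sizes.

```python
import math

def improvement_ratio(ops_old, ops_new, n):
    """Improvement from an older to a newer algorithm at input size n (ratio of operation counts)."""
    return ops_old(n) / ops_new(n)

# Hypothetical complexity transitions (leading constants dropped).
linear_to_log = (lambda n: n, lambda n: math.log2(n))
quadratic_to_nlogn = (lambda n: n ** 2, lambda n: n * math.log2(n))

for n in [4, 16, 10**2, 10**6, 10**8]:
    r1 = improvement_ratio(*linear_to_log, n)
    r2 = improvement_ratio(*quadratic_to_nlogn, n)
    print(f"n = {n:>9}:  O(n)->O(log n): {r1:,.0f}x   O(n^2)->O(n log n): {r2:,.0f}x")
```

At n = 4 the O(n) to O(log n) change yields the factor of 2 mentioned above, while at n = 10^8 the same change yields a factor of several million, illustrating why improvement rates depend so strongly on problem size.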

While Fig. 2 shows the impact of algorithmic improvement for four algorithm families, Fig. 3 extends this analysis to 110 families.5 Instead of showing the historical plot of improvement for each family, Fig. 3 presents the average annualized improvement rate for problem sizes of one thousand, one million, and one billion and contrasts them with the average improvement rate in hardware as measured by the SPECInt benchmark [20].

4 For this analysis, we assume that the leading constants are not changing from one algorithm to another. We test this hypothesis later in this article.

5 Three of the 113 families are excluded from this analysis because the functional forms of improvements are not comparable.

As these graphs show, there are two large clusters of algorithm families and then some intermediate values. The first cluster, representing just under half the families, shows little to no improvement even for large problem sizes. These algorithm families may be ones that have received little attention, ones that have already achieved the mathematically optimal implementations (and, thus, are unable to further improve), those that remain intractable for problems of this size, or something else. In any case, these problems have experienced little algorithmic speedup, and thus, improvements, perhaps from hardware or approximate/heuristic approaches, would be the most important sources of progress for these algorithms.

The second cluster of algorithms, consisting of 14% of the families, has yearly improvement rates greater than 1000% per year. These are algorithms that benefited from an exponential speedup, for example, when the initial algorithm had exponential time complexity, but later improvements made the problem solvable in polynomial time.6 As this high improvement rate makes clear, early implementations of these algorithms would have been impossibly slow for even moderate-sized problems, but algorithmic improvement has made larger data feasible. For these families, algorithm improvement has far outstripped improvements in computer hardware.

Fig. 3 also shows how large an effect problem size has on the improvement rate, calculated using (2). In particular, for n = 1 thousand, only 18% of families had improvement rates faster than hardware, whereas 82% had slower rates. However, for n = 1 million and n = 1 billion, 30% and 43%, respectively, improved faster than hardware.

6 One example of this is the matrix chain multiplication algorithm family.


Fig. 2. Relative performance improvement for algorithm families, as calculated using changes in asymptotic time complexity. The comparison line is the SPECInt benchmark performance [20]. (a) Historical improvements for four algorithm families compared with the first algorithm in that family (n = 1 million). (b) Sensitivity of algorithm improvement measures to input size (n) for the "nearest-neighbor search" algorithm family. To ease comparison of improvement rates over time, in (b) we align the starting periods for the algorithm family and the hardware benchmark.

Correspondingly, the median algorithm family improved 6% per year for n = 1 thousand, 15% per year for n = 1 million, and 28% per year for n = 1 billion. At a problem size of n = 1.06 trillion, the median algorithm improved faster than hardware performance.

Our results quantify two important lessons about how algorithm improvement affects computer science. First, when an algorithm family transitions from exponential to polynomial complexity, it transforms the tractability of that problem in a way that no amount of hardware improvement can. Second, as problems increase to billions or trillions of data points, algorithmic improvement becomes substantially more important than hardware improvement/Moore's law in terms of average yearly improvement rate.


Fig. 3. Distribution of average yearly improvement rates for 110 algorithm families, as calculated based on asymptotic time complexity, for problems of size: (a) n = 1 thousand, (b) n = 1 million, and (c) n = 1 billion. The hardware improvement line shows the average yearly growth rate in SPECInt benchmark performance from 1978 to 2017, as assembled by Hennessy and Patterson [20].

These findings suggest that algorithmic improvement has been particularly important in areas, such as data analytics and machine learning, which have large datasets.


Fig. 4. Evaluation of the importance of leading constants in algorithm performance improvement. Two measures of the performance improvement for algorithm families (first versus last algorithm in each family) for n = 1 million. Algorithmic steps include leading constants in the analysis, whereas asymptotic performance drops them.

C. Algorithmic Step Analysis

Throughout this article, we have approximated the number of steps that an algorithm needs to perform by looking at its asymptotic complexity, which drops any leading constants or smaller-order terms, for example, simplifying 0.5(n^2 + n) to n^2. For any reasonable problem size, simplifying to the highest order term is likely to be a good approximation. However, dropping the leading constant may be worrisome if complexity class improvements come with inflation in the size of the leading constant. One particularly important example of this is the 1990 Coppersmith-Winograd algorithm and its successors, which, to the best of our knowledge, have no actual implementations because "the huge constants involved in the complexity of fast matrix multiplication usually make these algorithms impractical" [21]. If inflation of leading constants is typical, it would mean that our results overestimate the scale of algorithm improvement. On the other hand, if leading constants neither increase nor decrease, on average, then it is safe to analyze algorithms without them since they will, on average, cancel out when ratios of algorithms are taken.

To estimate the fidelity of our asymptotic complexity approximation, we reanalyze algorithmic improvement including the leading constants (we call the resulting count the algorithmic steps of that algorithm). Since only 11% of the papers in our database directly report the number of algorithmic steps that their algorithms require, whenever possible, we manually reconstruct the number of steps based on the pseudocode descriptions in the original papers. For example, Counting Sort [14] has an asymptotic time complexity of O(n), but the pseudocode has four linear for-loops, yielding 4n algorithmic steps in total. Using this method, we are able to reconstruct the number of algorithmic steps needed for the first and last algorithms in 65% of our algorithm families. Fig. 4 shows the comparison between algorithm step improvement and asymptotic complexity improvement. In each case, we show the net effect across improvements in the family by taking the ratio of the performances of the first and final (kth) algorithms in the family, (steps_1)/(steps_k).
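For concreteness, here is a minimal Python rendering of Counting Sort (our illustration, not the article's source pseudocode), with comments marking the linear passes that the step-counting method would tally; treating the key range k as being on the order of n gives roughly the 4n steps cited above.

```python
def counting_sort(a, k):
    """Sort a list of integers in [0, k) in O(n + k) time; here k is assumed to be O(n)."""
    count = [0] * k                      # initialization over the key range
    for x in a:                          # pass 1: tally how often each key occurs
        count[x] += 1
    for i in range(1, k):                # pass 2: prefix sums give final position boundaries
        count[i] += count[i - 1]
    out = [0] * len(a)
    for x in reversed(a):                # pass 3: place elements stably into position
        count[x] -= 1
        out[count[x]] = x
    for i, x in enumerate(out):          # pass 4: copy the sorted result back
        a[i] = x
    return a

print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], k=10))  # [1, 1, 2, 3, 4, 5, 6, 9]
```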

Fig. 4 shows that, for the cases where the data are available, the size of improvements to the number of algorithmic steps and asymptotic performance is nearly identical.7 Thus, for the majority of algorithms, there is virtually no systematic inflation of leading constants. We cannot assume that this necessarily extrapolates to unmeasured algorithms since higher complexity may lead to both higher leading constants and a lower likelihood of quantifying them (e.g., matrix multiplication). However, this analysis reveals that these are the exception, rather than the rule. Thus, asymptotic complexity is an excellent approximation for understanding how most algorithms improve.

7 Cases where algorithms transitioned from exponential to polynomial time (7%) cannot be shown on this graph because the change is too large. However, an analysis of the change in their leading constants shows that, on average, there is a 28% leading constant deflation. That said, this effect is so small compared to the asymptotic gains that it has no effect on our other estimates.

II. CONCLUSION

Our results provide the first systematic review of progress in algorithms, one of the pillars underpinning computing in science and society more broadly.


We find enormous heterogeneity in algorithmic progress, with nearly half of algorithm families experiencing virtually no progress, while 14% experienced improvements orders of magnitude larger than hardware improvement (including Moore's law). Overall, we find that algorithmic progress for the median algorithm family increased substantially but by less than Moore's law for moderate-sized problems and by more than Moore's law for big data problems. Collectively, our results highlight algorithms as an important, and previously largely undocumented, source of computing improvement.

III. METHODS

A full list of the algorithm textbooks, course syllabi, and reference papers used in our analysis can be found in the Extended Data and Supplementary Material, along with a list of the algorithms in each family.

A. Algorithms and Algorithm Families

To generate a list of algorithms and their groupings into algorithm families, we use course syllabi, textbooks, and research papers. We gather a list of major subdomains of computer science by analyzing the coursework from 20 top computer science university programs, as measured by the QS World Rankings in 2018 [22]. We then shortlist those with maximum overlap amongst the syllabi, yielding the following 11 algorithm subdomains: combinatorics, statistics/machine learning, cryptography, numerical analysis, databases, operating systems, computer networks, robotics, signal processing, computer graphics/image processing, and bioinformatics.

From each of these subdomains, we analyze algorithm textbooks: one textbook from each subdomain for each decade since the 1960s. This totals 57 textbooks because not all fields have textbooks in early decades, e.g., bioinformatics. Textbooks were chosen based on being cited frequently in algorithm research papers, on Wikipedia pages, or in other textbooks. For textbooks from recent years, where such citation breadcrumbs are too scarce, we also use reviews on Amazon, Google, and others to source the most-used textbooks.

From each of the 57 textbooks, we used those authors' categorization of problems into chapters, subheadings, and book index divisions to determine which algorithm families were important to the field (e.g., "comparison sorting") and which algorithms corresponded to each family (e.g., "Quicksort" in comparison sorting). We also searched academic journals, online course material, Wikipedia, and published theses to find other algorithm improvements for the families identified by the textbooks. To distinguish between serious algorithm problems and pedagogical exercises intended to help students think algorithmically (e.g., the Tower of Hanoi problem), we limit our analysis to families where at least one research paper was written that directly addresses the family's problem statement.

In our analysis, we focus on exact algorithms with exact solutions. That is, cases where a problem statement can be met exactly (e.g., find the shortest path between two nodes on a graph), and there is a guarantee that an optimal solution will be found (e.g., that the shortest path has been identified). This "exact algorithm, exact solution" criterion also excludes, amongst others, algorithms where solutions, even in theory, are imprecise (e.g., detect parts of an image that might be edges) and algorithms with precise definitions but where proposed answers are approximate. We also exclude quantum algorithms from our analysis since such hardware is not yet available.

We assess that an algorithm has improved if the work that needs to be done to complete it is reduced asymptotically. This, for example, means that a parallel implementation of an algorithm that spreads the same amount of work across multiple processors or allows it to run on a GPU would not count toward our definition. Similarly, an algorithm that reduced the amount of memory required, without changing the total amount of work, would not be included in this analysis. Finally, we focus on worst-case time complexity because it does not require assumptions about the distribution of inputs and it is the most widely reported outcome in algorithm improvement papers.

B. Historical Improvements

We calculate historical improvement rates by examining the initial algorithm in each family and all subsequent algorithms that improve time complexity. For example, as discussed in [23], the "Maximum Subarray in 1D" problem was first proposed in 1977 by Ulf Grenander with a brute-force solution of O(n^3) but was improved twice in the next two years: first to O(n^2) by Grenander and then to O(n log n) by Shamos using a divide-and-conquer strategy. In 1982, Kadane came up with an O(n) algorithm, and later that year, Gries [24] devised another linear algorithm using Dijkstra's standard strategy. There were no further complexity improvements after 1982. Of all these algorithms, only Gries' is excluded from our improvement list since it did not have a better time complexity than Kadane's.
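For concreteness, a minimal Python sketch of the linear-time approach usually attributed to Kadane (an illustration, not code from the article or from [23]):

```python
def max_subarray_sum(values):
    """Maximum sum over all contiguous, nonempty subarrays, computed in O(n) time."""
    best_ending_here = best_overall = values[0]
    for x in values[1:]:
        # Either extend the best subarray ending at the previous element or start anew at x.
        best_ending_here = max(x, best_ending_here + x)
        best_overall = max(best_overall, best_ending_here)
    return best_overall

print(max_subarray_sum([31, -41, 59, 26, -53, 58, 97, -93, -23, 84]))  # 187
```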

When computing the number of operations needed asymptotically, we drop leading constants and smaller order terms. Hence, an algorithm with time complexity 0.5(n^2 + n) is approximated as n^2. As we show in Fig. 4, this is an excellent approximation to the improvement in the actual number of algorithmic steps for the vast majority of algorithms.

To be consistent in our notation, we convert the time complexity for algorithms with matrices, which are typically parameterized in terms of the dimensions of the matrix, to being parameterized as a function of the input size. For example, this means that the standard form for writing the time complexity of naive matrix multiplication goes from N^3, where N × N is the dimension of the matrix, to n^1.5, where n is the size of the input and n = N × N.
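As a worked check of this substitution (our own, following the definitions above): since n = N × N = N^2, we have

N = n^{1/2}, \qquad N^3 = \left(n^{1/2}\right)^3 = n^{3/2} = n^{1.5}.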

In the presentation of our results in Fig. 1(d), we "round up," meaning that results between complexity classes round up to the next highest category. For example, an algorithm that scales as n^2.7 would be between n^3 and n^2 and so would get rounded up to n^3 (for ambiguous cases, we round based on the values for an input size of n = 1 000 000). Algorithms with subexponential but superpolynomial time complexity are included in the ">n^3" category.

C. Calculating Improvement Rates and Transition Values

In general, we calculate the improvement from one algorithm, i, to another, j, as

\text{Improvement}_{i \to j} = \frac{\text{Operations}_i(n)}{\text{Operations}_j(n)} \qquad (1)

where n is the problem size and the number of operations is calculated using either the asymptotic complexity or the algorithmic step techniques. One challenge of this calculation is that some improvements would be mathematically impressive but not realizable, for example, an improvement from 2^{2n} to 2^n, which, when n = 1 billion, is an improvement ratio of 2^{1 000 000 000}. However, the improved algorithm, even after such an astronomical improvement, remains completely beyond the ability of any computer to actually calculate. In such cases, we deem that the "effective" performance improvement is zero since the problem went from "too large to do" to "too large to do." In practice, this means that we deem all algorithm families that transition from one factorial/exponential implementation to another to have had no effective improvement for Figs. 3 and 4.

We calculate the average per-year percentage improvement rate over t years as

\text{YearlyImprovement}_{i \to j} = \left(\frac{\text{Operations}_i(n)}{\text{Operations}_j(n)}\right)^{1/t} - 1. \qquad (2)

We only consider years since 1940 to be those where an algorithm was eligible for improvement. This avoids biasing very early algorithms, e.g., those discovered in Greek times, toward an average improvement rate of zero.
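A minimal Python sketch of (1) and (2) (illustrative, not the authors' code; the operation counts and dates follow the maximum subarray history discussed earlier, brute force O(n^3) in 1977 to Kadane's O(n) in 1982):

```python
def improvement(ops_i, ops_j, n):
    """Eq. (1): ratio of operation counts for an older algorithm i and a newer algorithm j."""
    return ops_i(n) / ops_j(n)

def yearly_improvement(ops_i, ops_j, n, years):
    """Eq. (2): average per-year improvement rate from algorithm i to j over the given years."""
    return improvement(ops_i, ops_j, n) ** (1.0 / years) - 1.0

brute_force = lambda n: n ** 3   # asymptotic operation count, leading constants dropped
kadane = lambda n: n
n = 10 ** 6
print(f"Total improvement at n = 1 million: {improvement(brute_force, kadane, n):.0e}x")
print(f"Average yearly improvement: {yearly_improvement(brute_force, kadane, n, years=1982 - 1977):.0%} per year")
```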

Both of these measures are intensive, rather than extensive, in that they measure how many more problems (of the same size) could be solved with a given number of operations. Another potential measure would be to look at the increase in the problem size that could be achieved, i.e., (n_new/n_old), but this requires assumptions about the computing power being used, which would introduce significant extra complexity into our analysis without compensatory benefits.

Algorithms With Multiple Parameters: While many algorithms have only a single input parameter, others have multiple. For example, graph algorithms can depend on the number of vertices, V, and the number of edges, E, and, thus, have input size V + E. For these algorithms, increasing input size could have various effects on the number of operations needed, depending on how much of the increase in the input size was assigned to each variable. To avoid this ambiguity, we look to research papers that have analyzed problems of this type as an indication of the ratios of these parameters that are of interest to the community. For example, if an average paper considers graphs with E = 5V, then we will assume this for both our base case and any scaling of input sizes. In general, we source such ratios as the geometric mean of at least three studies. In a few cases, such as Convex Hull, we have to make assumptions to calculate the improvement because newer algorithms scale differently because of output sensitivity and, thus, cannot be computed directly with only the input parameters. In three instances, the functional forms of the early and later algorithms are so incomparable that we do not attempt to calculate rates of improvement and instead omit them.

D. Transition Probabilities

The transition probability from one complexity class to another is calculated by counting the number of transitions that did occur and dividing by the number of transitions that could have occurred. Specifically, the probability of an algorithm transitioning from class a to class b is given as

\text{prob}(a \to b) = \frac{1}{T} \sum_{t \in T} \frac{\|a \to b\|_t}{\|a\|_{t-1} + \sum_{c \in C} \|c \to a\|_t} \qquad (3)

where t is a year from the set of possible years T and c is a time complexity class from C, which includes the null set (i.e., a new algorithm family).

For example, say we are interested in the number of transitions from cubic to quadratic, i.e., a = n^3 and b = n^2. For each year, we calculate the fraction of transitions that did occur, which is just the number of algorithm families that did improve from n^3 to n^2 (i.e., ||a → b||_t) divided by all the transitions that could have occurred. The latter includes three terms: all the algorithm families that were in n^3 in the previous year (i.e., ||a||_{t−1}), all the algorithm families that moved into n^3 from another class c in that year (i.e., Σ_{c∈C} ||c → a||_t), and any new algorithm families that are created and begin in n^3 (i.e., ||Ø → a||_t). These latter two are included because a family can make multiple transitions within a single year, and thus, "newly arrived" algorithms are also eligible for asymptotic improvement in that year. Averaging across all the years in the sample provides the average transition probability from n^3 to n^2, which is 1.55%.
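A minimal Python sketch of how (3) could be computed from yearly records of complexity-class changes (illustrative, run on synthetic records rather than the article's dataset):

```python
from collections import defaultdict

# Synthetic records of class changes: (year, family, from_class, to_class).
# from_class = None means the family was newly discovered in that class.
records = [
    (1970, "A", None, "n^3"),
    (1971, "B", None, "n^3"),
    (1973, "A", "n^3", "n^2"),
    (1975, "C", None, "exp"),
    (1975, "C", "exp", "n^3"),   # a family can make multiple moves in a single year
    (1976, "C", "n^3", "n^2"),
]

def transition_probability(records, a, b, years):
    """Eq. (3): average yearly probability that a family in class a moves to class b."""
    members = defaultdict(set)            # class -> families currently in that class
    by_year = defaultdict(list)
    for year, fam, frm, to in records:
        by_year[year].append((fam, frm, to))
    yearly_fractions = []
    for year in years:
        in_a_last_year = len(members[a])  # ||a||_{t-1}
        moved_into_a = moved_a_to_b = 0
        for fam, frm, to in by_year[year]:
            if to == a:
                moved_into_a += 1         # Σ_c ||c -> a||_t, new families included
            if frm == a and to == b:
                moved_a_to_b += 1         # ||a -> b||_t
            if frm is not None:
                members[frm].discard(fam)
            members[to].add(fam)
        eligible = in_a_last_year + moved_into_a
        yearly_fractions.append(moved_a_to_b / eligible if eligible else 0.0)
    return sum(yearly_fractions) / len(yearly_fractions)

print(transition_probability(records, "n^3", "n^2", years=range(1970, 1977)))
```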


E. Deriving the Number of Algorithmic Steps

In general, we use pseudocode from the original papers to derive the number of algorithmic steps needed for an algorithm when the authors have not done it. When that is not available, we also use pseudocode from textbooks or other sources. Analogously to asymptotic complexity calculations, we drop smaller order terms and their constants because they have a diminishing impact as the size of the problem increases.

Acknowledgment

The authors would like to acknowledge generous funding from the Tides Foundation and the MIT Initiative on the Digital Economy. They would also like to thank Charles Leiserson, the MIT Supertech Group, and Julian Shun for valuable input.

Author Contributions Statement

Neil C. Thompson conceived the project and directed the data gathering and analysis. Yash Sherry gathered the algorithm data and performed the data analysis. Both authors wrote this article.

Data Availability and Code Availability

Data are being made public through the online resource for the algorithms community at algorithm-wiki.org and will launch at the time of this article's publication. Code will be available at https://github.com/canuckneil/IEEE_algorithm_paper.

Additional Information

Requests for materials should be addressed to Neil C. Thompson ([email protected]).

REFERENCES

[1] Division of Computing and Communication Foundations, CCF: Algorithmic Foundations (AF), Nat. Sci. Found., Alexandria, VA, USA. [Online]. Available: https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503299

[2] Division of Physics, Computational and Data-Enabled Science and Engineering (CDS&E), Nat. Sci. Found., Alexandria, VA, USA. [Online]. Available: https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504813

[3] J. P. Holdren, E. Lander, and H. Varmus, "President's Council of Advisors on Science and Technology," Dec. 2010. [Online]. Available: https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/pcast-nitrd-report-2010.pdf

[4] R. Bixby, "Solving real-world linear programs: A decade and more of progress," Oper. Res., vol. 50, no. 1, pp. 3–15, 2002, doi: 10.1287/opre.50.1.3.17780.

[5] C. E. Leiserson et al., "There's plenty of room at the top: What will drive computer performance after Moore's law?" Science, vol. 368, no. 6495, Jun. 2020, Art. no. eaam9744.

[6] K. Grace, "Algorithm progress in six domains," Mach. Intell. Res. Inst., Berkeley, CA, USA, Tech. Rep., 2013.

[7] D. E. Womble, "Is there a Moore's law for algorithms?" Sandia Nat. Laboratories, Albuquerque, NM, USA, Tech. Rep., 2004. [Online]. Available: https://www.lanl.gov/conferences/salishan/salishan2004/womble.pdf

[8] N. Thompson, S. Ge, and G. Filipe, "The importance of (exponentially more) computing," Tech. Rep., 2020.

[9] D. Hernandez and T. Brown, "AI and efficiency," OpenAI, San Francisco, CA, USA, Tech. Rep. arXiv:2005.04305, 2020.

[10] N. Thompson, K. Greenewald, and K. Lee, "The computational limits of deep learning," 2020, arXiv:2007.05558. [Online]. Available: https://arxiv.org/abs/2007.05558

[11] M. Manohara, A. Moorthy, J. D. Cock, and A. Aaron, Netflix Optimized Encodes. Los Gatos, CA, USA: Netflix Tech Blog, 2018. [Online]. Available: https://netflixtechblog.com/optimized-shot-based-encodes-now-streaming-4b9464204830

[12] S. Ismail, Why Algorithms Are the Future of Business Success. Austin, TX, USA: Growth Inst. [Online]. Available: https://blog.growthinstitute.com/exo/algorithms

[13] S. S. Kortum, "Research, patenting, and technological change," Econometrica, vol. 65, no. 6, p. 1389, Nov. 1997.

[14] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed. Cambridge, MA, USA: MIT Press, 2009.

[15] J. Bentley, Programming Pearls, 2nd ed. New York, NY, USA: Association for Computing Machinery, 2006.

[16] D. E. Knuth, "Optimum binary search trees," Acta Inform., vol. 1, no. 1, pp. 14–25, 1971.

[17] T. C. Hu and A. C. Tucker, "Optimal computer search trees and variable-length alphabetical codes," SIAM J. Appl. Math., vol. 21, no. 4, pp. 514–532, 1971.

[18] C. Levcopoulos, A. Lingas, and J.-R. Sack, "Heuristics for optimum binary search trees and minimum weight triangulation problems," Theor. Comput. Sci., vol. 66, no. 2, pp. 181–203, Aug. 1989.

[19] M. Klawe and B. Mumey, "Upper and lower bounds on constructing alphabetic binary trees," SIAM J. Discrete Math., vol. 8, no. 4, pp. 638–651, Nov. 1995.

[20] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed. San Mateo, CA, USA: Morgan Kaufmann, 2019.

[21] F. Le Gall, "Faster algorithms for rectangular matrix multiplication," Commun. ACM, vol. 62, no. 2, pp. 48–60, 2012, doi: 10.1145/3282307.

[22] TopUniversities QS World Rankings, Quacquarelli Symonds Ltd., London, U.K., 2018.

[23] J. Bentley, "Programming pearls: Algorithm design techniques," Commun. ACM, vol. 27, no. 9, pp. 865–873, 1984, doi: 10.1145/358234.381162.

[24] D. Gries, "A note on a standard strategy for developing loop invariants and loops," Sci. Comput. Program., vol. 2, no. 3, pp. 207–214, 1982, doi: 10.1016/0167-6423(83)90015-1.
