arXiv:1806.07241v3 [quant-ph] 9 Feb 2021


NISQ circuit compilation is the travelling salesman problem on a torus

Alexandru Paler,1,2,* Alwin Zulehner,1 and Robert Wille1

1 Institute of Integrated Circuits, Johannes Kepler University; 2 Transilvania University of Brașov, Romania

Noisy, intermediate-scale quantum (NISQ) computers are expected to execute quantum circuits of up to a few hundred qubits. The circuits have to conform to NISQ architectural constraints regarding qubit allocation and the execution of multi-qubit gates. Quantum circuit compilation (QCC) takes a nonconforming circuit and outputs a compatible circuit. Can classical optimisation methods be used for QCC? Compilation is a known combinatorial problem shown to be solvable by two types of operations: 1) qubit allocation, and 2) gate scheduling. We show informally that the two operations form a discrete ring. The search landscape of QCC is a two-dimensional discrete torus where vertices represent configurations of how circuit qubits are allocated to NISQ registers. Torus edges are weighted by the cost of scheduling circuit gates. The novelty of our approach lies in using the fact that a circuit's gate list is circular: compilation can start from any gate as long as all the gates are processed and the compiled circuit has the correct gate order. Our work bridges a theoretical and practical gap between classical circuit design automation and the emerging field of quantum circuit optimisation.

I. Introduction

The first general-purpose quantum computers, which are called noisy, intermediate-scale quantum (NISQ) computers [21], operate on a few hundred qubits and do not support computational fault-tolerance. The IBM Q Experience computers, which fall into the NISQ category, have sparked interest in the automated compilation of arbitrary quantum circuits. Near-term applications of NISQ computers may explore many-particle quantum systems or optimisation problems, and the executed circuits are not expected to include sequences longer than 100 gates [21]. Although this is a serious limitation, it is hoped that hardware quality will increase such that longer circuits can be executed.

NISQ compilation is motivated in part by the different architectures, but more importantly by the technical limitations of NISQ hardware, such as qubit and quantum gate fault rates, gate execution times etc. Before executing a quantum computation, the corresponding circuit has to be adapted to the particularities of the NISQ computer.

A. Background

In order to show that quantum circuit compilation is equivalent to a travelling salesman problem on a torus (e.g. Fig. 1), we introduce the following background material.

We will use NISQ, quantum computer, and chip interchangeably. For the purpose of this work, a chip is described entirely by the set of hardware qubits, also called registers [28], and the set of supported interactions. The computer is abstracted by a coupling graph (e.g. Fig. 2b), where the registers are the vertices, and the edges are the supported multi-qubit gates between vertex tuples. In a directed coupling graph G = (V, E), having |V| = q and |E| ≤ q(q−1), the edges stand for the CNOTs supported between pairs of physical qubits. The edge directions indicate which qubit is the control and which is the target. If the computer supports both CNOT directions between a pair of qubits, there are two directed edges between the corresponding vertices. Current NISQ devices do not restrict CNOT direction, and G graphs are nowadays mostly undirected.

FIG. 1. The search landscape of QCC is a discrete two-dimensional torus. The blue circle represents the timeline of the quantum circuit being compiled. The green vertices (only a few are drawn) are qubit allocation configurations encountered during compilation. The red circle connects all the qubit allocation configurations available while compiling one of the circuit's gates. QCC compilation forms a loop of green vertices. The solution loop intersects non-trivially all possible red loops on the torus.

In general, NISQs do not have all-to-all connectivity between the registers, and do not support the arbitrary application of multi-qubit gates. Consequently, not all the CNOTs of a circuit can be executed without further adjustment.

The quantum circuit compilation (QCC) problem is: for a given coupling graph and a quantum circuit C, compile a circuit C′ which is functionally equivalent to C and compatible with the coupling graph.

We use the following operations to solve QCC: 1) qubit allocation; 2) CNOT gate scheduling; and 3) circuit traversal, i.e. choosing the order in which the CNOTs are compiled. The first two are already methodological parts of established quantum circuit design frameworks such as Cirq and Qiskit. A theoretical analysis of the first two was provided in [27]. The third operation is based on circular CNOT circuits as introduced in [19].

Each of the three operations can be attached to an optimisation problem. Each of those problems is directly connected to the execution of a remote CNOT, which is defined as a CNOT that has to be executed between circuit qubits allocated on non-adjacent NISQ registers (e.g. Fig. 2). The compilation of a remote CNOT introduces additional gates into C′, because the qubits have to be effectively moved across the chip until they are on adjacent registers. A correctly compiled C′ has no remote CNOTs. It is assumed that the NISQ chip has at least as many registers as C′ has qubits.
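To make the definition of a remote CNOT concrete, the following minimal sketch lists the remote CNOTs of a circuit under a given allocation. It assumes an undirected coupling graph given as an edge list (consistent with current devices not restricting CNOT direction) and an allocation dictionary from circuit qubits to registers. The coupling graph and the function names are hypothetical and are not taken from the paper's implementation.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]

def adjacency(edges: List[Edge]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from a coupling-graph edge list."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def remote_cnots(cnots: List[Tuple[int, int]],
                 allocation: Dict[int, int],
                 edges: List[Edge]) -> List[Tuple[int, int]]:
    """Return the CNOTs (control, target) whose registers are not adjacent."""
    adj = adjacency(edges)
    remote = []
    for control, target in cnots:
        h_control, h_target = allocation[control], allocation[target]
        if h_target not in adj.get(h_control, set()):
            remote.append((control, target))
    return remote

# Hypothetical 5-register coupling graph in which H3 and H4 are not adjacent to H0.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
trivial = {q: q for q in range(5)}          # Qi allocated to Hi
circuit = [(0, 1), (0, 3), (0, 4), (2, 3)]  # (control, target) pairs
print(remote_cnots(circuit, trivial, edges))  # [(0, 3), (0, 4)]
```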

We define the qubit allocation problem with respect to the effect of compiling remote CNOTs. Fig. 3 includes two allocation configurations. The labels inside the coupling graph vertices represent the allocated circuit qubits. After moving Q4 from one register to another, the configuration from Fig. 3a changes to the one from Fig. 3c.

Problem 1 Qubit allocation: Assign circuit qubits to NISQ registers such that the compiled C′ has a minimal cost.

FIG. 2. a) Quantum circuit example; b) Example of a coupling graph where the vertices Q3 and Q4 are not adjacent to Q0. Considering the graph from b), the CNOTs Q0 → Q3 and Q0 → Q4 are remote. The remote CNOTs are highlighted in a).

The cost mentioned in Problem 1 could be gate count, circuit depth etc. For example, if the cost is expressed in terms of physical CNOT gates and a linear nearest neighbour architecture is assumed, then in Fig. 2 the cost of implementing the first CNOT is zero, while the second (remote) CNOT has a cost of six, because two SWAP gates of three CNOTs each may be necessary, etc.
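The cost example above can be written as a small function. This is a sketch under stated assumptions: a linear nearest neighbour chip, one qubit swapped towards the other until both are adjacent (so two qubits at distance d need d − 1 SWAPs), and each SWAP counted as three CNOTs. The function name is hypothetical.

```python
def lnn_cnot_cost(pos_control: int, pos_target: int) -> int:
    """Extra physical CNOTs needed for a CNOT on a linear nearest neighbour chip.

    Assumes the qubits are brought next to each other with SWAPs, each SWAP
    costs three CNOTs, and an already adjacent pair costs zero.
    """
    distance = abs(pos_control - pos_target)
    swaps = max(distance - 1, 0)
    return 3 * swaps

print(lnn_cnot_cost(0, 1))  # adjacent registers: 0
print(lnn_cnot_cost(0, 3))  # two SWAPs: 6
```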

For the purpose of this work, the compilation of remote CNOTs to the NISQ chip is a kind of gate scheduling (see Problem 2), and we present an example in Fig. 3. Automatic approaches for gate scheduling range from global reordering of quantum wires [36] to the application of circuit rewrite rules [25]. Gate scheduling has even been performed manually, by designing circuits that conform to the architectural constraints [1, 7].

FIG. 3. Finding the edge where to execute a remote CNOT: a) The CNOT Q0 → Q4 needs to be implemented, but the qubits are not adjacent. b) Depending on a given cost function, it is determined that moving the qubits to the endpoints of the red edge (between Q3 and Q2) is the most cost-effective method to achieve Q0 → Q4. c) A new qubit allocation configuration is generated after Q0 and Q4 were moved across the chip.

Problem 2 Gate scheduling: Choose the coupling graph edge where to execute a remote CNOT.
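Problem 2 can be read as a minimisation over coupling-graph edges. The sketch below uses a deliberately simplified cost model (an assumption, not the paper's cost function): moving a qubit costs its shortest-path distance in SWAPs, and interference between the two moving qubits is ignored. It returns the cheapest edge as (cost, (a, b)), meaning the control is moved to register a and the target to register b. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

def bfs_distances(adj: Dict[int, Set[int]], source: int) -> Dict[int, int]:
    """Hop distances from `source` in an unweighted coupling graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def choose_edge(edges: List[Tuple[int, int]], h_control: int, h_target: int):
    """Pick the edge where a remote CNOT is cheapest to execute.

    Simplified cost: SWAPs to move the control to one endpoint plus SWAPs to
    move the target to the other endpoint (both orientations are tried).
    """
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dc = bfs_distances(adj, h_control)
    dt = bfs_distances(adj, h_target)
    best = None
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            cost = dc[a] + dt[b]
            if best is None or cost < best[0]:
                best = (cost, (a, b))
    return best

line = [(0, 1), (1, 2), (2, 3), (3, 4)]  # hypothetical linear chip
print(choose_edge(line, 0, 4))  # (3, (0, 1))
```

Several edges tie at cost 3 on a line; the first one found is kept. A real cost function would break such ties with further information, for example lookahead on the following gates.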

Gate scheduling is a sub-problem of the qubit allocation problem, because scheduling is performed once qubits are allocated. However, for an exact solution, finding the best initial allocation requires iterating through all possibilities, which in turn implies that gate scheduling has to be calculated each time. From this perspective, QCC is at least as complex as the qubit allocation problem.

Regarding scheduling, it is not obligatory to start compiling from the first remote CNOT of C. One can start from an arbitrary gate, as long as the resulting C′ respects the original order from C. For example, if C consists of three remote CNOTs g1, g2 and g3, the compilation could start with g2, followed by g3 and finally g1. However, C′ would still need to execute the compiled gates in the correct order g′1, g′2, g′3.

Problem 3 Circuit traversal: Determine the order in which the gates of C should be compiled, such that the cost of C′ is minimised. The chosen order has to be a valid topological sorting of C.
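The validity condition in Problem 3 can be checked mechanically. The minimal sketch below assumes that a gate depends on every earlier gate that shares a qubit with it and ignores commutation rules, which would admit additional valid orders. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Gate = Tuple[int, ...]  # the qubits a gate acts on

def is_valid_order(gates: Sequence[Gate], order: List[int]) -> bool:
    """True if `order` (a permutation of gate indices) respects the DAG of C.

    Gate j depends on an earlier gate i < j whenever they share a qubit,
    so i must appear before j in the proposed order.
    """
    if sorted(order) != list(range(len(gates))):
        return False
    position = {g: k for k, g in enumerate(order)}
    for j in range(len(gates)):
        for i in range(j):
            if set(gates[i]) & set(gates[j]) and position[i] > position[j]:
                return False
    return True

gates = [(0, 1), (2, 3), (1, 2)]
print(is_valid_order(gates, [1, 0, 2]))  # True: g0 and g1 act on disjoint qubits
print(is_valid_order(gates, [2, 0, 1]))  # False: g2 needs g0 and g1 first
```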

B. Related work

Most NISQ devices have a topology that is not compatible with the quantum circuits to be executed on them, so those circuits need to be modified accordingly. Originally, this was done by adapting the quantum circuit in a systematic manner (e.g. [7]). However, such an approach is not feasible, particularly as the size of the considered quantum circuit increases.

In the past, a huge variety of methods addressing this problem have been proposed. While some of them (e.g. [11, 23, 34, 36]) aim to solve QCC in an exact fashion (i.e. generating minimal solutions), most of them provide heuristics. Heuristics are much more established solutions for QCC, while exact approaches are mainly used for evaluation purposes (i.e. checking how far heuristics are from the optimum) or to generate quantum circuits for certain "building block" functionality.

Most of the available heuristics employ a swapping scheme, i.e. they insert SWAP operations into the originally given quantum circuit, which exchange the states of two physical qubits whenever a (remote) CNOT does not satisfy a connectivity constraint. By this, the mapping of the logical qubits of the quantum circuit to the physical ones of the hardware changes dynamically, i.e. the logical qubits are moved around on the physical ones. Approaches following this scheme include e.g. [8, 11, 15, 16, 25, 35, 37].

Other approaches use a bridging scheme, which does not dynamically change the mapping of the logical qubits to the physical ones: CNOT gates that violate the connection constraint are decomposed into several CNOT gates that bridge the "gap". This scheme has the advantage that, given the initial mapping, determining the mapped circuit is straightforward. On the other hand, it often leads to more costly solutions, since the number of CNOT operations required to realise bridge gates grows exponentially. Approaches following this scheme include e.g. [4, 5, 10, 11, 23].
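As an illustration of the bridging scheme, a CNOT between qubits c and t that are both adjacent to a middle qubit m can be replaced by four nearest-neighbour CNOTs. The sketch verifies this identity by simulating CNOTs on computational basis states. This check is sufficient because CNOT circuits act as phase-free permutations of basis states. The helper names are illustrative.

```python
from itertools import product
from typing import List, Tuple

def apply_cnots(bits: Tuple[int, ...], cnots: List[Tuple[int, int]]) -> Tuple[int, ...]:
    """Apply (control, target) CNOTs in order to a computational basis state."""
    state = list(bits)
    for control, target in cnots:
        state[target] ^= state[control]
    return tuple(state)

def same_action(a: List[Tuple[int, int]], b: List[Tuple[int, int]], n_qubits: int) -> bool:
    """Check that two CNOT lists agree on every n-qubit basis state."""
    return all(apply_cnots(s, a) == apply_cnots(s, b)
               for s in product((0, 1), repeat=n_qubits))

c, m, t = 0, 1, 2
bridge = [(c, m), (m, t), (c, m), (m, t)]  # only nearest-neighbour CNOTs
print(same_action(bridge, [(c, t)], 3))     # True
```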

C. Complexity of QCC

Multiple approaches to establishing the complexity of QCC have been presented. In one of the first, Maslov [14] demonstrated that a variant of QCC is NP-complete by showing that it implies the search for a Hamiltonian cycle in a graph. In the context of our QCC formulation, the work of [14] is concerned with optimal solutions to the qubit allocation problem when the blue torus edge weights are determined by the physical gate execution times along the longest input-output gate chain.

QCC has also been formulated as a search problem in [27], which includes a detailed review of the methods used for determining the complexity class. It has recently been argued that QCC optimisation is NP-hard [17], by comparing QCC with the optimisation of fault-tolerant quantum circuits protected by the surface code [9].

The authors of [2] have shown that QCC, as a discrete optimisation of a circuit's makespan, is NP-complete for QAOA circuits. The proof from [2] is based on a reduction from the Boolean satisfiability problem (SAT) to the QCC problem, and was applied to circuits consisting of two types of two-qubit gates: SWAP and PS (phase separation). The proof did not rely on any particular ordering of the gates in the circuit. Such circuits can be decomposed with a constant overhead into the circuits we consider in this work (CNOT gates and single-qubit gates). PS gates can be decomposed using the KAK decomposition [32, 38] into CNOTs and single-qubit gates, and the SWAP gate can be decomposed into three CNOTs.
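The SWAP decomposition mentioned above, SWAP = CNOT(0,1) CNOT(1,0) CNOT(0,1), can be checked on all two-qubit basis states with a few lines. The helper name is illustrative.

```python
from itertools import product

def apply_cnots(bits, cnots):
    """Apply (control, target) CNOTs in order to a computational basis state."""
    state = list(bits)
    for control, target in cnots:
        state[target] ^= state[control]
    return tuple(state)

swap_as_cnots = [(0, 1), (1, 0), (0, 1)]
for a, b in product((0, 1), repeat=2):
    # Each basis state |a b> must be mapped to |b a>.
    assert apply_cnots((a, b), swap_as_cnots) == (b, a)
print("SWAP = CNOT(0,1) CNOT(1,0) CNOT(0,1) on all basis states")
```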

Moreover, a method for optimising QAOA circuits by taking commutativity into account was presented in [31]. Therein the authors convincingly show that theorem provers (e.g. the Z3 solver) and SAT solvers do not scale to practically large compilation problems (more than 100 qubits and deep circuits): the search space of QCC as an NP-complete problem is still exponential even when the number of variables is reduced exponentially. The work of [31], combined with the theoretical approach from [2], highlights the importance of QCC heuristics.

QCC has been presented as an application of temporal planning [33], too. In general, temporal planning can have a higher complexity than NP-complete. For example, concurrent temporal planning is EXPSPACE-complete [24]. This, however, does not imply that QCC is EXPSPACE-complete. In fact, in domain-independent planning, it is not uncommon for a planning system to attack a domain with a lower complexity than that of the AI planning variant the system can handle. This is done for the convenience of using a readily available off-the-shelf system when a domain-specific solver is not necessarily available.

II. Methods

We show how QCC can be solved as a travelling salesman problem. To this end, we illustrate the construction of the QCC torus.

A. Arranging qubit allocations in a circle

Allocating circuit qubits to NISQ registers can be expressed as a permutation vector of length q. For example, assume that Qi are the qubits of a circuit C, and Hi are the registers of a computer, for 0 ≤ i < q. Both the computer and the circuit have q = 5 qubits. The permutation p1 = (0, 1, 2, 3, 4) is the trivial allocation, where Qi are allocated to p1[Qi] = Hi: circuit qubit Q0 at register H0, qubit Q1 at register H1 etc. Another example is the permutation p2 = (2, 1, 0, 4, 3), where Q0 is allocated to H2, Q1 is allocated to H1 etc.
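The allocation vectors can be handled directly as tuples, where index i stands for circuit qubit Qi and the stored value is the register index. The sketch, with hypothetical helper names, prints the allocation p2 and computes the inverse map, i.e. which circuit qubit sits on each register.

```python
from typing import Dict, Tuple

Allocation = Tuple[int, ...]  # allocation[i] = register index of circuit qubit Qi

def describe(allocation: Allocation) -> Dict[str, str]:
    """Map circuit qubit names to register names."""
    return {f"Q{i}": f"H{h}" for i, h in enumerate(allocation)}

def register_contents(allocation: Allocation) -> Tuple[int, ...]:
    """Inverse permutation: entry j is the circuit qubit placed on register Hj."""
    inverse = [0] * len(allocation)
    for qubit, register in enumerate(allocation):
        inverse[register] = qubit
    return tuple(inverse)

p1 = (0, 1, 2, 3, 4)  # trivial allocation
p2 = (2, 1, 0, 4, 3)
print(describe(p2))  # {'Q0': 'H2', 'Q1': 'H1', 'Q2': 'H0', 'Q3': 'H4', 'Q4': 'H3'}
```

Because p2 only swaps pairs of entries, it is its own inverse; this is not true for a general allocation such as (1, 2, 0).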

In the following, a configuration is a permutation that represents how circuit qubits are allocated to NISQ registers. The terms permutation and configuration will be used interchangeably. For example, p1 and p2 are configurations, too.

The set of all permutations forms a symmetric group with q! elements. The group can be generated by q−1 transpositions (e.g. the adjacent transpositions that swap entries i and i+1). A transposition swaps two elements of the permutation and keeps all other entries unchanged. Any group element can be expressed through a non-unique sequence of transposition generators.
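The claim that q − 1 transpositions suffice can be checked numerically for small q. The sketch assumes the adjacent transpositions (i, i+1) as generators and performs a breadth-first search from the identity, which reaches all q! configurations. The function name is hypothetical.

```python
from collections import deque
from math import factorial
from typing import Set, Tuple

def reachable_with_adjacent_transpositions(q: int) -> Set[Tuple[int, ...]]:
    """All permutations reachable from the identity by swapping entries i and i+1."""
    start = tuple(range(q))
    seen = {start}
    queue = deque([start])
    while queue:
        perm = queue.popleft()
        for i in range(q - 1):  # the q - 1 generators
            nxt = list(perm)
            nxt[i], nxt[i + 1] = nxt[i + 1], nxt[i]
            nxt_t = tuple(nxt)
            if nxt_t not in seen:
                seen.add(nxt_t)
                queue.append(nxt_t)
    return seen

for q in range(1, 6):
    assert len(reachable_with_adjacent_transpositions(q)) == factorial(q)
```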

The group structure can be visualised as a graph. The elements are vertices, and edges are transpositions connecting the vertices. If all group elements are exhaustively enumerated such that consecutive elements differ by a single transposition, the graph is a circle (e.g. Fig. 4) with q! vertices. There exist more compact representations of the group, such as the complete graph Kq. Without loss of generality, the exhaustive representation is preferred in this work.
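One concrete way to lay out all q! configurations on a circle, with neighbouring vertices differing by a single adjacent transposition, is the Steinhaus-Johnson-Trotter ordering. It is used here only as an illustration: the paper does not prescribe an ordering, and the sequence below need not match the labelling of Fig. 4. For q ≥ 3 the last permutation also differs from the first by an adjacent transposition, so the sequence closes into a circle. The function names are hypothetical.

```python
from math import factorial
from typing import List, Tuple

def sjt_circle(q: int) -> List[Tuple[int, ...]]:
    """Steinhaus-Johnson-Trotter order (Even's variant with directions)."""
    perm = list(range(q))
    direction = [-1] * q  # direction[value]: -1 looks left, +1 looks right
    order = [tuple(perm)]
    while True:
        # A value is mobile if its neighbour in its direction is smaller.
        mobile = -1
        for i, v in enumerate(perm):
            j = i + direction[v]
            if 0 <= j < q and perm[j] < v and (mobile == -1 or v > perm[mobile]):
                mobile = i
        if mobile == -1:
            break
        v = perm[mobile]
        j = mobile + direction[v]
        perm[mobile], perm[j] = perm[j], perm[mobile]
        for w in range(v + 1, q):  # reverse the direction of all larger values
            direction[w] = -direction[w]
        order.append(tuple(perm))
    return order

def differs_by_adjacent_swap(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])

circle = sjt_circle(3)
assert len(circle) == factorial(3)
print(circle)  # [(0, 1, 2), (0, 2, 1), (2, 0, 1), (2, 1, 0), (1, 2, 0), (1, 0, 2)]
```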

FIG. 4. Circular arrangement of 6 = 3! configurations. In order to avoid confusion with a coupling graph, vertices are drawn as polygons. The permutation vector associated with each vertex is placed next to the vertex. For example, [0, 1, 2] is next to p0. The edges represent transpositions. For example, the permutation p0 is transformed into p1 by transposing 0 and 1. The edges and vertices are red in order to highlight the correspondence to Fig. 1. The weights along the red circles are zero (Sec. II D).

B. The circuit as a circle of CNOTs

The compilation problem has been reduced to scheduling the execution of CNOTs, remote or not. Quantum circuits are often manipulated as directed acyclic graphs (DAGs) with vertices for quantum gates. Edge directions reflect the gate ordering inside the circuit. For the purpose of this work, the DAG representation is replaced by the equivalent (blue) circle of CNOTs [19]. The order of the vertices on the blue circle encodes one of the equivalent topological orderings of the DAG. In general, gate commutativity may be used to improve the compiled circuit (see the Appendix on the backtracking method). In particular, all equivalent DAG topological orderings may need to be considered. The latter is equivalent to commuting gates from the chosen topological ordering with an identity gate.

A circle is obtained as follows: a) only the CNOTs are kept from the circuit, and other gates are discarded (e.g. the T gates from Fig. 4); b) the wire endpoints corresponding to input and output are joined together. Fig. 5 is an example of obtaining a circular CNOT circuit. Pairs of adjacent vertices in the chain represent the qubit allocation configurations before and after a remote CNOT was compiled.
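The two construction steps can be sketched as follows, assuming gates are given as (name, qubits) pairs with "cx" denoting a CNOT. The gate encoding and the function names are hypothetical. Joining input and output is modelled by indexing the CNOT list modulo its length, so a traversal can start at any position and still visit every CNOT once.

```python
from typing import Iterator, List, Tuple

Gate = Tuple[str, Tuple[int, ...]]

def cnot_circle(circuit: List[Gate]) -> List[Tuple[int, ...]]:
    """Step a): keep only the CNOTs as (control, target), in circuit order."""
    return [qubits for name, qubits in circuit if name == "cx"]

def traverse(circle: List[Tuple[int, ...]], start: int) -> Iterator[Tuple[int, ...]]:
    """Step b): the wire ends are joined, so walk once around from any start."""
    n = len(circle)
    for k in range(n):
        yield circle[(start + k) % n]

circuit = [("cx", (0, 1)), ("t", (1,)), ("cx", (1, 2)),
           ("cx", (0, 2)), ("t", (0,)), ("cx", (2, 3))]
circle = cnot_circle(circuit)
print(list(traverse(circle, 2)))  # [(0, 2), (2, 3), (0, 1), (1, 2)]
```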

It is possible to start compiling a circuit C from any gate g, and not necessarily from the first gate. The circular CNOT circuit illustrates why this is correct. Let us consider that the circuit C is the application of a sequence of two sub-circuits A and B, such that C = AB. Moreover, we model QCC as a function that computes QCC(C) = C′, where QCC(C) = QCC(A)QCC(B) = C′.

Instead of starting with the first gate of A, we assume that compilation starts from sub-circuit B and that the CNOT circle is traversed in the correct order. Along the circular traversal, A† will be compiled instead of A. The compilation result will be a circuit D for which D = QCC(B)QCC(A†).

FIG. 5. Example: a quantum circuit with four CNOTs named G0 . . . G3 is drawn as a circular CNOT circuit having the wires as concentric circles. The normal circuit is used to obtain a chain of configurations after the application of each CNOT. The edges in the chain represent the CNOTs. The ends of the chain can be joined to form a circle. The edges and vertices are blue in order to highlight the correspondence to Fig. 1.

However, it is possible to reconstruct C′ by inverting the gate list of QCC(A†), such that QCC(A†)† = QCC(A). This divide-and-conquer approach does not imply that a greedy approach can solve QCC efficiently. It is still combinatorially difficult to choose the best gate of C from which to start compilation.
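For CNOT-only circles, inverting a gate list is straightforward: CNOT is its own inverse, so the inverse of a sequence consists of the same gates in reverse order. The sketch checks on all basis states that a CNOT list followed by its reversal acts as the identity. This is the property needed when recovering QCC(A) from QCC(A†). It handles only CNOTs; gates that are not self-inverse would additionally have to be replaced by their adjoints. The names are illustrative.

```python
from itertools import product
from typing import List, Tuple

CNOT = Tuple[int, int]

def apply_cnots(bits, cnots: List[CNOT]):
    """Apply (control, target) CNOTs in order to a computational basis state."""
    state = list(bits)
    for control, target in cnots:
        state[target] ^= state[control]
    return tuple(state)

def dagger(cnots: List[CNOT]) -> List[CNOT]:
    """Inverse of a CNOT-only circuit: CNOTs are self-inverse, so just reverse."""
    return list(reversed(cnots))

compiled_a_dagger = [(0, 1), (1, 2), (2, 0), (1, 0)]  # arbitrary example list
compiled_a = dagger(compiled_a_dagger)
for bits in product((0, 1), repeat=3):
    assert apply_cnots(apply_cnots(bits, compiled_a_dagger), compiled_a) == bits
assert dagger(compiled_a) == compiled_a_dagger
```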

Starting the traversal of CNOT circles from arbitrary positions can be advantageous for reducing the total cost of the compiled circuit.

It is not guaranteed that Cost(C, D) = Cost(C, C′). Consequently, a heuristic approach to QCC could be to start compiling from different gates of C and to keep the cheapest result.
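A minimal sketch of this heuristic compiles the CNOT circle from every start position and keeps the cheapest result. It assumes a linear nearest neighbour chip, the trivial initial allocation, SWAP count as the cost, and a deliberately naive greedy router that moves the control towards the target until the two are adjacent. The router, the cost model and the function names are illustrative assumptions rather than the method of the paper, and the reconstruction of C′ from D is omitted.

```python
from typing import List, Tuple

def greedy_lnn_swaps(cnots: List[Tuple[int, int]], q: int) -> int:
    """Count the SWAPs a naive router inserts on a line of q registers.

    Starts from the trivial allocation (Qi on Hi) and, for each CNOT, swaps
    the control one register at a time towards the target until adjacent.
    """
    position = list(range(q))  # position[qubit] = register
    occupant = list(range(q))  # occupant[register] = qubit
    swaps = 0
    for control, target in cnots:
        while abs(position[control] - position[target]) > 1:
            here = position[control]
            step = 1 if position[target] > here else -1
            other = occupant[here + step]
            occupant[here], occupant[here + step] = other, control
            position[control], position[other] = here + step, here
            swaps += 1
    return swaps

def best_start(circle: List[Tuple[int, int]], q: int) -> Tuple[int, int]:
    """Try every rotation of the CNOT circle; return (start index, SWAP count)."""
    costs = []
    for start in range(len(circle)):
        rotated = circle[start:] + circle[:start]
        costs.append((greedy_lnn_swaps(rotated, q), start))
    cost, start = min(costs)
    return start, cost

print(best_start([(0, 3), (0, 1)], 4))  # (1, 2): starting at the second CNOT is cheaper
```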

C. Unfolding the torus

The circular graph of configurations and the CNOT circle can be combined into a torus (e.g. Figs. 1 and 6). The torus has a discrete structure, which can be used to visualise and analyse QCC. For visualisation purposes, the torus can be cut and unfolded into a planar structure. We will resort to a single cut along the configuration circle. The result will be a two-dimensional diagram like the one in Fig. 7. Let one side of the cut be called the start circle and the other side the stop circle.

As shown in Fig. 7 and Fig. 8, a hypothetical quantum compiler will traverse vertices of the torus. The number of torus vertices is the total number of states the compiler should consider, and there are q! × |C| states. Restating Problem 1, the compiler will find a path from the start circle to the stop circle (Fig. 7 and Fig. 6). There is a combinatorial number of paths of various lengths between pairs of start-stop vertices. In the best case, QCC runs in time linear in the number of circles traversed between start and stop.
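For a sense of scale, the number of torus vertices q! × |C| can be computed directly. The parameter values below are arbitrary examples, not benchmarks from the paper.

```python
from math import factorial

def torus_vertices(q: int, n_gates: int) -> int:
    """Number of (configuration, gate) states on the QCC torus: q! * |C|."""
    return factorial(q) * n_gates

print(torus_vertices(3, 4))     # 24
print(torus_vertices(5, 100))   # 12000
print(torus_vertices(20, 100))  # 243290200817664000000
```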


FIG. 6. After cutting the torus along the yellow line, the start circle is brown and the stop circle is red. Compilation is the problem of connecting the circles by a loop of green vertices such that the sum of the edge costs is minimal.

FIG. 7. Example of a search diagram with 3! configurations. The red configuration circles are concentric. The red circles are interconnected by CNOT circles that were cut as in Fig. 6. Search starts in the centre of the diagram and stops at one of the endpoints of the radial CNOT circles. The green vertices indicate the path found by the compiler.

D. Edge weights

The edges connecting the vertices of the torus are weighted. Two extreme cases are possible: a) all edges have weight zero; b) all edges have equal weight. The first case is not realistic in the context of QCC. The second case arises when the NISQ device has all-to-all connectivity, such that the shortest path between a start and a stop circle is given by the straight traversal of a CNOT circle. For the purpose of this work, the red edges (configuration edges) have zero weight, and the blue edges (CNOT edges) have non-zero weight. The motivation for this model is twofold.

First, our goal is to show that QCC is a TSP (see Section III A), and we have chosen the generalised TSP form as presented in [18]. This TSP form uses the concept of connected city clusters. Movement within a cluster has zero cost, whereas movement between clusters has non-zero cost. In our case, the red rings are the clusters and the blue rings are the connections between the clusters.

FIG. 8. Compilation visualised as a movement on the torus. Two CNOTs are compiled (orange and green). left) The compilation of the two CNOTs is a movement along the CNOT circle. right) The compilation of the first CNOT first includes a movement along a configuration circle. Both compilations (left and right) result in a correct circuit, but the compiled results have different costs.

Second, instead of weighting the red edges, we consider their cost as part of the blue edge weights. Each blue edge traversal requires compiling a (remote) CNOT. The compilation cost is thus determined by: a) the cost of implementing the transposition resulting from moving along the red ring (a new start qubit allocation configuration from which the CNOT is compiled), and b) the cost of effectively scheduling the remote CNOT.

Additionally, we note that when the first and last red rings are joined, the qubit allocation configuration has to be the same, in general. This is the case when compilation does not start from the first gate of the circuit (cf. Section II B) and the solution needs to be reconstructed. After reconstructing the solution, however, the wire permutations before the first gate and after the last gate can be removed: these are simple wire relabelling operations. As a result, configuration changes on the start/stop circle come for free and are not considered in the compilation cost, because no gates need to be inserted in the circuit. For example, in Fig. 8 the orange traversal of the configuration ring has cost zero.

This brings us to the particular QCC scenario, which we assume to be the common one, where compilation starts from the start ring (e.g. brown in Fig. 6, corresponding to the first gate of the uncompiled circuit) and ends on a different vertex of the same ring. Different vertices on the same ring refer to different qubit allocation configurations. Therefore, in particular, it is an acceptable solution to end on the same ring, but on a different vertex.

We mentioned that the weights may be, for example, the number of physical CNOTs necessary to implement a remote CNOT. In general, edge weights are assigned by a cost function. It is the task of the cost function to extract information from the circuits and the coupling graph, for example by performing a topological analysis of the circuit and coupling graph [6]. The cost of gate compilation could also include lookahead information, similar to how this was done, for example, for linear nearest neighbour architectures [35].

Formulating explicit cost functions is beyond the scope of this work. As shown in [31], once the cost functions are specified, formulating the optimisation objective is a highly nontrivial task. The optimisation objective plays the same role for exact QCC methods as the actual code implementation plays for heuristic QCC methods. Therefore, even if we specified the exact functions, the optimality of the compiled circuit would depend on the time-space trade-off allowed by the heuristic implementation. In particular, just as examples: a) if the optimisation goal is the minimum number of SWAPs, one could use the MI strategy from the Appendix; b) for minimising depth, and by making no assumptions about gate execution time as in [14], the optimisation goal would be the makespan [2].

From the perspective of an arbitrary function Cost(C, C′) that calculates the cost of compiling C into C′ (similar to the discussion in Sec. III A), we can state that QCC optimisation is to find a circuit M such that Cost(C, M) = min_l Cost(C, C_l), where C_l is a loop on the torus. The best loop C_l has the minimum sum of traversed edge weights.
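For very small instances this minimisation can be carried out exactly by dynamic programming over the unfolded torus, with one red circle per CNOT. The sketch assumes a linear nearest neighbour chip, a fixed traversal order, SWAP count as the cost, and a free choice of configuration on the start circle (Sec. II D). It also assumes that moving from configuration p to configuration r costs the number of qubit pairs whose left-to-right order differs; this Kendall tau distance equals the minimum number of adjacent SWAPs. Every layer enumerates all q! configurations, so the sketch only illustrates the search space and is not a practical compiler. The function names are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, List, Tuple

Config = Tuple[int, ...]  # config[qubit] = position on the line

def swap_distance(p: Config, r: Config) -> int:
    """Minimum adjacent SWAPs turning arrangement p into r (Kendall tau distance)."""
    return sum((p[a] < p[b]) != (r[a] < r[b])
               for a, b in combinations(range(len(p)), 2))

def min_swaps_lnn(cnots: List[Tuple[int, int]], q: int) -> int:
    """Cheapest path through the layers of the unfolded torus (exact, tiny q only)."""
    configs = list(permutations(range(q)))
    cost: Dict[Config, int] = {p: 0 for p in configs}  # start circle: free choice
    for control, target in cnots:
        new_cost: Dict[Config, int] = {}
        for r in configs:
            if abs(r[control] - r[target]) != 1:
                continue  # the CNOT would still be remote in configuration r
            new_cost[r] = min(c + swap_distance(p, r) for p, c in cost.items())
        cost = new_cost
    return min(cost.values())

print(min_swaps_lnn([(0, 1), (1, 2)], 3))          # 0
print(min_swaps_lnn([(0, 1), (1, 2), (0, 2)], 3))  # 1
```

The second example needs one SWAP because, on a line of three registers, no single arrangement makes all three qubit pairs adjacent.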

III. Results

The landscape of QCC is a discrete torus obtained from the Cartesian product of two circles. One of the circles refers to the group structure of the qubit allocations possible when scheduling a gate (i.e. the red circle in Fig. 1). The other circle is generated by the fact that the CNOTs of a circuit can be arranged in a circular form (i.e. the blue circle in Fig. 1) [19].

The torus includes |C| red circles, one for each gate of C. There are q! blue circles: one for each possible permutation of circuit qubits to NISQ registers. The details of constructing the torus were presented in Sec. II.

A. QCC is a TSP

In the following we show that QCC is a travelling salesman problem (TSP). The Appendix includes a backtracking formulation of QCC as TSP. Independently of this work, the authors of [27] have decomposed the compilation problem into two steps: qubit allocation and scheduling of multi-qubit gates. In practice, this approach has already been followed by quantum circuit frameworks such as Cirq and Qiskit: the circuit qubits are mapped to the NISQ device, and then the circuit gates are scheduled. For QCC benchmarking purposes, the two-step approach has also been used by [30]. We augment the QCC decomposition by including the circular CNOT structure. This will be useful for analysing the problem complexity. The exact complexity depends on how the cost function is implemented and evaluated.

We use the following definitions:

• A solution is any loop that non-trivially intersects every red circle (cf. Fig. 1). There is a combinatorial number of potential solutions.

• The minimum solution is the loop for which the sum of the edge weights is minimal (Sec. II D).

According to the discussion in Sec. II D, only the edges along the CNOT circles have non-zero weights. Each solution is the sum of |C| weights,

$\mathrm{Cost}_l = \sum_{i=1}^{|C|} w_{p_i, q_i}$,

where l is the index of the solution and p_i, q_i are the indices of the configurations connected by the i-th traversed CNOT edge, which has weight $w_{p_i, q_i}$. The solution of QCC is $\min_l \mathrm{Cost}_l$, for $l \le (q!)^{|C|}$ when the exhaustive enumeration of the configurations is used. In the light of the definitions of Problems 1-3, where a minimum cost circuit is searched for, QCC is an example of combinatorial optimisation.

We show that QCC is a generalised TSP (GTSP). The original TSP is defined for a number of cities, for which the distances between pairs of cities are known. TSP answers the question: what is the shortest possible route visiting all the cities and returning to the origin city? In GTSP the cities are arranged into clusters, and the edges connecting the cities inside a cluster have weight zero [18]. At least one city from each cluster has to be visited on the shortest path [18].
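The GTSP statement can be made concrete with a brute-force solver for tiny instances. The solver chooses one city per cluster and an order of the clusters, then returns the cheapest closed tour. It uses the common variant that visits exactly one city from each cluster. This exponential-time sketch only illustrates the problem statement; it is not the approach of [18], and the distance matrix is an arbitrary example.

```python
from itertools import permutations, product
from typing import List, Sequence, Tuple

def gtsp_brute_force(dist: Sequence[Sequence[float]],
                     clusters: List[List[int]]) -> Tuple[float, Tuple[int, ...]]:
    """Shortest closed tour visiting exactly one city from each cluster."""
    best: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
    first, rest = clusters[0], clusters[1:]
    for order in permutations(range(len(rest))):  # cluster 0 is fixed as the start
        for cities in product(first, *(rest[i] for i in order)):
            tour_len = sum(dist[cities[k]][cities[(k + 1) % len(cities)]]
                           for k in range(len(cities)))
            if tour_len < best[0]:
                best = (tour_len, cities)
    return best

# Cities 0 and 1 form one cluster (zero distance between them); 2 and 3 are singletons.
dist = [
    [0, 0, 5, 4],
    [0, 0, 1, 2],
    [5, 1, 0, 3],
    [4, 2, 3, 0],
]
clusters = [[0, 1], [2], [3]]
print(gtsp_brute_force(dist, clusters))  # (6, (1, 2, 3))
```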

QCC is a GTSP when each of the |C| red configuration rings is considered a cluster of cities. Moreover, the zero-weight cluster edges are consistent with how the weights along the configuration rings are set in Sec. II D. There are |C| configuration circles in the torus. The distances between the cities are the weights along the CNOT edges. The salesman is expected to traverse each configuration circle at least once between the brown start circle and the red stop circle from Fig. 6.

The fact that the configuration rings are arranged in a circle does not make the problem easier. Assuming that C has only three remote CNOTs, there are only three clusters for which the GTSP has to be computed. However, the arrangement of the three clusters corresponds to a complete graph $K_3$, the smallest instance of GTSP. Increasing the length of the circuit increases the number of clusters, but does not reduce the complexity of the optimisation problem.
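A brute-force GTSP solver makes the cluster view concrete. The sketch follows the common variant in which exactly one city per cluster is visited; for non-negative distances that satisfy the triangle inequality this loses nothing compared with visiting at least one, because extra cities can be shortcut. In the QCC reading, each cluster would stand for one red configuration ring and `dist` for the CNOT-edge weights; the function name `gtsp_brute_force` and the toy distances are assumptions for illustration. With three clusters, the search is the $K_3$ case discussed above.

```python
from itertools import permutations, product

def gtsp_brute_force(clusters, dist):
    """Cheapest closed tour visiting exactly one city from every cluster.

    clusters: list of lists of city labels; dist: function (a, b) -> non-negative cost.
    """
    first, rest = clusters[0], clusters[1:]
    best_cost, best_tour = float("inf"), None
    # Fix cluster 0 as the start, so tours that are mere rotations are skipped.
    for order in permutations(rest):
        # Choose one city per cluster, in the current cluster order.
        for cities in product(first, *order):
            cost = sum(dist(cities[i], cities[(i + 1) % len(cities)]) for i in range(len(cities)))
            if cost < best_cost:
                best_cost, best_tour = cost, cities
    return best_cost, best_tour

# Three clusters of cities on a number line; distance is the absolute difference.
print(gtsp_brute_force([[0, 10], [1, 20], [2, 30]], lambda a, b: abs(a - b)))  # (4, (0, 1, 2))
```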

The decision GTSP version of QCC answers the question: is there a route/loop of cost less than a specified $\mathrm{Cost}_{route}$? Any potential solution can be verified by tracking the proposed solution loop along the torus. Because of its complexity, QCC has to be solved using heuristics. Benchmarking circuits for which the minimum $\mathrm{Cost}_{route}$ is known beforehand [30] are a good way to evaluate the performance of the heuristics.
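Verification is cheap compared with search: checking a proposed loop takes one pass over its |C| edges. A minimal sketch, in which the callbacks `weight` (edge weight between two configurations) and `executable` (whether a CNOT can run in a configuration) are hypothetical and supplied by the caller:

```python
def verify_route(loop, cnots, weight, executable, cost_bound):
    """Decision version of QCC: is `loop` a valid solution cheaper than cost_bound?"""
    if len(loop) != len(cnots):
        return False  # one configuration per CNOT circle is expected
    if not all(executable(cfg, g) for cfg, g in zip(loop, cnots)):
        return False  # some CNOT cannot run in its configuration
    # Track the loop along the torus, closing it back to the start.
    cost = sum(weight(loop[i], loop[(i + 1) % len(loop)]) for i in range(len(loop)))
    return cost < cost_bound

def unit_if_changed(a, b):
    """Hypothetical weight: free if the configuration is unchanged, otherwise 1."""
    return 0 if a == b else 1

def on_a_line(cfg, g):
    """Hypothetical constraint: the CNOT qubits must be neighbours on a line."""
    return abs(cfg.index(g[0]) - cfg.index(g[1])) == 1

print(verify_route([(0, 1, 2), (0, 1, 2)], [(0, 1), (1, 2)], unit_if_changed, on_a_line, 1))  # True
```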


B. QCC is a ring

The discrete torus shows that, from the perspective of discrete mathematics, QCC is a ring with the two QCC-operations being: 1) qubit allocation; 2) gate scheduling.

We can define commutativity in a manner compatible with quantum circuit execution. Very informally, two QCC-operations are commutative iff the computation implemented by the circuit is unchanged after reordering the QCC-operations. Consequently, qubit allocation is commutative because it is effectively a renaming of wires: the order in which the qubits are allocated does not change the computation. QCC-gate scheduling is not commutative because, in general, two CNOTs do not commute. Consequently, the circle of allocations is the illustration of an Abelian group, and the CNOT circle represents a monoid. The Abelian group and the monoid form the discrete torus, on which traversals are unidirectional. We leave a formalisation of the mathematical structure of QCC for future work.
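The two informal observations can be checked on small CNOT-only circuits by simulating them on computational basis states. Because a CNOT circuit acts as a permutation of basis states, agreement on all $2^n$ basis states means that two circuits implement the same unitary. The sketch, whose helper names are hypothetical, shows that relabelling wires leaves the computation unchanged up to that relabelling, while swapping two CNOTs in which one qubit is the target of the first and the control of the second changes it.

```python
from itertools import product

def run(circuit, bits):
    """Apply CNOTs (control, target) to a basis state given as a tuple of bits."""
    bits = list(bits)
    for control, target in circuit:
        if bits[control]:
            bits[target] ^= 1
    return tuple(bits)

def same_computation(circ_a, circ_b, n):
    """True if both CNOT circuits map every n-bit basis state identically."""
    return all(run(circ_a, b) == run(circ_b, b) for b in product((0, 1), repeat=n))

def relabel(circuit, perm):
    """Allocation as a renaming of wires: circuit qubit q is placed on register perm[q]."""
    return [(perm[c], perm[t]) for c, t in circuit]

def place(bits, perm):
    """Move the bit of circuit qubit q to register perm[q]."""
    out = [0] * len(bits)
    for q, bit in enumerate(bits):
        out[perm[q]] = bit
    return tuple(out)

circuit, perm = [(0, 1), (1, 2)], [2, 0, 1]
# Renaming wires: the relabelled circuit on relabelled inputs gives relabelled outputs.
print(all(run(relabel(circuit, perm), place(b, perm)) == place(run(circuit, b), perm)
          for b in product((0, 1), repeat=3)))  # True
# Reordering two CNOTs that use qubit 1 in opposite roles changes the computation.
print(same_computation([(0, 1), (1, 2)], [(1, 2), (0, 1)], 3))  # False
```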

IV. Discussion and Conclusion

NISQ compilation is receiving increased attention due to its practical industrial relevance. In this work, the QCC problem was decomposed into a set of sub-problems whose individual solutions are found by traversing circles. This enabled the formulation of QCC as a travelling salesman problem along a torus. We have implemented the TSP approach to QCC at https://github.com/alexandrupaler/k7m and we have used it as part of a machine learning approach to QCC in [20].

The torus structure presented here has the potential to generate other efficient heuristics for QCC. Exact QCC methods [31, 36] scale poorly, because they are only as fast as the underlying solver. The highly regular and cyclic structure of the torus search space may inspire improved variable encodings, such that exact layout methods can be pushed into the range of 100-qubit circuits.

The TSP formulation hints at the conceptual similarities between QCC and the automatic design of quantum optical experiments [12]. The latter consist of discrete optical elements, which can be placed in a combinatorial number of experimental configurations (the mapping step) formed by different devices (the scheduling step). At the same time, forming loops on the torus shows that QCC is similar to a dynamic optimisation problem [3], and it would be reasonable to expect that methods based on ant colonies or evolutionary algorithms can be used to solve QCC.

Finally, because quantum methods have been applied to TSP (using QAOA in [22] and Grover’s algorithm in [29]), our work opens the possibility of optimising quantum circuits with quantum computers.

Acknowledgement

We are very grateful to Adi Botea for his technical input, feedback and suggestions. We acknowledge the input of Razvan Andonie on optimisation problem complexity classes, as well as the technical corrections proposed by Daniel Herr. This work was funded by the Linz Institute of Technology project CHARON, the Google Faculty Research Award FRA ANGELICO, and the NUQAT project of Universitatea Transilvania Brasov.

[1] Sergio Boixo, Sergei V Isakov, Vadim N Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, Michael J Bremner, John M Martinis, and Hartmut Neven. Characterizing quantum supremacy in near-term devices. Nature Physics, page 1, 2018.

[2] Adi Botea, Akihiro Kishimoto, and Radu Marinescu. On the complexity of quantum circuit compilation. In Eleventh Annual Symposium on Combinatorial Search, 2018.

[3] Carlos Cruz, Juan R Gonzalez, and David A Pelta. Optimization in dynamic environments: a survey on problems, methods and measures. Soft Computing, 15(7):1427–1448, 2011.

[4] Alexandre AA de Almeida, Gerhard W Dueck, and Alexandre CR da Silva. CNOT gate mappings to Clifford+T circuits in IBM architectures. In Int’l Symp. on Multi-Valued Logic, pages 7–12, 2019.

[5] Gerhard W Dueck, Anirban Pathak, Md Mazder Rahman, Abhishek Shukla, and Anindita Banerjee. Optimization of circuits for IBM’s five-qubit quantum computers. In EUROMICRO Symp. on Digital System Design, pages 680–684, 2018.

[6] Davide Ferrari and Michele Amoretti. Demonstration of envariance and parity learning on the IBM 16 qubit processor. arXiv preprint arXiv:1801.02363, 2018.

[7] AG Fowler, SJ Devitt, and LCL Hollenberg. Implementation of Shor’s algorithm on a linear nearest neighbour qubit array. Quantum Inf. Comput., 4(quant-ph/0402196):237–251, 2004.

[8] Wakaki Hattori and Shigeru Yamashita. Quantum circuit optimization by changing the gate order for 2D nearest neighbor architectures. In Workshop on Reversible Computation, pages 228–243, 2018.

[9] Daniel Herr, Franco Nori, and Simon J Devitt. Optimization of lattice surgery is NP-hard. npj Quantum Information, 3(1):1–5, 2017.

[10] Toshinari Itoko, Rudy Raymond, Takashi Imamichi, and Atsushi Matsuo. Optimization of quantum circuit mapping using gate transformation and commutation. Integr., 70:43–50, 2020.

[11] Toshinari Itoko, Rudy Raymond, Takashi Imamichi, Atsushi Matsuo, and Andrew W. Cross. Quantum circuit compilers using gate commutation rules. In Asia and South Pacific Design Automation Conf., pages 191–196, 2019.

[12] Mario Krenn, Manuel Erhard, and Anton Zeilinger. Computer-inspired quantum experiments. arXiv preprint arXiv:2002.09970, 2020.


[13] Dmitri Maslov. Basic circuit compilation techniques for an ion-trap quantum machine. New Journal of Physics, 19(2):023035, 2017.

[14] Dmitri Maslov, Sean M Falconer, and Michele Mosca. Quantum circuit placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(4):752–763, 2008.

[15] A. Matsuo, W. Hattori, and S. Yamashita. Reducing the overhead of mapping quantum circuits to IBM Q system. In IEEE International Symposium on Circuits and Systems, 2019.

[16] Atsushi Matsuo and Shigeru Yamashita. An efficient method for quantum circuit placement problem on a 2-D grid. In Workshop on Reversible Computation, pages 162–168, 2019.

[17] Beatrice Nash, Vlad Gheorghiu, and Michele Mosca. Quantum circuit optimizations for NISQ architectures. Quantum Science and Technology, 5(2):025010, 2020.

[18] Charles E Noon and James C Bean. An efficient transformation of the generalized traveling salesman problem. INFOR: Information Systems and Operational Research, 31(1):39–44, 1993.

[19] Alexandru Paler. Circular CNOT Circuits: Definition, Analysis and Application to Fault-Tolerant Quantum Circuits. In International Conference on Reversible Computation, pages 199–212. Springer, 2016.

[20] Alexandru Paler, Lucian M Sasu, Adrian Florea, and Razvan Andonie. Machine learning optimization of quantum circuit layouts. arXiv preprint arXiv:2007.14608, 2020.

[21] John Preskill. Quantum computing in the NISQ era and beyond. Quantum, 2:79, 2018.

[22] Matthew Radzihovsky, Joey Murphy, and Mason Swofford. A QAOA solution to the traveling salesman problem using pyQuil. 2019.

[23] Md. Mazder Rahman and Gerhard W. Dueck. Synthesis of linear nearest neighbor quantum circuits. CoRR, abs/1508.05430, 2015.

[24] Jussi Rintanen et al. Complexity of concurrent temporal planning. In ICAPS, volume 7, pages 280–287, 2007.

[25] Mehdi Saeedi, Robert Wille, and Rolf Drechsler. Synthesis of quantum circuits for linear nearest neighbor architectures. Quantum Information Processing, 10(3):355–377, 2011.

[26] Vivek V. Shende and Igor L. Markov. On the CNOT-cost of TOFFOLI Gates. Quantum Info. Comput., 9(5):461–486, May 2009.

[27] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Caroline Collange, and Fernando Magno Quintão Pereira. Qubit allocation as a combination of subgraph isomorphism and token swapping. Proceedings of the ACM on Programming Languages, 3(OOPSLA):1–29, 2019.

[28] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Sylvain Collange, and Fernando Magno Quintão Pereira. Qubit allocation. In Proceedings of the 2018 International Symposium on Code Generation and Optimization, pages 113–125, 2018.

[29] Karthik Srinivasan, Saipriya Satyajit, Bikash K Behera, and Prasanta K Panigrahi. Efficient quantum algorithm for solving travelling salesman problem: An IBM quantum experience. arXiv preprint arXiv:1805.10928, 2018.

[30] B. Tan and J. Cong. Optimality study of existing quantum computing layout synthesis tools. IEEE Transactions on Computers, pages 1–1, 2020.

[31] Bochen Tan and Jason Cong. Optimal layout synthesis for quantum computing. In 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD), pages 1–9. IEEE, 2020.

[32] Robert R Tucci. An introduction to Cartan’s KAK decomposition for QC programmers. arXiv preprint quant-ph/0507171, 2005.

[33] Davide Venturelli, Minh Do, Eleanor Rieffel, and Jeremy Frank. Compiling quantum circuits to realistic hardware architectures using temporal planners. Quantum Science and Technology, 3(2):025004, 2018.

[34] Robert Wille, Lukas Burgholzer, and Alwin Zulehner. Mapping quantum circuits to IBM QX architectures using the minimal number of SWAP and H operations. In Design Automation Conf., page 142, 2019.

[35] Robert Wille, Oliver Keszocze, Marcel Walter, Patrick Rohrs, Anupam Chattopadhyay, and Rolf Drechsler. Look-ahead schemes for nearest neighbor optimization of 1D and 2D quantum circuits. In Design Automation Conference (ASP-DAC), 2016 21st Asia and South Pacific, pages 292–297. IEEE, 2016.

[36] Robert Wille, Aaron Lye, and Rolf Drechsler. Exact reordering of circuit lines for nearest neighbor quantum architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 33(12):1818–1831, 2014.

[37] Alwin Zulehner, Alexandru Paler, and Robert Wille. An efficient methodology for mapping quantum circuits to the IBM QX architectures. IEEE Trans. on CAD of Integrated Circuits and Systems, 2018.

[38] Alwin Zulehner and Robert Wille. Compiling SU(4) quantum circuits to IBM QX architectures. In Proceedings of the 24th Asia and South Pacific Design Automation Conference, pages 185–190, 2019.

Appendix

A. Moving on a (red) configurations circle

At least two strategies are possible for compiling a single remote CNOT. The methods are illustrated in Fig. 9. The first strategy is MIM (short for move-interact-move): move one of the qubit states on a wire next to the other qubit’s wire, interact the qubits, and then swap the moved state back to its original wire. The second strategy is called MI (short for move-interact) and is similar to the first one, but without swapping back the moved qubit state.

FIG. 9. Two strategies of inserting SWAPs: a) initial circuit; b) Move-Interact-Move (MIM); c) Move-Interact (MI). The initial configuration (e.g. (0, 1, 2, 3, 4)) is maintained after applying MIM. The MI strategy results in the (1, 2, 3, 0, 4) configuration.


Applying MIM once introduces 2d SWAP gates into the circuit, while the MI strategy introduces only d SWAPs, where d is the distance between the remote wires. A straightforward distance function is, for example, the Manhattan distance, which can be used for LNN as well as for grid NISQ architectures. For a given permutation p, the permutation resulting after applying MIM is also p.
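On an LNN chip the two strategies can be compared directly. The sketch below moves the control’s state towards the target with nearest-neighbour SWAPs. It assumes that d counts the SWAPs needed to make the two wires adjacent (|i - j| - 1 on a line), which is our reading rather than a definition from the text. With the Fig. 9 example, MI ends in (1, 2, 3, 0, 4) after 3 SWAPs, whereas MIM returns to (0, 1, 2, 3, 4) after 6. The function name `compile_remote_cnot` is hypothetical.

```python
def compile_remote_cnot(config, control, target, strategy="MI"):
    """Make two qubits adjacent on a line with nearest-neighbour SWAPs.

    config[r] is the circuit qubit on register r. The control's state is moved
    towards the target; MI keeps the new layout, MIM swaps the state back.
    Returns (resulting layout, number of SWAP gates).
    """
    if strategy not in ("MI", "MIM"):
        raise ValueError("strategy must be 'MI' or 'MIM'")
    config = list(config)
    pos, goal = config.index(control), config.index(target)
    step = 1 if goal > pos else -1
    moves = []  # left register of each SWAP, in application order
    while abs(goal - pos) > 1:
        k = min(pos, pos + step)
        config[k], config[k + 1] = config[k + 1], config[k]
        moves.append(k)
        pos += step
    # The CNOT would be executed here, on the adjacent registers pos and goal.
    if strategy == "MIM":
        back = moves[::-1]  # undo the moves in reverse order
        for k in back:
            config[k], config[k + 1] = config[k + 1], config[k]
        moves += back
    return tuple(config), len(moves)

print(compile_remote_cnot((0, 1, 2, 3, 4), 0, 4, "MI"))   # ((1, 2, 3, 0, 4), 3)
print(compile_remote_cnot((0, 1, 2, 3, 4), 0, 4, "MIM"))  # ((0, 1, 2, 3, 4), 6)
```

On a grid, the line distance |i - j| would be replaced by a Manhattan distance |x1 - x2| + |y1 - y2|, and the single SWAP path by a routing choice.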

In contrast, after an MI swap the resulting permutation is a p′, obtained through the sequence of transpositions representing the SWAP gates. Although MI introduces fewer SWAPs, it increases the complexity of the compilation problem: each remote CNOT results in a new permutation, such that the circuit qubit allocation configuration evolves after each CNOT.

In the presence of evolving configurations, state-of-the-art compilation methods solve the following problem: find an optimal circuit consisting entirely of SWAP gates that transforms a current permutation $p_{in}$ into a permutation $p_{out}$ such that a given batch of remote CNOTs can be implemented on the given coupling graph. In other words, an optimal sequence of transpositions is sought, such that $p_{out}$ conforms to the set of constraints imposed by all the CNOTs to implement. During the search for a SWAP circuit, or after a SWAP circuit was found, it is checked that $p_{out}$ conforms to the coupling graph.
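For toy sizes, an optimal SWAP circuit can be found by breadth-first search over permutations, since every SWAP costs one gate. In this sketch, which is not the heuristic of any particular framework, `p_in[r]` is the circuit qubit on register r, coupling-graph edges are treated as undirected for SWAPs, and the goal test checks that every CNOT of the batch lands on an edge. The function name is hypothetical.

```python
from collections import deque

def shortest_swap_circuit(p_in, edges, batch):
    """BFS for a minimum-length SWAP circuit from p_in to a p_out in which every
    CNOT of `batch` acts on a coupling-graph edge; returns (p_out, swaps) or None.
    """
    undirected = sorted({tuple(sorted(e)) for e in edges})
    allowed = set(undirected)

    def conforms(p):
        where = {q: r for r, q in enumerate(p)}  # circuit qubit -> register
        return all(tuple(sorted((where[c], where[t]))) in allowed for c, t in batch)

    start = tuple(p_in)
    parent = {start: None}  # layout -> (previous layout, SWAP applied)
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if conforms(p):
            p_out, swaps = p, []
            while parent[p] is not None:  # walk back to p_in
                p, edge = parent[p]
                swaps.append(edge)
            return p_out, swaps[::-1]
        for u, v in undirected:
            nxt = list(p)
            nxt[u], nxt[v] = nxt[v], nxt[u]
            nxt = tuple(nxt)
            if nxt not in parent:
                parent[nxt] = (p, (u, v))
                queue.append(nxt)
    return None  # no reachable layout satisfies the batch

print(shortest_swap_circuit((0, 1, 2, 3), [(0, 1), (1, 2), (2, 3)], [(0, 3)]))
```

The state space contains q! layouts, which is why practical compilers replace this exhaustive search with heuristics such as A*-search [37] or temporal planners [33].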

FIG. 10. Example of a SWAP circuit that generates a wire permutation.

This approach implies that $p_{out}$ and the SWAP circuit generating it are computed for more than a single CNOT (e.g. Fig. 10). Both the set of remote CNOTs and the SWAP circuits are computed using heuristics (e.g. the randomised algorithm in IBM Qiskit, A*-search [37] or temporal planners [33]).

B. Moving on a (blue) CNOT circle

Movement on a CNOT circle is equivalent to compiling CNOT gates sequentially. This does not mean that CNOTs cannot be parallelised in the resulting C′ circuit. Parallelisation of a batch of CNOTs can be visualised on the torus: a CNOT is selected from the batch and compiled such that, for the remaining CNOTs, zero weights are placed on the edges connecting the configuration circles. Thus, for the first CNOT, a kind of lookahead strategy [35] has to be used to determine the configuration that will generate zero-weight edges in the future.

Without discussing lookahead methods, compilation implies finding a good configuration and then advancing on the CNOT circle. Thus, compilation is preceded by movements along the configuration circle whenever SWAP networks are used to prepare the configuration. However, because remote CNOTs can also be implemented without SWAP networks, the compilation of remote CNOTs can also have a different cost.

C. Backtracking for TSP

Having paralleled QCC to TSP, we can formulate a naive backtracking algorithm for compilation. The first step of the algorithm is to determine an initial configuration: how circuit qubits are mapped (allocated) to the NISQ registers (Fig. 8a). Afterwards, the first edge of the CNOT circle starting from this configuration vertex is traversed by choosing a coupling graph edge on which to execute the CNOT. A new configuration is reached by using the MI swap strategy (Fig. 11). The next torus edge traversal is prepared by moving around the configuration circle (Fig. 8b) and landing in a new configuration.

The backtracking step consists of two sub-steps: traversing the current configuration circle, followed by traversing the CNOT circle. The backtracking step undoes the last CNOT compilation and moves along the previous configuration circle. This is equivalent to selecting a different coupling graph edge on which to map the remote CNOT that was just undone.

A solution is found each time a vertex from the outermost CNOT circle, marked by . . . Stop, is touched. Each solution is stored, and the best one is selected after the backtracking algorithm finishes, i.e. when all the cycles and configurations have been naively considered.
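The naive backtracking procedure can be sketched on an LNN chip under simplifying assumptions: each remote CNOT is compiled with the MI strategy, and the only choice per CNOT is which of its two qubits is moved (equivalently, which line edge executes it). Each completed branch corresponds to touching the outermost CNOT circle; all leaves are visited and the cheapest is kept. The names `mi_move` and `backtrack_qcc` are hypothetical, and a real coupling graph would offer more edges per CNOT.

```python
def mi_move(config, mover, partner):
    """MI on a line: SWAP `mover` towards `partner` until adjacent; return (layout, #SWAPs)."""
    config = list(config)
    pos, goal = config.index(mover), config.index(partner)
    step = 1 if goal > pos else -1
    swaps = 0
    while abs(goal - pos) > 1:
        config[pos], config[pos + step] = config[pos + step], config[pos]
        pos += step
        swaps += 1
    return tuple(config), swaps

def backtrack_qcc(start, cnots):
    """Naive backtracking over which qubit each remote CNOT moves (LNN, MI strategy)."""
    best = {"cost": float("inf"), "configs": None}

    def visit(index, config, cost, trail):
        if index == len(cnots):  # reached the outermost CNOT circle ('Stop')
            if cost < best["cost"]:
                best["cost"], best["configs"] = cost, list(trail)
            return
        control, target = cnots[index]
        for mover, partner in ((control, target), (target, control)):
            nxt, swaps = mi_move(config, mover, partner)
            trail.append(nxt)
            visit(index + 1, nxt, cost + swaps, trail)
            trail.pop()  # backtrack: undo this CNOT and try the other branch

    visit(0, tuple(start), 0, [])
    return best["cost"], best["configs"]

print(backtrack_qcc((0, 1, 2, 3), [(0, 3), (0, 1)]))
```

The branching factor is 2 here, so the sketch visits 2^|C| leaves; with one branch per coupling graph edge the number of leaves grows much faster.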

Similarly to [31], it is possible to further increase the generality of the backtracking procedure by considering gate commutations on the blue rings. Then, for each combination of the supported gate commutations, the torus has to be regenerated and the QCC procedure has to be repeated.
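Enumerating the gate orders permitted by commutation shows how quickly the number of tori to regenerate can grow. The sketch assumes the standard CNOT commutation rule (two CNOTs commute unless a qubit is the control of one and the target of the other) and lists every ordering that preserves the relative order of all non-commuting pairs; the function names are hypothetical.

```python
def commute(g, h):
    """CNOTs commute unless a qubit is the control of one and the target of the other."""
    (c1, t1), (c2, t2) = g, h
    return t1 != c2 and t2 != c1

def equivalent_orders(cnots):
    """All gate orders reachable through commutations (linear extensions of the dependencies)."""
    n = len(cnots)
    # Gate j must stay after every earlier gate i it does not commute with.
    preds = [{i for i in range(j) if not commute(cnots[i], cnots[j])} for j in range(n)]
    orders = []

    def extend(order, placed):
        if len(order) == n:
            orders.append([cnots[k] for k in order])
            return
        for j in range(n):
            if j not in placed and preds[j] <= placed:
                order.append(j)
                placed.add(j)
                extend(order, placed)
                order.pop()
                placed.remove(j)

    extend([], set())
    return orders

for order in equivalent_orders([(0, 1), (2, 3), (1, 2)]):
    print(order)  # each order would yield its own torus
```

CNOTs sharing only a control commute freely, so k such gates already give k! orders.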

FIG. 11. Arrangement of configurations after MIM (a) and MI (b) swaps. Each configuration is coloured distinctively. MIM maintains the same configuration, while MI does not.

D. Pre- and post-processing

The problem statement of QCC does not specify whether C is expressed using the universal gate set supported by the NISQ. If this is not the case, C has to be translated to a functionally equivalent C′′ that uses gates compatible with the NISQ gate set. This is a complex QCC pre-processing task with regard to the optimal number of resulting gates (e.g. [26]), and does not fall within the scope of this work. Also, quantum algorithm and quantum hardware optimisations (cf. [13]) are not considered part of the general QCC framework.

FIG. 12. The order of the concentric configuration circles is swapped after commuting CNOTs from C. In this example, there are two CNOTs to be scheduled: the first one is marked by green vertices, the second one by thick black stroked vertices (a single vertex of this CNOT is included in the figure). a) The green CNOT is compiled first, and the white one second; b) Assuming that the CNOTs can be commuted in the original circuit, the order of the vertices in each CNOT circle can be permuted. The white CNOT is compiled first and the green CNOT second (now, a single vertex of this CNOT is included in the figure).

The very high complexity of the exact method is a motivation for heuristics. It is useful to attempt to identify heuristic types and functionalities. As mentioned in Sec. I, compilation is the process of transforming a circuit C into another circuit C′ that conforms to a set of constraints encoded in a coupling graph. Therefore, it is possible to preprocess C and postprocess C′.

Preprocessing adapts C for compilation, and it is viable to try to reduce the number of single-qubit gates and CNOT gates by using, for example, template-based optimisations [25]. Postprocessing can be template-based too, and can also include the recompilation of subcircuits of C′. For example, IBM Qiskit uses this approach for single-qubit gates, and this procedure was used by [38].
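As a minimal instance of template-based preprocessing, consider only the template CNOT(c, t) · CNOT(c, t) = I. The stack-based pass below removes such pairs when they are neighbours in the gate list, including pairs that become neighbours after an inner cancellation. It is an illustrative assumption, far simpler than the template methods of [25], and the function name is hypothetical.

```python
def cancel_adjacent_cnots(circuit):
    """Template CNOT(c, t) CNOT(c, t) = identity, applied to neighbouring gates."""
    kept = []
    for gate in circuit:
        if kept and kept[-1] == gate:
            kept.pop()  # the pair multiplies to the identity and is removed
        else:
            kept.append(gate)
    return kept

print(cancel_adjacent_cnots([(0, 1), (1, 2), (1, 2), (0, 1), (2, 0)]))  # [(2, 0)]
```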

Heuristics can also be included for the previously discussed mapping problems. Selecting the start configuration (or any other configuration along the concentric cycles) could be performed using existing LNN optimisation methods, but cost models adapted to MI swaps should be formulated and analysed first. Another possibility is to collect all configurations generated along a CNOT chain and try them out as start configurations. However, given the size of each configuration cycle, the collected configurations may be as good or as bad as the initial one. Ranking coupling graph nodes is another heuristic for building the initial configuration [6]. The circuit mapping strategy presented in [14] would also fall into this category.
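One simple way to rank coupling graph nodes, given here as an assumption and not necessarily the method of [6], is to place the circuit qubits that take part in the most CNOTs on the registers with the highest degree. The function name is hypothetical; circuit qubits that never appear in a CNOT are left unallocated in this sketch, and surplus qubits are dropped if there are more qubits than registers.

```python
from collections import Counter

def degree_ranked_layout(cnots, edges, num_registers):
    """Hypothetical initial configuration: busiest qubits on best-connected registers."""
    qubit_use = Counter(q for cnot in cnots for q in cnot)
    register_degree = Counter(r for edge in edges for r in edge)
    # Sort by descending usage/degree; ties are broken by the smaller index.
    qubits = sorted(qubit_use, key=lambda q: (-qubit_use[q], q))
    registers = sorted(range(num_registers), key=lambda r: (-register_degree[r], r))
    return dict(zip(qubits, registers))  # circuit qubit -> register

star = [(0, 1), (0, 2), (0, 3)]
print(degree_ranked_layout([(2, 0), (2, 1), (2, 3)], star, 4))  # {2: 0, 0: 1, 1: 2, 3: 3}
```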

Traversal of edges along CNOT circles could be sped up by reducing the number of backtracking steps (the minimum is zero) and by selecting from a few of the best possible edges for the mapping. The procedure for selecting the best coupling graph edge is the following: 1) Shortest paths between all pairs of coupling graph vertices are computed using the Floyd-Warshall algorithm; 2) It is possible to add weights to the coupling graph edges (e.g. to prefer certain areas of the graph), or to treat the coupling graph as undirected; 3) Once a remote CNOT needs to be mapped to an edge, the sum of the distances between the coupling graph vertices where the qubits are located and the vertices of each graph edge is computed (e.g. Fig. 3). The edge with the minimum distance sum is chosen and, if multiple edges have the same distance sum, the last one in the list is chosen. Thus, the weighting function used for the coupling graph edges influences the edge selection.
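Steps 1-3 can be sketched as follows. The distance sum for an edge (u, v) is read here as d(control, u) + d(target, v), which is our interpretation of step 3; unit edge weights are the default, an optional `weight` callback stands in for step 2, and the graph is treated as undirected. The non-strict comparison reproduces the rule that the last edge in the list wins ties. Function names are hypothetical.

```python
def floyd_warshall(num_vertices, edges, weight=lambda u, v: 1):
    """Step 1: all-pairs shortest paths on the (undirected) coupling graph."""
    inf = float("inf")
    dist = [[0 if i == j else inf for j in range(num_vertices)] for i in range(num_vertices)]
    for u, v in edges:
        w = weight(u, v)  # step 2: edge weights can favour regions of the chip
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(num_vertices):
        for i in range(num_vertices):
            for j in range(num_vertices):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def select_edge(dist, edges, control_reg, target_reg):
    """Step 3: edge minimising the distance sum; ties go to the last edge in the list."""
    best_edge, best_sum = None, float("inf")
    for u, v in edges:
        total = dist[control_reg][u] + dist[target_reg][v]
        if total <= best_sum:  # '<=' keeps the last edge among equal sums
            best_edge, best_sum = (u, v), total
    return best_edge

line = [(0, 1), (1, 2), (2, 3), (3, 4)]
dist = floyd_warshall(5, line)
print(select_edge(dist, line, 0, 4))  # every edge has sum 3, so the last one, (3, 4), is chosen
```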

Edge mapping could be performed for multiple remote CNOTs in parallel, too. This possibility shows that the algorithm from [37] is a heuristic fitting in the framework of this work.