Optimal Exact and Fast Approximate Two Dimensional Pattern Matching Allowing Rotations

Exact and Approximate Two Dimensional Pattern Matchingallowing RotationsKimmo Fredriksson � Gonzalo Navarro y Esko Ukkonen zAbstractWe give fast �ltering algorithms for searching a 2{dimensional pattern in a 2{dimensionaltext allowing any rotation of the pattern. We consider the cases of exact and approximatematching under several matching models, improving the previous results. For a text of sizen� n character and a pattern of size m �m characters, the exact matching takes average timeO(n2=m). If we allow k{mismatches of characters, then our best algorithm achieves O(n2pk=m)average time, for reasonable error levels. For large k, we obtain a O(n2k3=2plogm=m) averagetime algorithm. We generalize the algorithms for the matching model where the sum of absolutedi�erences between characters is at most k.1 IntroductionWe consider the problem of �nding the exact and approximate occurrences of a two{dimensionalpattern of size m �m cells from a two{dimensional text of size n � n cells, when also all possiblerotations of the pattern are allowed. This problem is often called rotation invariant template match-ing in the signal processing literature. Template matching has numerous important applications inimage and volume processing. The traditional approach [5] to the problem is to compute the crosscorrelation between each text location and each rotation of the pattern template. This can be donereasonably e�ciently using the Fast Fourier Transform (FFT), requiring time O(Kn2 logn) whereK is the number of rotations sampled. Typically K = O(m) in 2D case, and K = O(m3) in 3Dcase, which makes the FFT approach very slow in practice. However, in many applications only\close enough" matches of the pattern are accepted. To this end, the user may specify a parameterk, such that matches that have at most k di�erences with the pattern should be accepted. Thisallows us to derive e�cient algorithms.E�cient two dimensional combinatorial pattern matching algorithms, that do not allow rota-tions of the pattern, can be found e.g. in [4, 2, 3, 11]. Rotation invariant template matching was �rstconsidered from a combinatorial point of view in [7]. In this paper, we follow this combinatorial lineof work. If we consider the pattern and text as regular grids, then de�ning the notion of matchingbecomes nontrivial when we rotate the pattern: since every pattern cell intersects several text cells�Dept. of Computer Science, University of Helsinki. Work supported by ComBi.yDept. of Computer Science, University of Chile. Work developed while the author was on leave at Dept. ofComputer Science, Univ. of Helsinki, partially supported by the Academy of Finland and Fundaci�on Andes.zDept. of Computer Science, University of Helsinki. Work supported by the Academy of Finland.1

and vice versa, it is not clear what should match with what. Among the di�erent matching modelsconsidered in previous work [7, 8, 9], we stick to the simplest one in this paper: (1) the geometriccenter of the pattern has to align with the center of a text cell; (2) the text cells involved in thematch are those whose geometric centers are covered by the pattern; (3) each text cell involved ina match should match the value of the pattern cell that covers its center.Under this exact matching model, an online algorithm is presented in [7] for searching a patternallowing rotations in O(n2) average time.The model (a 3D version) was extended in [9] such that there may be a limited number (k)of mismatches between the pattern and its occurrence. Under this mismatches model a O(k3=2n2)average time (2D version) of the algorithm can be obtained to �nd the occurrences. This works forany 0 � k < m2. For small k, an O(k1=2n2) average time algorithm was given in [6].Finally, a more re�ned model [10, 6, 9] suitable for gray level images adds up the absolute valuesof the di�erences in the gray levels of the pattern and text cells supposed to match, and puts anupper limit k on this sum. Under this gray levels model they achieve O((k=�)3=2n2) average timealgorithm, assuming that the cell values are uniformly distributed among � gray levels.Similar algorithms for 3D can be found in [9], and indexing algorithms in [10].In this paper we present fast �lters for searching allowing rotations under these three models.Our main results follow.� We give an O(n2=m) average time search algorithm for the exact model.� We present a �lter for searching under the mismatches model which is O(n2pk=m) averagetime if k < m2=(3 log�m)2. It still works better than previous work for higher k values,although the complexity changes. The idea is to reduce the problem to exact searching ofpieces of the pattern.� For the same model we present another �lter based on reducing the problem to inexactsearching of pattern pieces. The �lter is applicable for high k values levels and obtains acomplexity of O(n2k3=2plogm=m). For a range of k values, this is better than the algorithmof the previous item.� Finally, for the gray levels model we present a �lter based on coarsening the gray levels of theimage, which makes the problem independent on the number of gray levels, with a complexityapproaching that of the simpler mismatches model.2 De�nitionsLet T = T [1::n; 1::n] and P = P [1::m; 1::m] be arrays of unit squares, called cells, in the (x; y){plane. Each cell has a value in ordered �nite alphabet �. The size of the alphabet is denoted by� = j�j. The corners of the cell for T [i; j] are (i� 1; j� 1); (i; j� 1); (i� 1; j) and (i; j). The centerof the cell for T [i; j] is (i � 12 ; j � 12). The array of cells for pattern P is de�ned similarly. Thecenter of the whole pattern P is the center of the cell in the middle of P . Precisely, assuming forsimplicity that m is odd, the center of P is the center of cell P [m+12 ; m+12 ].Assume now that P has been moved on top of T using a rigid motion (translation and rotation),such that the center of P coincides exactly with the center of some cell of T . The location of P2

with respect to T can be uniquely given as ((i; j); �) where (i; j) is the cell of T that matches thecenter of P , and � is the angle between the x{axis of T and the x{axis of P . The (approximate)occurrence between T and P at some location is de�ned by comparing the values of the cells of Tand P that overlap. We will use the centers of the cells of T for selecting the comparison points.That is, for the pattern at location ((i; j); �), we look which cells of the pattern cover the centersof the cells of the text, and compare the corresponding values of those cells.More precisely, assume that P is at location ((i; j); �). For each cell T [r; s] of T whose centerbelong to the area covered by P , let P [r0; s0] be the cell of P such that the center of T [r; s] belongsto the area covered by P [r0; s0]. Then M(T [r; s]) = P [r0; s0]. So our algorithms compare the cellT [r; s] of T against the cell M(T [r; s]) of P .Hence the matching function M is a function from the cells of T to the cells of P . Now considerwhat happens to M when angle � grows continuously, starting from � = 0. Function M changesonly at the values of � such that some cell center of T hits some cell boundary of P . It was shownin [7] that this happens O(m3) times, when P rotates full 360 degrees, so there are O(m3) relevantorientations of P to be checked. The set of angles for 0 � � � �=2 isA = f�; �=2� � j � = arcsin h + 12pi2 + j2 � arcsin jpi2 + j2 ;i = 1; 2; : : : ; bm=2c; j = 0; 1; : : : ; bm=2c; h = 0; 1; : : : ; bqi2 + j2cg:By symmetry, the set of possible angles �, 0 � � < 2�, isA = A [ A+ �=2 [ A+ � [ A+ 3�=2:x

y’

x’

j

α

y

(0,0)

i

α α αFigure 1: Each text cell is matched against the pattern cell that covers the center of the text cell.For each angle �, a set of features is read from P .As shown in [7], any match of a pattern P in a text T allowing arbitrary rotations must containa so-called \feature", i.e. a one{dimensional string obtained by reading a line of the pattern insome angle and crossing the center. These features are used to build a �lter for �nding the positionand orientation of P in T .We now de�ne a set of linear features (strings) for P (see Figure 1). The length of a particularfeature is denoted by u, and the feature for angle � and row q is denoted by F q(�). Assume forsimplicity that u is odd. To read a feature F q(�) from P , let P be on top of T , on location ((i; j); �).3

Consider the cell T [i� m+12 + q; j� u�12 ]; : : : ; T [i� m+12 + q; j+ u�12 ]. Denote them as tq1; tq2; : : : ; tqu.Let cqi be the value of the cell of P that covers the center of tqi . The (horizontal) feature of P withangle � and row q is now the sequence F q(�) = cq1cq2 � � �cqu. Note that this value depends only on q,� and P , not on T .The sets of angles for the features is obtained the same way as the set of angles for the wholepattern P . Note that the set of angles Bq for the feature set F q is subset of A, that is Bq � Afor any q. The size of B varies from O(u2) (the features crossing the center of P ) to O(um) (thefeatures at distance �(m) from the center of P ). Therefore, if a match of some feature F q(�) isfound, there are O(jAj=jBqj) possible orientations to be veri�ed for an occurrence of P . In otherwords, the matching function M can change as long as F q(�) does not change.More precisely, assume that Bq = ( 1; : : : ; K), and that i < i+1. Therefore, feature F q( i) =F q(�) can be read using any � such that i � � < i+1. On the other hand, there are O(jAj=jBq)jangles � 2 A such that i � � < i+1. If there is an occurrence of F q(�), then P may occur withsuch angles �.3 Exact Search Allowing RotationsIn [7] only a set of features crossing the center of P and of length m is extracted from P , i.e.q = m+12 and u = m. The text is then scanned row{wise for the occurrence of some feature, andupon such an occurrence the whole pattern is checked in the appropriate angles.The veri�cation takes O(m) time on average and O(m2) in the worst case in [7]. The reason isthat there are O(m2) cells in the pattern, and each one intersects O(m) di�erent text centers alonga rotation of 360 degrees (or any other constant angle), so there are O(m3) di�erent rotations forthe pattern. The number of relevant rotations for a feature of O(m) cells is, however, only O(m2),and therefore there are O(m) di�erent angles in which the pattern has to be tested for each anglein which a feature is found.In [8] the possibility of using features of length u � m is considered, since it reduces the spaceand number of rotations. In what follows we assume that the features are of length u � m, and�nd later the optimal u.We show now how to improve both search and veri�cation time.3.1 Faster SearchFollowing [3, 4] we propose to search more features of the pattern to reduce the number of textrows to consider. This has obvious advantages since the search time per character is independenton the number of patterns if an Aho{Corasick machine (AC) [1] is used. In [4] a 2{dimensionalsearch algorithm (not allowing rotations) is proposed by searching all the pattern rows in the text,so that only the text rows multiples of m need to be considered because one of them must containsome pattern row in any occurrence.We take a similar approach. Instead of taking the O(u2) features that cross the center of thepattern, we also take some not crossing the center. More speci�cally, we take features for q in therange m�r2 +1 : : : m+r2 , where r is an odd integer for simplicity. For each such q, we read the featuresat all the relevant rotations. Figure 1 illustrates. This allows us to search only one out of r textrows, but there are O(r2u) features now. Figure 1 also shows that the features may become shorter4

thanm when they are far away from the center and the pattern is rotated. On the other hand, thereis no need to take features farther away from m=2 from the center, since in the case of unrotatedpatterns this is the limit. Therefore we have the limit r � m. If we take features from r = m rowsthen the shortest ones (for the pattern rotated at 45 degrees) are of length (p2� 1)m = �(m).The features do not cross the pattern center now, but they are still �xed if the pattern centermatches a text center.3.2 Faster Veri�cationWe show how veri�cations can be performed faster, in O(1) time instead of O(m). Imagine that afeature taken at angle � has been found in the text. Since the feature has length u and can be atdistance at most r from the center, there at most O(ur) di�erent angles, whose limits we call 1 to K , and we have i � � < i+1.We �rst try to extend the match of the feature to a match of the complete rotated row ofthe pattern. There are O(m2=(ur)) possible angles for the complete row, which lie between iand i+1 (as the feature is enlarged, the matching angles are re�ned). However, we perform thecomparison incrementally: �rst try to extend the feature by 1 cell. There are O(((u+1)r)=(ur)) =O((u + 1)=u) = O(1) possible angles, and all them are tried. The probability that the (u + 1)-thcell matches in some of the O(1) permitted angles is O(1=�). Only if we succeed we try with the(u+ 2)-th cell, where there would be O(((u+ 2)r)=((u+ 1)r)) di�erent angles, and so on.In general, the probability of checking the (u + i + 1)-th cell of the feature is that of passingthe check for the (u+ 1)-th, then that of the (u+ 2)-th and so on. The average number of times itoccurs is at most�u+ 1u � 1� � �u+ 2u+ 1� 1� � ::: � � u+ iu+ i� 1� 1� = �u+ iu � 1�iand by summing for i = 0 to �(m)� u we obtain O(1). This is done in both directions from thecenter, in any order.The same scheme can be applied to extend the match to the rest of the pattern. Each timewe add a new pattern position to the comparison we have only O(1) di�erent angles to test, andtherefore an O(1=�) probability of success. The process is geometric and it �nishes in O(1) timeon average.Note that this result holds even if the cell values are not uniformly distributed in the range1 : : :�. It is enough that there is an independent nonzero probability p of mismatch between arandom pattern cell and a random text cell, in which case 1=� is replaced by 1� p.3.3 AnalysisThe search time for the features is O(n2=r) since we inspect one text row out of r and the cost perinspected text cell is constant. The veri�cation time per feature that matches is O(1) as explained,and there are O(r2u=�u) features matching each inspected text position on average. This meansthat the total search cost is O(n2=r (1 + r2u=�u)) = O(n2 (1=r + ru=�u)). This shows that theoptimum is r = �(m) and u = x log�m with any x > 2, for a total search cost of O(n2=m). Sincethe same complexity is achieved for any large enough u, we prefer the minimumu = d(2+") log�me,since the space requirement for the AC machine is O(r2u) = O(m2 logm).5

Again, this analysis is valid for non{uniformly distributed cell values, by replacing 1=� by 1�p,where p is the probability of a mismatch.4 Search Allowing Rotations and MismatchesWe now present a 2D version of the incremental algorithm of [9], that runs in O(k3=2n2) averagetime, to search a pattern in a text allowing rotations and at most k mismatches.Assume that when computing the set of angles A = (�1; �2; : : :), we also sort the angles so that�i < �i+1, and associate with each angle �i the set Ci containing the corresponding cell centers thatmust hit a cell boundary at �i. Hence we can evaluate the number of mismatches for successiverotations of P incrementally. That is, assume that the number of mismatches has been evaluated for�i, then to evaluate the number of mismatches for rotation �i+1, it su�ces to re{evaluate the cellsrestricted to the set Ci. This is repeated for each � 2 A. Therefore, the total time for evaluating thenumber of mismatches for P centered at some position in T , for all possible angles, is O(Pi jCij).This is O(m3) because each �xed cell center of T , covered by P , can belong to some Ci at mostO(m) times. To see this, note that when P is rotated the whole angle 2�, any cell of P traversesthrough O(m) cells of T .Then consider the k mismatches problem. The expected number of mismatches in N tests isNp = N ��1� . Requiring that Np > k gives that about N > k=p tests should be enough in typicalcases to �nd out that the distance must be > k.This suggests an improved algorithm for the k{mismatches case. Instead of using the wholeP , select the smallest subpattern P 0 of P , with the same center cell, of size m0 � m0 such thatm0 � m0 > k=p. Then search P 0 to �nd if it matches with at most k mismatches. If so, thencheck with the gradually growing subpatterns P 00 whether or not P 00, matches, until P 00 = P . Ifnot, continue with P 0 at the next location of T . The expected running time of the algorithm isO(m03n2) which is O(k3=2n2).Note that this algorithm assumes nothing of how we compare the cell values, any other distancemeasure than counting the mismatches can be also used.We show now how to improve this time complexity.4.1 Reducing to Exact SearchingThe idea is to reduce the problem to an exact search problem. We cut the pattern in j pieces alongeach dimension, for j = bpkc + 1, thus obtaining j2 pieces of size (m=j)� (m=j). Now, in eachmatch with k di�erences or less necessarily one of those pieces is preserved without di�erences, sinceotherwise there should be at least one di�erence in each piece, for a total of j2 = (bpkc + 1)2 > kdi�erences overall. This fact was �rst utilized in [12, 13]. So we search for all the j2 pieces exactlyand check each occurrence for a complete match.Observe that this time the pieces cannot be searched using the center to center assumption,because this holds only for the whole pattern. However, what is really necessary is not that thepiece center is aligned to a text center, but just that there exists a �xed position to where thepiece center is aligned. Once we �x a rotation for the whole pattern, the matching function ofeach pattern piece gets �xed too. Moreover, from the O(m3) relevant rotations for the wholepattern, only O(mu) are relevant for each one-dimensional feature of length u. There is at most6

one matching function for each relevant rotation (otherwise we would have missed some relevantrotations). Hence we can work exactly as before when matching pieces, just keeping in mind thatthe alignment between the pattern center and the text center has to be shifted accordingly to theangle in which we are searching the feature. The same considerations of the previous section showthat we can do the veri�cation of each matching piece in O(1) time.The search algorithm can look for all the features of all the j2 patterns together, so the searchtime has two parts: the AC machine takes O(n2=(m=j)) time (since the pieces are of size (m=j)2);and the veri�cation of the whole piece once each feature is found takesO(1). Since there are j2 piecesof size (m=j)2, there are r = O(m=j) features, which when considering all their rotations make upO(j2(m=j)mu) features that can match. Hence the total veri�cation time is O(n2j2(m=j)mu=�u).Note that for each piece of length (m=j)2 there will be O(m(m=j)2) relevant rotations, because thepiece may be far away from the pattern center.Once an exact piece has been found (which happens with probability O(m(m=j)2=�m2=j2))we must check for the presence of the whole pattern with at most k di�erences. Although aftercomparing O(k) cells we will obtain a mismatch on average, we have to check for all the possiblerotations. A brute force checking of all the rotations gives m3=(m(m=j)2) = j2 checks, for a totalO(kj2) veri�cation time for each piece found.We can instead extend the valid rotations incrementally, by checking cells farther and fartheraway from the center and re�ning the relevant rotations at the same time. Unlike the case of exactsearching, we cannot discard a rotation until k di�erences are found, but the match will disappearon average after we consider O(k) extra cells at each rotation. Hence, we stop the veri�cation longbefore reaching all the O(m3) rotations.m/j

R

K

j=4

m/j

m/j

Pattern cut in 16 pieces A piece matched is extendeduntil finding k differencesFigure 2: On the left, the pattern is cut in j2 = 16 pieces. On the right, a piece of width m=jfound exactly is extended gradually until �nding k di�erences.Let K be a random variable counting the number of cells read until k di�erences are found alonga �xed rotation. We know that K = O(k). Since we enlarge the match of the piece by reading cellsat increasing distances from the center, by the point where we �nd k di�erences we will have covereda square of side R where R2�(m=j)2 = K (see Figure 2). The total number of rotations consideredup to that point is O(mR2=(m(m=j)2)) = O(1 +Kj2=m2). Since this is linear on K we can takethe function on the expectation K, so the average number of rotations considered until �nding7

more than k di�erences is O(1 + kj2=m2). We consider that we check all these rotations by bruteforce, making K = O(k) comparisons for each such rotation. Then the veri�cation cost per piece isO(k+k2j2=m2). This veri�cation has to be carried outO(j2m(m=j)2n2=�m2=j2) = O(m3n2=�m2=j2)times on average. Therefore the total search time is of the order ofn2 jm + j2(m=j)mu�u + km3�m2=j2 + k2j2m3m2�m2=j2 ! = n2 jm + jm2u�u + km3�m2=j2 + k2j2m�m2=j2 !where all the terms worsen as j grows. This is why we prefer to take the minimum possiblej = �(pk). Any u � x log�m for x > 3 yields optimal performance for u. The �rst term of theexpression dominates while the optimal u is feasible, i.e. for k � m2=(3 log�m)2(1 + o(1)), up towhere the whole scheme is O(n2pk=m) time. After that point we have to select maximal u = m=jand the whole scheme is O(n2m3=�m=pk) time for k � m2=(4 log�m)(1+o(1)). Finally, the schemeis O(n2k3m=�m2=k) for larger k.4.2 Reducing to Inexact SearchingSince the search time worsens with j we may try to use a smaller j, although this time the piecesmust be searched allowing some di�erences. More speci�cally, we must allow bk=j2c di�erences inthe pieces, since if there are more than bk=j2c di�erences in each piece then the total exceeds k.The O(k3=2n2) time incremental search algorithm can be used here. Since we search j2 pieceswith k=j2 di�erences, the total search cost for the pieces is O(n2j2(k=j2)3=2) = O(n2k3=2=j).However, the incremental algorithm assumes that the center of P coincides with some center ofthe cells of T , and this is not necessarily true when searching pieces. We now present a �lter thatgives a lower bound for the number of mismatches.Assume that P is at some location ((v; w); �) on top of T , such that (u; v) is not a center{to{center translation, such that (u; v) 2 T [i; j], and that the number of mismatches is k for thatposition of P . Then assume that P is translated to ((i; j); �), that is, center{to{center becomestrue while the rotation angle stays the same. As a consequence, some cell centers of T may havemoved to the cell of P that is one of its eight neighbors. Now compute the number of mismatchessuch that T [r; s] is compared againstM(T [r; s]) and its eight neighbors as well. If any of those ninecells match with T [r; s], then we count a match, otherwise we count a mismatch. Let the numberof mismatches obtained this way be k0.This means that k0 � k, because all matches that contribute to m2 � k must be present inm2� k0 too. The value of k0 can be evaluated with the incremental technique using �s instead of �where s is such that �s � � < �s+1, because the matching functions are the same for � and �s byour construction. Hence k0 � k.Hence we use the algorithm with the center{to{center assumption, but count a mismatch onlywhen the text cells di�ers from all the 9 pattern cells that surround the one it matches with. Thenet result in e�ciency is that the alphabet size becomes � = 1=(1� (1 � 1=�)9), meaning that acell matches with probability 1=� � 9=�.For the veri�cation cost of the pieces, we need to know the probability of a match with kdi�erences. Since we can choose the mismatching positions and the rest must be equal to thepattern, the probability of a match is � �m2k �=�m2�k . By using Stirling's approximation to the8

factorial and calling � = k=m2, we have that the probability can be bounded by m2=m, where = 1=(��=(1��)(1��)�)1�� (e=((1��)�))1��. This improves as m grows and � stays constant.On the other hand, � < 1� e=� is required so that < 1.The situation is a little di�erent from that of Section 4.1. In that case we produce all theO(m3=j2) rotations plus displacements and search them all with Aho-Corasick. In this case wesearch the pieces with a di�erent routine that �nds them even without the center{to{center as-sumption and then we have to take care of the center displacements.The probability of �nding a given piece, with any rotation, is O( (m2=j2)=(m=j)� (m=j)3). Ifwe multiply this by the j2 pieces that are searched, we get that the number of times we will haveto verify a piece is O(m2 (m2=j2)).Let us now consider the cost to verify a piece. Unlike Section 4.1, we have to restart theveri�cation from the center of the piece because we do not know exactly how many mismatchesare there. The number of rotations plus displacements to consider from the center is O(m3=j2).So the idea is to draw a spiral veri�cation, where after having read ` cells from the center we haveconsidered O(m`2) rotations and displacements.On the other hand, the area is not random, because it has been already pointed out as acandidate area. This means that each pattern cell matches with some of its 9 neighbors. Theprobability that, as we �x the rotations and displacements, the pattern cell matches the right textcell is � = �=� < 1 (indeed, � � 1=9 for large �). Hence, since the probability of matching ` cellsis still exponentially decreasing with `, we have that anyway the check will end after we considerO(k) cells of the piece in our spiral veri�cation. This holds true after the spiral covers all the piecethat matched and continues with the rest of the pattern.Relating the information on rotations delivered by the routine with the rotations plus displace-ments that we �nd now is complex, so for simplicity we assume that we get no information onrotations. The total amount of work in checking all the rotations and displacements until thematch disappears is O(mk3). Multiplying this by the probability of verifying a piece gives us atotal search cost of O �n2k3=2 �1=j +m3k3=2 m2=j2��whose optimum is at j = m=q4 log1= m+ 3=2 log1= k(1 + o(1)), that can be achieved wheneverit is smaller than pk, i.e. for k > m2=(7 log1= m)(1 + o(1)). For large � this is k > m2(1 �log� 9)=(7 log�m)(1+ o(1)). For smaller k the scheme reduces to exact searching and the previoustechnique applies. For this optimum value the complexity is O(n2k3=2qlog1= m=m).This competes with the reduction to exact searching for high values of k. Reducing to inexactsearching is indeed better than the third complexity obtained in Section 4.1, O(n2k3m=�m2=k),for k > m2=(5 log�m)(1 + o(1)), which is totally contained in the area where reduction to inexactsearching can be applied. Hence, in this area we should always use reduction to inexact searching. Ifwe compare against the second complexity of Section 4.1, O(n2m3=�m=pk), we have that reductionto inexact searching is better for k > (m= log�m)2(1+o(1)). This area extends that where reductionto inexact searching can be applied.Figure 3 shows the complexities achieved. 9

Reduction to exact partitioningReduction to inexact partitioningtimen2 kba0 m2cpk=m a = m2=(3 log�m)2b = m2=(7 log1= m)c = m2=(4 log�m)k3=2plogm=mm3=�m=pk k3m=�m2=kFigure 3: The complexities obtained for the mismatches model depending on k.5 Searching under the Gray Levels ModelIn principle any result for the mismatches model holds for this as well, because if a pattern matcheswith total color di�erence k then it also matches with k mismatches. However, the typical k valuesare much larger in this model, so using the same algorithms as �lters is not e�ective if a naiveapproach is taken.In this case, we can improve the search by reducing the number of di�erent colors, i.e. mappings consecutive colors into a single one. The e�ect is that � is reduced to �=s and k is reduced to1 + bk=sc = �(k=s) too. For instance, if we consider reduction to exact searching, the schemeis O(n2pk=m) time for k < m2=(3 log�m)2. This becomes now O(n2pk=s=m) time for k=s <m2=(3 log�=sm)2. For example binarizing the image means s = �=2 and gives a search time ofO(n2pk=�=m) for k < m2=((3 log2m)2�).This seems to show that the best is to maximize s, but the price is that we now have to checkthe potential matches found, because some may not really satisfy the matching criterion on theoriginal gray levels. The probability of �nding a match with the reduced alphabet isO(m3�m2=m) =O(m2�m2) times, where � = 1=(��=(1��)(1 � �)�=s)1�� (e=((1 � �)�=s))1�� and � = �=s =(k=s)=m2 (similar to in Section 4.2). After a complete match with reduced alphabet is found wehave to check for a real match, which costs O(1).It is clear that this �nal veri�cation is negligible as long as � < 1. The maximum s satisfyingthis is (� + p�2 � 4e��)=(2e) = �(�). The search cost then becomes O(n2pk=�=m) for k <�(m2�= log2m). This means that if we double the number of gray levels and consequently doublek, we can keep the same performance by doubling s.For higher error levels, partitioning into exact searching improves worsens if we divide k and� by s, so the scheme is applicable only for k < �(m2�= log2m). However, it is possible to resortto reduction to inexact matching, using the O((k=�)3=2n2) average time algorithm for this model.This cost improves as we increase s, and hence we can obtain O(n2(k=�)3=2qlog1=�m=m) time for�(m2�= log�m) < k < �(m2�). For lower k we cannot use the optimum j, so we have to setj = pk to obtain O(n2k=�) time. 10

6 Conclusions and Future WorkWe have presented di�erent alternatives to speed up the search of two dimensional patterns in twodimensional texts allowing rotations and di�erences. Table 1 shows our main achievements (all areon the average, we have no new results on the worst case).Model Previous result Our resultsExact matching O(n2) O(n2=m)O(n2pk=m), k < m2=(3 log�m)2k Mismatches O(n2k3=2) O(n2m3=�m=pk), k < m2=(7 log1= m)O(n2k3=2plogm=m), k < m2 (1� e=�(�))O(n2pk=�=m), k < �(m2�= log2�m)Gray levels O(n2(k=�)3=2) O(n2k=�), k < �(m2�= log�m)O(n2(k=�)3=2plogm=m), k < �(m2�)Table 1: The (simpli�ed) average case complexities achieved for di�erent models.The results can be extended to more dimensions. In three{dimensions there are O(m11) di�erentrotations for P [9], and O(um2) features of length u. However, the three{dimensional text mustbe scanned in two directions, e.g. along the x{axis and along the y{axis, to �nd out the candidaterotations for P . Only if two features are found (that suggest the same center position of P in T ),we enter the veri�cation, see Figure 4. For the exact matching, the method works in O(n3=m2)average time. The other results can be extended also.z

x

y

T:

P:

P:

Figure 4: Matching features in 3D.It is also possible to make the veri�cation probability lower by requiring that several piecesmust match before going to the veri�cation. This means smaller pieces or more di�erences allowedfor the pieces. It is also possible to scan the text in two (in 2D) or in three (in 3D) ways insteadof only one or two, using the same set of features than in the basic algorithm.11

Note also that, until now, we have assumed that the center of P must be exactly on top of somecenter of the cells of T . It is also possible to remove this restriction, but the number of matchingfunctions (and therefore the number of features) grow accordingly, see [9]. This, however, does nota�ect the �ltering time, but the veri�cation for the approximate matching would be slower.Finally, we have considered an error model where only \substitutions" are permitted, i.e. acell value changes its value in order to match another cell, so we substitute up to k values inthe text occurrence and obtain the pattern. More sophisticated error models exist which permitdisplacements (such as inserting/deleting rows/columns) in the occurrences, and search algorithmsfor those models (albeit with no rotations) have been developed for two and more dimensions [3].It would be interesting to combine the ability to manage those types of errors and rotations at thesame time.References[1] A. Aho and M. Corasick. E�cient string matching: an aid to bibliographic search. CACM,18(6):333{340, June 1975.[2] A. Amir, G. Benson, and M. Farach. An alphabet independent approach to two-dimensionalpattern matching. SIAM J. Comput., 23(2):313{323, 1994.[3] R. Baeza-Yates and G. Navarro. New models and algorithms for multidimensional approximatepattern matching. Journal of Discrete Algorithms (JDA), 1(1):21{49, 2000. Special issue onMatching Patterns.[4] R. Baeza-Yates and M. R�egnier. Fast two dimensional pattern matching. Information Process-ing Letters, 45:51{57, 1993.[5] Lisa Gottesfeld Brown. A survey of image registration techniques. ACM Computing Surveys,24(4):325{376, December 1992.[6] K. Fredriksson. Rotation invariant histogram �lters for similarity and distance measures be-tween digital images. In Proceedings of the 7th International Symposium on String Processingand Information Retrieval (SPIRE'2000), pages 105{115. IEEE CS Press, 2000.[7] K. Fredriksson and E. Ukkonen. A rotation invariant �lter for two-dimensional string matching.In Proc. CPM'98, LNCS 1448, pages 118{125. Springer{Verlag, 1998.[8] K. Fredriksson and E. Ukkonen. Combinatorial methods for approximate image matchingunder translations and rotations. Pattern Recognition Letters, 20(11{13):1249{1258, 1999.[9] K. Fredriksson and E. Ukkonen. Combinatorial methods for approximate pattern matchingunder rotations and translations in 3d arrays. In Proceedings of the 7th International Sympo-sium on String Processing and Information Retrieval (SPIRE'2000), pages 96{104. IEEE CSPress, 2000. 12

[10] G. Navarro K. Fredriksson and E. Ukkonen. An index for two dimensional string matchingallowing rotations. In J. van Leeuwen, O. Watanabe, M. Hagiya, P.D Mosses, and T. Ito,editors, Proceedings of the 1st IFIP International Conference on Theoretical Computer Science(IFIP TCS2000), LNCS 1872, pages 59{75. Springer{Verlag, 2000.[11] J. K�arkk�ainen and E. Ukkonen. Two- and higher-dimensional pattern matching in optimalexpected time. SIAM J. Comput., 29(2):571{589, 2000.[12] R. L. Rivest. Partial-match retrieval algorithms. SIAM J. Comput., 5(1):19{50, 1976.[13] S. Wu and U. Manber. Fast text searching allowing errors. Commun. ACM, 35(10):83{91,1992.

13

Optimal Exact and Fast Approximate Two Dimensional Pattern Matching Allowing Rotations

Documents

Transcript of Optimal Exact and Fast Approximate Two Dimensional Pattern Matching Allowing Rotations