
LSH for similarity search in generic metric space

Eliezer de Souza da Silva

Department of Computer Engineering and Industrial Automation, School of Electrical and Computer Engineering

University of Campinas
[email protected]

Wednesday 8th October, 2014

Basic Concepts and Research Review

Similarity Search – metric space model

Generic model for proximity search: a tuple (U, d), where U is a set and d a distance function (positive, symmetric);
∀x, y, z ∈ U, d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
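As a concrete illustration (my example, not from the slides), the sketch below uses Levenshtein edit distance over strings as the metric d and spot-checks positivity, symmetry, and the triangle inequality on a few random words; the function names are illustrative.

```python
import itertools
import random
import string

def edit_distance(a: str, b: str) -> int:
    """Standard dynamic-programming Levenshtein distance (a metric on strings)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = curr
    return prev[-1]

# Spot-check the metric axioms on random words.
rnd = random.Random(0)
words = ["".join(rnd.choices(string.ascii_lowercase, k=rnd.randint(3, 8))) for _ in range(20)]
for x, y, z in itertools.combinations(words, 3):
    assert edit_distance(x, y) == edit_distance(y, x)                      # symmetry
    assert edit_distance(x, y) >= 0 and (edit_distance(x, y) == 0) == (x == y)
    assert edit_distance(x, y) <= edit_distance(x, z) + edit_distance(z, y)  # triangle inequality
```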



Locality-sensitive hashing

Definition

Given a distance function d : X × X → R+, a function family H = {h : X → C} is (r, cr, p1, p2)-sensitive for a given data set S ⊆ X if, for any points p, q ∈ S and any h ∈ H:

If d(p, q) ≤ r, then PrH[h(q) = h(p)] ≥ p1 (probability of colliding within the ball of radius r);
If d(p, q) > cr, then PrH[h(q) = h(p)] ≤ p2 (probability of colliding outside the ball of radius cr);
with c > 1 and p1 > p2.
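To make the definition tangible, here is a hedged sketch (mine, not from the slides) that empirically estimates p1 and p2 for one classic family, the p-stable (Gaussian) hashes of Datar et al. [16], h(x) = ⌊(a·x + b)/w⌋; the dimension, bucket width, and radii are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, w = 64, 4.0                     # placeholder dimension and bucket width

def unit_vector() -> np.ndarray:
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def make_hash():
    """One h(x) = floor((a.x + b)/w) from the p-stable family [16]."""
    a = rng.normal(size=dim)
    b = rng.uniform(0.0, w)
    return lambda x: int(np.floor((a @ x + b) / w))

def collision_rate(distance: float, trials: int = 5000) -> float:
    """Estimate Pr_H[h(p) = h(q)] for point pairs at a fixed distance."""
    hits = 0
    for _ in range(trials):
        p = rng.normal(size=dim)
        q = p + distance * unit_vector()
        h = make_hash()
        hits += h(p) == h(q)
    return hits / trials

r, c = 1.0, 2.0
p1, p2 = collision_rate(r), collision_rate(c * r)
print(p1, p2)   # p1 should come out noticeably larger than p2
```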



Locality-sensitive hashing

Figure: LSH and (R, c)-NN.


Quantizers

Data-dependent quantization has the advantage of a more regular population of points in each bucket and empirically performs better than regular (data-independent) schemes [50].
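As a hedged illustration of a data-dependent quantizer (my example, not from the slides), the sketch below learns a k-means codebook with scikit-learn and uses the nearest-codeword index as a hash key; the dataset and all parameters are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is available

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 128))      # placeholder descriptors (SIFT-like dimensionality)

# Data-dependent quantizer: the codebook is learned from the data itself.
kmeans = KMeans(n_clusters=256, n_init=10, random_state=0).fit(X)

def bucket(q: np.ndarray) -> int:
    """Hash key of a query = index of its nearest learned centroid."""
    return int(kmeans.predict(q.reshape(1, -1))[0])

print(bucket(X[0]))
```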



Existing LSH in General Metric Spaces

Novak et al. [41; 42]: M-Index constructs a hierarchy of partitionings of the dataset, choosing points from the dataset as cluster centers.
Kang and Jung [28]: DFLSH (Distribution-Free Locality-Sensitive Hashing) randomly chooses t points from the original dataset (with n > t points) as centroids and indexes the dataset using the nearest centroid as hash key; this construction yields an approximately uniform number of points per bucket, O(n/t).
Tellez and Chavez [59]: map metric data to a permutation index, encode the permutation in Hamming space, and use Hamming LSH.


Towards LSH in generic metric space – VoronoiLSH

VoronoiLSH - Hashing function

Diagram: generate L induced Voronoi partitionings, giving L hash tables with L associated hash functions h1, . . . , hL.

Definition

Given a metric space (U, d), C = {c1, . . . , ck} ⊂ U and x ∈ U:

hC : U → N,  hC(x) = argmin_{i=1,...,k} d(x, ci)    (1)
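A minimal sketch of this hash function (my illustration; the distance function and centroid choice are placeholders): each point is hashed to the index of its nearest center, and picking the centers uniformly at random from the data gives DFLSH-style centroids.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def voronoi_hash(x: T, centers: Sequence[T], dist: Callable[[T, T], float]) -> int:
    """h_C(x) = argmin_i d(x, c_i): index of the nearest center (Eq. 1)."""
    return min(range(len(centers)), key=lambda i: dist(x, centers[i]))

def build_index(data: Sequence[T], k: int, dist: Callable[[T, T], float], seed: int = 0):
    """Pick k centers at random from the data (DFLSH-style) and bucket every point."""
    rng = random.Random(seed)
    centers = rng.sample(list(data), k)
    buckets: dict[int, list[T]] = {}
    for x in data:
        buckets.setdefault(voronoi_hash(x, centers, dist), []).append(x)
    return centers, buckets

# Toy usage with a 1-D Euclidean metric.
rnd = random.Random(1)
data = [rnd.uniform(0, 100) for _ in range(1000)]
centers, buckets = build_index(data, k=32, dist=lambda a, b: abs(a - b))
q = 42.0
candidates = buckets.get(voronoi_hash(q, centers, lambda a, b: abs(a - b)), [])
best = min(candidates, key=lambda x: abs(x - q)) if candidates else None
```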



VoronoiLSH

Figure: VoronoiLSH partitioning induced by centers c1, c2, c3, with a query q, its r and cr balls, and points p, p'; here h(q) = h(p) = 2 and h(p') = 3, and Zq, Zp, Zp' denote distances to the nearest centers.



Performance and Cost Models

Range cost:
RC(n, k) = n/k + k ⇒ RC(n) = 2√n

NN cost:
NNC(n, k, d) = (n/k) log(n/k) + d(n/k) + dk ⇒ NNCopt(n, d) = O(√n d (log(√n) + d + 1))
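A small worked check of the range-cost expression (my own numerics): RC(n, k) = n/k + k is minimized at k = √n, which gives RC(n) = 2√n.

```python
import math

def range_cost(n: int, k: int) -> float:
    # n/k candidates scanned inside the chosen bucket + k distance computations to the centers
    return n / k + k

n = 1_000_000
k_opt = round(math.sqrt(n))                      # k = sqrt(n) minimizes n/k + k
print(range_cost(n, k_opt), 2 * math.sqrt(n))    # both print 2000.0
costs = {k: range_cost(n, k) for k in (100, 500, 1000, 2000, 10_000)}
assert min(costs, key=costs.get) == 1000
```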



Hash probabilities bounds

Probability model: (Ω, F, Pr)

Zp = d(p, NNC(p)) = d(p, C)

Ω = {Zx | x ∈ X, C ⊂ X}

Pr[hC(p) ≠ hC(q)] = Pr[{Zq < d(q, NNC(p))} ∩ {Zp < d(p, NNC(q))}]

(Diagram: points p and q with their nearest centers NNC(p) and NNC(q).)
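To make the probability model concrete, here is a small Monte Carlo sketch (my illustration only, with 1-D Euclidean data and uniformly sampled centers) that estimates Pr[hC(p) = hC(q)] as a function of d(p, q); it is not part of the original analysis.

```python
import random

def voronoi_hash(x: float, centers: list) -> int:
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

def collision_probability(dist_pq: float, k: int = 16, trials: int = 2000) -> float:
    """Estimate Pr[h_C(p) = h_C(q)] over random center sets C, for a fixed d(p, q)."""
    rng = random.Random(0)
    hits = 0
    for _ in range(trials):
        centers = [rng.uniform(0.0, 1.0) for _ in range(k)]
        p = rng.uniform(0.0, 1.0 - dist_pq)
        q = p + dist_pq                    # so that d(p, q) = dist_pq exactly
        hits += voronoi_hash(p, centers) == voronoi_hash(q, centers)
    return hits / trials

for d in (0.01, 0.05, 0.1, 0.2):
    print(d, collision_probability(d))     # collision probability decays as d(p, q) grows
```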



Hash probabilities bounds

Case d(p, q) > cr:

{Zp + Zq < cr} ⊆ {Zp + Zq < d(p, q)}

{Zp + Zq < d(p, q)} ⊆ {Zq < d(q, NNC(p))} ∩ {Zp < d(p, NNC(q))}

⇒ Pr[hC(p) ≠ hC(q)] ≥ Pr[Zq + Zp < cr]

⇒ Pr[hC(p) = hC(q)] ≤ Pr[Zq + Zp ≥ cr] = p2




Hash probabilities bounds

Case d(p, q) < r:

d(p, NNC(q)) ≤ d(p, q) + Zq ≤ r + Zq

d(q, NNC(p)) ≤ d(p, q) + Zp ≤ r + Zp

⇒ {Zp < d(p, NNC(q))} ⊆ {Zp < r + Zq}

⇒ {Zq < d(q, NNC(p))} ⊆ {Zq < r + Zp}

⇒ Pr[hC(p) ≠ hC(q)] ≤ Pr[|Zq − Zp| < r]

⇒ Pr[hC(p) = hC(q)] ≥ Pr[|Zq − Zp| ≥ r] = p1




Hash probabilities bounds

p1 ≥ p2: requires two assumptions, Zq < δr (with δ > 0) and c > 2δ + 1;
p1 > p2: requires considering a hypothetical case where Zq = r − ε and Zp = 2δr − ε, for ε > 0.


Towards LSH in generic metric space – VoronoiPlexLSH

VoronoiPlex LSH - Hash function construction

Multiple VoronoiLSH with a controlled number of distance computations

Input: size k of the sample, number of distinct partitionings w, and integer number of centroids p
Output: a hash function hk,w,p

selected ← new binary array of size k
subsample ← new integer multi-array of size w × p
for j ← 1 to w do
    random sample S = {s1, . . . , sp} from {1, . . . , k}
    for i ← 1 to p do
        subsample[j, i] ← si
        selected[si] ← 1
    end
end
hk,w,p ← (selected, subsample)

Algorithm 1: Hash function building



VoronoiPlex LSH - Hashing algorithm

Input: hash function object hk,w,p, sample C = {c1, . . . , ck} ⊂ X (|C| = k), and a point q ∈ X
Output: integer value hk,w,p(q)

(selected, subsample) ← retrieved from hk,w,p
distances ← new floating-point array of size k
for j ← 1 to k do
    if selected[j] == 1 then
        distances[j] ← d(q, cj)
    end
end
hasharray ← new integer array of size w
for i ← 1 to w do
    hasharray[i] ← element of subsample[i] that minimizes distances[j] (varying j)
end
hk,w,p(q) ← hash(hasharray)

Algorithm 2: Hash function application
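A hedged, self-contained Python sketch of Algorithms 1 and 2 (my reading of the pseudocode; names are illustrative, and Python's built-in `hash` of a tuple stands in for the final mixing step): each of the w positions of the key is the nearest centroid within a p-sized subsample, so only centroids selected by some subsample require a distance computation.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def build_hash_function(k: int, w: int, p: int, seed: int = 0):
    """Algorithm 1 (sketch): choose w subsamples of p centroid indices out of {0, ..., k-1}."""
    rng = random.Random(seed)
    selected = [0] * k
    subsample = []
    for _ in range(w):
        S = rng.sample(range(k), p)
        subsample.append(S)
        for s in S:
            selected[s] = 1
    return selected, subsample

def apply_hash_function(q: T, centroids: Sequence[T], selected, subsample,
                        dist: Callable[[T, T], float]) -> int:
    """Algorithm 2 (sketch): distances only to selected centroids, one nearest-centroid id per subsample."""
    distances = [dist(q, centroids[j]) if selected[j] else float("inf")
                 for j in range(len(centroids))]
    hasharray = tuple(min(S, key=lambda j: distances[j]) for S in subsample)
    return hash(hasharray)   # placeholder for the final integer mixing step

# Toy usage: k = 5 centroids on the real line, w = 4 sub-partitionings of size p = 3.
centroids = [0.0, 2.0, 4.0, 6.0, 8.0]
selected, subsample = build_hash_function(k=5, w=4, p=3)
print(apply_hash_function(3.1, centroids, selected, subsample, dist=lambda a, b: abs(a - b)))
```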


VoronoiPlex LSH

Diagram: example key h5,4,3(p) built from centroids c1, . . . , c5 with subsamples {c1, c3, c4}, {c3, c5, c2}, {c5, c1, c3}, {c5, c4, c2}, giving the key (1, 2, 5, 2).

E[ |{i = 1, . . . , k : selected[i] = 1}| ] = k − k(1 − p/k)^w

O(k − kε) distance computations (intrinsic cost); a more complicated analysis is needed for the extrinsic cost.
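A quick hedged check of that expectation (my own simulation, standalone): a given centroid index is missed by a single subsample of size p with probability 1 − p/k, so over w independent subsamples the expected number of selected centroids is k − k(1 − p/k)^w.

```python
import random

def expected_selected(k: int, w: int, p: int) -> float:
    return k - k * (1 - p / k) ** w

def simulate_selected(k: int, w: int, p: int, trials: int = 5000, seed: int = 0) -> float:
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        selected = set()
        for _ in range(w):
            selected.update(rng.sample(range(k), p))   # one subsample of p distinct indices
        total += len(selected)
    return total / trials

k, w, p = 100, 8, 10
print(expected_selected(k, w, p), simulate_selected(k, w, p))  # the two values should agree closely
```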


Towards LSH in generic metric space – Parallel VoronoiLSH

Parallel VoronoiLSH

Dataflow programming for distributed computation;
Computing stages distributed across processors and nodes;
Message-passing interface.
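The slides give no implementation details; as a loose illustration only (assuming mpi4py, with made-up stage roles and routing), a dataflow-style split might dedicate one rank to hashing incoming queries and the remaining ranks to scanning the corresponding buckets.

```python
# Loose illustration only: assumes mpi4py; run with e.g. `mpiexec -n 3 python sketch.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    # Hashing stage: compute a bucket id for each query and route it to a worker rank.
    queries = [1.5, 7.2, 3.3]                          # placeholder queries
    for q in queries:
        bucket_id = int(q)                             # placeholder for the Voronoi hash h_C(q)
        comm.send((q, bucket_id), dest=1 + bucket_id % (size - 1))
    for dest in range(1, size):                        # shutdown signal for every worker
        comm.send(None, dest=dest)
else:
    # Search stage: each worker scans its local buckets for the routed queries.
    while True:
        msg = comm.recv(source=0)
        if msg is None:
            break
        q, bucket_id = msg
        # local distance computations against the points stored in bucket_id would go here
```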


Results – Datasets

Datasets

APM (Arquivo Público Mineiro – The Public Archives in Minas Gerais):

2,871,300 feature vectors (each SIFT descriptor is a 128-dimensional vector);
query dataset: 263,968 feature vectors with ground truth;
for the experiments we used 5000 queries uniformly sampled from the query dataset and performed a 10-NN search.

Metric datasets: Listeria (20,660 / 100) and the English dictionary (66,069 / 500);
BigANN (1B) for large-scale experiments (10^9 / 10^4).


Results – Experimental results

APM

(a) Recall × extensiveness (log scale), comparing DFLSH, K-medoids LSH, and K-means LSH.

(b) Recall × extensiveness for L = 1, 5, 8 hash functions (with 5000 cluster centers), comparing DFLSH, K-medoids LSH, and K-means LSH.



English dataset - VoronoiLSH and BPI

Figure: Recall vs. fraction of the linear-scan query time, for Voronoi LSH with K-means++ (L = 5 and L = 8), DFLSH (L = 5 and L = 8), and Brief Proximity Index (BPI) LSH.



Listeria

(a) Recall × extensivity for DFLSH with L = 2 and L = 3.

(b) Recall × extensivity for VoronoiPlex LSH (L = 1 and L = 8, 10 centroids selected from a 4000-point sample set), varying the key length w (w = 2, 5, 10).



Large scale experiment

(c) Query time / Recall (d) Parallel efficiency


Conclusions

Results and challenges

Using metric partitioning techniques to build hashing functions in metric spaces is a valid approach and should be further explored and developed;
The experiments do not show any clear advantage in learning the seeds of the Voronoi diagram by clustering;
It would be interesting to equip the analysis with more assumptions about the data.


References

[1] Fernando Akune. Indexação Multimídia escalável e busca por similaridade em alta dimensionalidade. M.Sc. dissertation, Universidade Estadual de Campinas (Unicamp), 2011.

[2] Fernando Akune, Eduardo Valle, and Ricardo Torres. MONORAIL: A Disk-Friendly Index for Huge Descriptor Databases. In 2010 20th International Conference on Pattern Recognition, pages 4145–4148. IEEE, August 2010.

[3] Alexandr Andoni and Piotr Indyk. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 459–468, 2006.

[4] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, SODA '07, pages 1027–1035, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.

[5] Sunil Arya and David M. Mount. Approximate nearest neighbor queries in fixed dimensions. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, SODA '93, pages 271–280, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.

[6] Bahman Bahmani, Ashish Goel, and Rajendra Shinde. Efficient distributed locality sensitive hashing. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM '12, pages 2174–2178, New York, NY, USA, 2012. ACM.

[7] Mayank Bawa, Tyson Condie, and Prasanna Ganesan. LSH forest. In Proceedings of the 14th international conference on World Wide Web, WWW '05, page 651, New York, NY, USA, 2005. ACM Press.

[8] R. E. Bellman. Dynamic Programming. Dover Books on Computer Science Series. Dover Publications, Incorporated, 2003.

[9] Stefan Berchtold, Daniel A. Keim, and Hans-Peter Kriegel. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd International Conference on Very Large Data Bases, VLDB '96, pages 28–39, San Francisco, CA, USA, 1996. Morgan Kaufmann Publishers Inc.

[10] Michael D. Beynon, Tahsin Kurc, Umit Catalyurek, Chialin Chang, Alan Sussman, and Joel Saltz. Distributed processing of very large datasets with DataCutter. Parallel Computing, 27(11):1457–1478, 2001.

[11] Christian Böhm, Stefan Berchtold, and Daniel A. Keim. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys, 33(3):322–373, September 2001.

[12] W. A. Burkhard and R. M. Keller. Some approaches to best-match file searching. Communications of the ACM, 16(4):230–236, April 1973.

[13] Edgar Chávez, Gonzalo Navarro, Ricardo Baeza-Yates, and José Luis Marroquín. Searching in metric spaces. ACM Computing Surveys, 33(3):273–321, September 2001.

[14] Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB '97, pages 426–435, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc.

[15] Kenneth L. Clarkson. Nearest-Neighbor Searching and Metric Space Dimensions. In Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk, editors, Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing), Advances in Neural Information Processing Systems. The MIT Press, 2006.

[16] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry, SCG '04, page 253, New York, NY, USA, 2004. ACM Press.

[17] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. Image retrieval. ACM Computing Surveys, 40(2):1–60, April 2008.

[18] Ronald Fagin, Ravi Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, pages 301–312, New York, NY, USA, 2003. ACM.

[19] C. Faloutsos and S. Roseman. Fractals for secondary key retrieval. In Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, PODS '89, pages 247–252, New York, NY, USA, 1989. ACM.

[20] Christos Faloutsos. Multiattribute hashing using gray codes. SIGMOD Record, 15(2):227–238, June 1986.

[21] Volker Gaede and Oliver Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170–231, June 1998.

[22] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases, VLDB '99, pages 518–529, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.

[23] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, STOC '98, pages 604–613, New York, NY, USA, 1998. ACM Press.

[24] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264–323, September 1999.

[25] H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011.

[26] Herve Jegou, Laurent Amsaleg, Cordelia Schmid, and Patrick Gros. Query adaptative locality sensitive hashing. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 825–828. IEEE, March 2008.

[27] Alexis Joly and Olivier Buisson. A posteriori multi-probe locality sensitive hashing. In Proceedings of the 16th ACM international conference on Multimedia, MM '08, page 209, New York, NY, USA, 2008. ACM Press.

[28] Byungkon Kang and Kyomin Jung. Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets. In NIPS Workshop on Big Learning (BigLearn), pages 1–8, Lake Tahoe, Nevada, 2012.

[29] Leonard Kaufman and Peter J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience, 9th edition, March 1990.

[30] Jon M. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, STOC '97, pages 599–608, New York, NY, USA, 1997. ACM.

[31] Martin Kruliš, Tomáš Skopal, Jakub Lokoc, and Christian Beecks. Combining CPU and GPU architectures for fast similarity search. Distributed and Parallel Databases, 30(3-4):179–207, 2012.

[32] John Leech. Some sphere packings in higher space. Canadian Journal of Mathematics, 16:657–682, January 1964.

[33] Herwig Lejsek, Fridrik Heidar Ásmundsson, Björn Þór Jónsson, and Laurent Amsaleg. Efficient and effective image copyright enforcement. In BDA, 2005.

[34] S. Liao, M. A. Lopez, and S. T. Leutenegger. High dimensional similarity search with space filling curves. In Proceedings 17th International Conference on Data Engineering, pages 615–622. IEEE Computer Society, 2001.

[35] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91–110, November 2004.

[36] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, VLDB '07, pages 950–961. VLDB Endowment, 2007.

[37] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, VLDB '07, pages 950–961. VLDB Endowment, 2007.

[38] G. Mainar-Ruiz and J. Perez-Cortes. Approximate Nearest Neighbor Search using a Single Space-filling Curve and Multiple Representations of the Data Points. In 18th International Conference on Pattern Recognition (ICPR'06), pages 502–505. IEEE, 2006.

[39] Rajeev Motwani, Assaf Naor, and Rina Panigrahy. Lower bounds on locality sensitive hashing. In Proceedings of the twenty-second annual symposium on Computational geometry, SCG '06, page 154, New York, NY, USA, 2006. ACM Press.

[40] R. T. Ng. CLARANS: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 14(5):1003–1016, September 2002.

[41] David Novak and Michal Batko. Metric Index: An Efficient and Scalable Solution for Similarity Search. In 2009 Second International Workshop on Similarity Search and Applications, pages 65–73. IEEE, August 2009.

[42] David Novak, Martin Kyselak, and Pavel Zezula. On locality-sensitive indexing in generic metric spaces. In Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP '10, page 59, 2010.

[43] Alexander Ocsa and Elaine P. M. De Sousa. An Adaptive Multi-level Hashing Structure for Fast Approximate Similarity Search. Journal of Information and Data Management, 1(3):359–374, 2010.

[44] Rafail Ostrovsky, Yuval Rabani, Leonard Schulman, and Chaitanya Swamy. The Effectiveness of Lloyd-Type Methods for the k-Means Problem. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), volume 59, pages 165–176. IEEE, December 2006.

[45] Jia Pan and Dinesh Manocha. Fast GPU-based locality sensitive hashing for k-nearest neighbor computation. In 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '11. ACM, 2011.

[46] Rina Panigrahy. Entropy based nearest neighbor search in high dimensions. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, SODA '06, pages 1186–1195, New York, NY, USA, 2006. ACM.

[47] Rina Panigrahy. Entropy based nearest neighbor search in high dimensions. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, SODA '06, pages 1186–1195, New York, NY, USA, 2006. ACM.

[48] Hae-Sang Park and Chi-Hyuck Jun. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36(2):3336–3341, 2009.

[49] Adriano Arantes Paterlini, Mario A. Nascimento, and Caetano Traina Junior. Using Pivots to Speed-Up k-Medoids Clustering. Journal of Information and Data Management, 2(2):221–236, June 2011.

[50] Loïc Paulevé, Hervé Jégou, and Laurent Amsaleg. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 31(11):1348–1358, August 2010.

[51] D. Pollard. Quantization and the method of k-means. IEEE Transactions on Information Theory, 28(2):199–205, March 1982.

[52] Hanan Samet. Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

[53] Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk. Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing). The MIT Press, 2006.

[54] James G. Shanahan, Sihem Amer-Yahia, Ioana Manolescu, Yi Zhang, David A. Evans, Aleksander Kolcz, Key-Sun Choi, and Abdur Chowdhury, editors. Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008, Napa Valley, California, USA, October 26-30, 2008. ACM, 2008.

[55] Tomáš Skopal. Where are you heading, metric access methods?: a provocative survey. In Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP '10, pages 13–21, New York, NY, USA, 2010. ACM.

[56] Malcolm Slaney, Yury Lifshits, and Junfeng He. Optimal Parameters for Locality-Sensitive Hashing. Proceedings of the IEEE, 100(9):2604–2623, 2012.

[57] Raisa Socorro, Luisa Micó, and Jose Oncina. A fast pivot-based indexing algorithm for metric spaces. Pattern Recognition Letters, 32(11):1511–1516, August 2011.

[58] Aleksandar Stupar, Sebastian Michel, and Ralf Schenkel. RankReduce - processing K-Nearest Neighbor queries on top of MapReduce. In LSDS-IR, 2010.

[59] Eric Sadit Tellez and Edgar Chavez. On locality sensitive hashing in metric spaces. In Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP '10, pages 67–74, New York, NY, USA, 2010. ACM.

[60] George Teodoro, Daniel Fireman, Dorgival Guedes, Wagner Meira Jr., and Renato Ferreira. Achieving multi-level parallelism in the filter-labeled stream programming model. In International Conference on Parallel Processing, pages 287–294, 2008.

[61] George Teodoro, Eduardo Valle, Nathan Mariano, Ricardo Torres, and Wagner Meira, Jr. Adaptive parallel approximate similarity search for responsive multimedia retrieval. In Proceedings of the 20th ACM international conference on Information and knowledge management, CIKM '11. ACM, 2011.

[62] A. J. M. Traina, A. Traina, C. Faloutsos, and B. Seeger. Fast indexing and visualization of metric data sets using slim-trees. IEEE Transactions on Knowledge and Data Engineering, 14(2):244–260, 2002.

[63] Caetano Traina, Jr., Agma J. M. Traina, Bernhard Seeger, and Christos Faloutsos. Slim-trees: High performance metric trees minimizing overlap between nodes. In Proceedings of the 7th International Conference on Extending Database Technology: Advances in Database Technology, EDBT '00, pages 51–65, London, UK, 2000. Springer-Verlag.

[64] Jeffrey K. Uhlmann. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 40(4):175–179, 1991.

[65] Eduardo Valle and Matthieu Cord. Advanced Techniques in CBIR: Local Descriptors, Visual Dictionaries and Bags of Features. In 2009 Tutorials of the XXII Brazilian Symposium on Computer Graphics and Image Processing, pages 72–78. IEEE, October 2009.

[66] Eduardo Valle, Matthieu Cord, and Sylvie Philipp-Foliguet. High-dimensional descriptor indexing for large multimedia databases. In Shanahan et al. [54], pages 739–748.

[67] Hongbo Xu. An Approximate Nearest Neighbor Query Algorithm Based on Hilbert Curve. In 2011 International Conference on Internet Computing and Information Services, pages 514–517. IEEE, September 2011.

[68] Peter N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, SODA '93, pages 311–321, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.

[69] Pavel Zezula. Future trends in similarity searching. In Proceedings of the 5th international conference on Similarity Search and Applications, SISAP '12, pages 8–24, Berlin, Heidelberg, 2012. Springer-Verlag.

[70] Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. Similarity Search - The Metric Space Approach, volume 32 of Advances in Database Systems. Kluwer Academic Publishers, Boston, 2006.

[71] Pavel Zezula, Pasquale Savino, Giuseppe Amato, and Fausto Rabitti. Approximate similarity retrieval with M-trees. The VLDB Journal, 7(4):275–293, December 1998.

[72] Qiaoping Zhang and Isabelle Couloigner. A new and efficient k-medoid algorithm for spatial clustering. In Osvaldo Gervasi, Marina L. Gavrilova, Vipin Kumar, Antonio Laganà, Heow Pueh Lee, Youngsong Mun, David Taniar, and Chih Jeng Kenneth Tan, editors, Computational Science and Its Applications – ICCSA 2005, volume 3482 of Lecture Notes in Computer Science, pages 181–189. Springer Berlin Heidelberg, 2005.