Continuously Adaptive Similarity Search
The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.
Citation Zhang, Huayi et al. "Continuously Adaptive Similarity Search." Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, Oregon, Association for Computing Machinery, May 2020.
As Published http://dx.doi.org/10.1145/3318464.3380601
Publisher Association for Computing Machinery (ACM)
Version Author's final manuscript
Citable link https://hdl.handle.net/1721.1/130067
Terms of Use Creative Commons Attribution-Noncommercial-Share Alike
Detailed Terms http://creativecommons.org/licenses/by-nc-sa/4.0/
Continuously Adaptive Similarity Search

Huayi Zhang†
Worcester Polytechnic Institute

Lei Cao†
MIT CSAIL

Yizhou Yan
Worcester Polytechnic Institute

Samuel Madden
MIT CSAIL

Elke A. Rundensteiner
Worcester Polytechnic Institute
ABSTRACT
Similarity search is the basis for many data analytics techniques, including k-nearest neighbor classification and outlier detection. Similarity search over large data sets relies
on i) a distance metric learned from input examples and ii)
an index to speed up search based on the learned distance
metric. In interactive systems, input to guide the learning
of the distance metric may be provided over time. As this
new input changes the learned distance metric, a naive ap-
proach would adopt the costly process of re-indexing all
items after each metric change. In this paper, we propose
the first solution, called OASIS, to instantaneously adapt the
index to conform to a changing distance metric without this
prohibitive re-indexing process. To achieve this, we prove
that locality-sensitive hashing (LSH) provides an invariance property, meaning that an LSH index built on the original
distance metric is equally effective at supporting similarity
search using an updated distance metric as long as the trans-
form matrix learned for the new distance metric satisfies
certain properties. This observation allows OASIS to avoid
recomputing the index from scratch in most cases. Further,
for the rare cases when an adaption of the LSH index is
shown to be necessary, we design an efficient incremental
LSH update strategy that re-hashes only a small subset of the
items in the index. In addition, we develop an efficient dis-
tance metric learning strategy that incrementally learns the
new metric as inputs are received. Our experimental study
using real world public datasets confirms the effectiveness
of OASIS at improving the accuracy of various similarity
search-based data analytics tasks by instantaneously adapt-
ing the distance metric and its associated index in tandem,
while achieving up to a 3-orders-of-magnitude speedup over the state-of-the-art techniques.
1 INTRODUCTION
Similarity search is important in many data analytics tech-
niques including i) outlier detection such as distance or
density-based models [4, 6, 25, 37]; ii) classification such as
†Equal Contribution
k-Nearest Neighbor (kNN) classification [11]; and iii) classi-
cal clustering methods such as density-based clustering [16]
and micro-clustering [1]. These techniques all rely on simi-
larity search to detect outliers that are far away from their
neighbors, to classify testing objects into a class based on
the classes of their k-nearest neighbors, or to identify objects
that are similar to each other.
The effectiveness and efficiency of similarity search relies
on two important yet dependent factors, namely an appropri-
ate distance metric and an effective indexing method. In simi-
larity search, the distance metric must capture correctly the
similarity between objects as per the domain. Pairs of objects
measured as similar by the distance metric should indeed
be semantically close in the application domain. Indexing
ensures the efficiency of the similarity search in retrieving
objects similar according to the distance metric. Clearly, the
construction of the index relies on the distance metric to
determine the similarity among the objects.
Unfortunately, general-purpose metrics such as Euclidean
distance often do not meet the requirements of applica-
tions [19]. To overcome this limitation, distance metric learn-
ing [46–49] has been introduced to learn a customized dis-
tance metric guided by which pairs of objects the application
considers similar or dissimilar, using so-called similarity examples.
As a motivating example, in a collaboration with a large hospital in the US (name removed due to anonymity), we have been developing a large-scale interactive labelling system to label EEG segments (450 million segments, 30TB)
with 6 classes representing different types of seizures. These
labelled EEG segments are used to train a classifier which
thereafter can automatically detect seizures based on EEG
signals collected during the clinical observation of patients.
To reduce the manual labelling effort by the neurologists, our
system supports similarity search requests submitted by the
neurologists to return the k nearest neighbors of the to-be-
labelled segment. The goal is to propagate the labels provided
by the experts to similar segments. However, when using Eu-
clidean distance in these nearest neighbor queries, segments
may not agree with the returned k nearest neighbors on class
labels. To solve this problem, our system employs distance metric learning to construct a custom distance metric from the similarity examples given by the neurologists. This ensures that the nearest neighbors of each segment are more likely to correspond to segments that share the same labels, as depicted in Fig. 1. This then empowers us to automatically label EEG segments with higher accuracy.

SIGMOD '20, June 14–19, 2020, Portland, OR, USA

Figure 1: Example of distance metric learning
It has been shown that the effectiveness of distance met-
ric learning depends on the number of similarity examples
that are available in the application [5, 19]. Unfortunately, a
sufficient number of similarity examples may not be avail-
able beforehand to learn an accurate distance metric. In-
stead, similarity examples may be discovered over time as
the users interact with their data. For example, in our above
labelling system, whenever the neurologists find a segment
that should not be labeled with the same label as its nearest
neighbors, the distance metric has to be updated immediately
to avoid this mistake in the next search.
As another example, in a financial fraud detection appli-
cation, analysts may discover new fraudulent transactions
not known before. To make sure that the fraud detector can
accurately detect the latest threats, its distance metric must
be updated according to the new similarity examples, so that
the new types of fraud can be separable (distant) from nor-
mal objects. Therefore, the distance metric used in similarity
search has to be immediately learned and updated as soon
as new similarity examples become available.
Challenges & State-of-the-Art. While the continuous
learning of a distance metric to be used in similarity search
has been studied, it has been overlooked in the literature that
a change in distance metric renders the index constructed
using the old distance metric ineffective. This is because
the objects co-located by the index may no longer be close
under the new distance metric. Therefore, the index must
be re-built whenever the distance metric changes to assure
the users always work with the most up-to-date distance
semantics. Note that unlike index updates triggered by the
insertion of new objects, a change in the distance metric may
influence all existing objects. However, re-building an index
from scratch is known to be an expensive process, often tak-
ing hours or days [21]. Although this cost may be acceptable
in traditional databases where an index is typically built only
once upfront, in our context where the distance metric may
change repeatedly this is not viable. In particular, in interac-
tive systems such as our labelling system, hours of quiescent time due to index rebuilding are simply not acceptable, because the neurologists' time is precious.
However, without the support of an index, similarity search
is neither scalable nor efficient. In other words, in similar-
ity search the need to quickly adapt to the latest semantics
conflicts with the need to be efficient.

Further, like indexing, distance metric learning itself is
an expensive process [47, 48]. Repeatedly learning a new
distance metric from scratch every time when a similar-
ity example becomes available would introduce a tremen-
dous computational overhead especially when new exam-
ples arrive frequently. Although some online distance met-
ric learning methods have been proposed [9, 24, 41], these
techniques [24] either tend to be inefficient due to the
repeated computation of the all-pair distances among all ob-
jects involved in all similarity examples received so far, or
others [9, 41] directly adopt the update method used in solv-
ing the general online learning problem [13]. This can lead
to an invalid distance metric producing negative distance
values [24].
Our Proposed Approach. In this work, we propose the first continuously adaptive similarity search strategy, or OASIS
for short. It uses an incremental distance metric learning
method that together with an efficient index update strat-
egy satisfies the conflicting requirements of adaptivity and
efficiency in similarity search.
• OASIS models the problem of incremental distance metric learning as an optimization problem of minimizing a regularized loss [13, 41]. This enables OASIS to learn a new distance metric based on only the current distance metric and
the expected distances between the objects involved in the
newly provided similarity example. Therefore, it efficiently
updates the distance metric with complexity quadratic in the
dimensionality of the data, independent of the total num-
ber of objects involved in all similarity examples (training
objects).
• We observe that the popular Locality Sensitive Hashing
(LSH) [17, 23, 31, 32], widely used in online applications to
deal with big data, is robust against changes to the distance
metric, such that in many cases it is not necessary to update
the LSH index even when the distance metric changes. Dis-
tance metric learning transforms the original input feature
space into a new space using a transform matrix learned
from the similarity examples. We formally prove in Sec. 5
that hashing the objects in the original input space is equivalent to hashing the objects in the new space using a family of new
hash functions converted from the original hash functions.
Therefore, if the new hash functions remain locality-sensitive, i.e., similar objects are mapped to the same "buckets" with
high probability, then the original LSH index will continue
to remain effective in supporting similarity search also with
the new distance metric. We show that this holds as long as
the transform matrix learned by distance metric learning sat-
isfies certain statistical properties. Leveraging this invariance observation, OASIS avoids the unnecessary recomputation
of LSH hashes.
• Further, we design an efficient method to adapt the LSH
index when an update is found to be necessary. It is built on
our finding that the identification of the objects that most
likely would move to different buckets after the distance
metric changes can be transformed into the problem of find-
ing the objects that have a large inner product with some
query objects. Therefore, the objects that have to be updated
in the index can be quickly discovered by submitting a few
queries to a compact inner product-based LSH index. This
way, OASIS effectively avoids re-hashing the large majority
of objects during each actual LSH update.
Our experimental study using public datasets and large
synthetic datasets confirms that OASIS is up to 3 orders of
magnitude faster than approaches that use traditional online
metric learning methods [24] and update the index whenever
the distance metric changes. Moreover, OASIS is shown to
significantly improve the accuracy of similarity search-based
data analytics tasks from outlier detection, classification,
labeling, and nearest neighbor-based information retrieval
compared to approaches that do not update the distance
metric or use traditional online metric learning methods.
2 PRELIMINARIES

2.1 Distance Metric Learning
The goal of distance metric learning is to learn a distance
function customized to the specific semantics of similarity
between objects of the application. Typical distance metric learning techniques [46–49] achieve this by learning a Mahalanobis distance metric D_M, defined by:

D_M(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j)    (1)

Here x_i and x_j denote two d-dimensional objects, while M represents a d × d matrix, called the transform matrix. The Mahalanobis metric can be viewed as: (1) first applying a linear transformation from the d-dimensional input space R^do to a new d-dimensional input space R^dn, represented by L : R^do → R^dn; followed by (2) computing the Euclidean distance in the new input space, as in Equation 2:

D_M(x_i, x_j) = ||L(x_i − x_j)||^2    (2)

where L^T L = M.
M must be positive semidefinite, i.e., have no negative eigenvalues [19], to ensure that the distance function returns non-negative distance values. Distance metric learning learns the transform matrix M based on user inputs regarding which items are or are not considered to be similar, so-called similarity examples. The learned transform matrix aims to ensure that all user-provided similarity examples are satisfied in the transformed space.
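As a quick numerical check of Eqs. 1 and 2, a minimal numpy sketch (all concrete matrices and points below are arbitrary illustrative choices) can confirm that the two views of the Mahalanobis metric agree when M = L^T L:

```python
import numpy as np

# Sanity check of Eqs. 1-2: (x_i − x_j)ᵀ M (x_i − x_j) equals ||L(x_i − x_j)||²
# when M = Lᵀ L.  All values here are arbitrary illustrative choices.
rng = np.random.default_rng(42)
d = 3
L = rng.normal(size=(d, d))     # a hypothetical transform matrix
M = L.T @ L                     # positive semidefinite by construction

x_i, x_j = rng.normal(size=d), rng.normal(size=d)
diff = x_i - x_j
d_mahal = float(diff @ M @ diff)            # Eq. 1
d_trans = float(np.sum((L @ diff) ** 2))    # Eq. 2
assert np.isclose(d_mahal, d_trans) and d_mahal >= 0
```

Because M = L^T L is positive semidefinite by construction, the computed distance is guaranteed non-negative.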
The particular format of the similarity examples may differ from application to application. In kNN classification [11] and outlier detection [25], users may explicitly tell us which class some examples belong to. kNN classification assigns an object to the class most common among its k nearest neighbors discovered from the labeled training examples. Therefore, to improve the classification accuracy, given an example x_i, distance metric learning aims to learn a distance metric that makes x_i share the same class with its k nearest neighbor examples, as shown in Fig. 1. This is also what is leveraged by the neurologists who specify kNN queries in our motivating example about the online labeling system. In distance-based outlier detection [25], outliers are defined as the objects that have fewer than k neighbors within a distance range r. Distance metric learning therefore aims to learn a transform matrix such that in the transformed space the distance between an outlier example and its kth nearest neighbor is larger than r. In clustering, users may tell us that two example objects either must be or cannot be in the same cluster.
In the literature, different distance metric learning meth-
ods have been proposed for various applications, such as
Large Margin Nearest Neighbor (LMNN) [47, 48] for kNN classification, MELODY [51] for outlier detection, and DML [49] for clustering.
2.2 Locality Sensitive Hashing
Similarity search on large data relies on indexing, because many search operations, including k nearest neighbor (kNN) search, are expensive. In online applications and interactive systems with stringent response time requirements, indexing that produces approximate results tends to be sufficient as long as it comes with an accuracy guarantee. In particular, locality sensitive hashing (LSH) [17, 23, 31, 32] is a popular method because it is scalable to big data, effective in high-dimensional spaces, and amenable to dynamic data.
LSH hashes input objects so that similar objects map to
the same "buckets" with high probability. The number of
buckets is kept much smaller than the universe of possible
input items. To achieve this goal, LSH uses a family of hash
functions that make sure the objects close to each other
have a higher probability to collide than the objects far from
each other, so-called locality-preserving hashing. Here we introduce a popular type of hash function used in LSH:

h(x) = ⌊(a · x + b) / W⌋    (3)

where a is a random vector whose entries are drawn from a normal distribution. This ensures h(x) is locality-preserving [12]. b is a random constant drawn from the range [0, W), with W a constant that controls the bucket width.
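Eq. 3 can be sketched directly in code. The bucket width W and all random draws below are illustrative assumptions; the statistical check at the end simply illustrates the locality-preserving property:

```python
import numpy as np

# Eq. 3 as code: h(x) = ⌊(a · x + b) / W⌋, with a ~ N(0, I) and b ~ U[0, W).
rng = np.random.default_rng(1)

def make_hash(d, W=4.0):
    a = rng.normal(size=d)       # random hyperplane, entries from N(0, 1)
    b = rng.uniform(0, W)        # random offset in [0, W)
    return lambda x: int(np.floor((a @ x + b) / W))

h = make_hash(d=8)
x = rng.normal(size=8)
trials = 200
close_hits = sum(h(x) == h(x + 1e-6 * rng.normal(size=8)) for _ in range(trials))
far_hits = sum(h(x) == h(x + 10.0 * rng.normal(size=8)) for _ in range(trials))
assert close_hits > far_hits     # locality: nearby points collide far more often
```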
Next, we describe the process of building a classical (K, N)-parameterized LSH index:

• Define an LSH family F of locality-preserving functions h : R^d → S which map objects from the d-dimensional input space R^d to a bucket s ∈ S.

• Given the input parameters K and N, define a new LSH family G containing N meta hash functions g_n ∈ G, where each meta function g_n is constructed from K random hash functions h_1, h_2, ..., h_K from F. Each g_n corresponds to one hash table.

• Hash objects: given an input object x, g_n(x) = (h_1(x), h_2(x), ..., h_K(x)) (1 ≤ n ≤ N).

The N hash tables produced by the N meta hash functions correspond to the index structure of the given input data.
The procedure for querying the nearest neighbors of an object q from an LSH index is:

• Given an object q, for each hash function g_n ∈ G, calculate g_n(q), and return the objects x where g_n(x) = g_n(q), that is, the objects x that fall into the same bucket as q.

• Union the results returned from all hash functions g_n (n = 1, ..., N) into one candidate set, and find the nearest neighbors of q from this candidate set.
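The (K, N) construction and the query procedure above can be sketched end-to-end as a toy in-memory index. The class, parameter values, and brute-force re-ranking are our own illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)

class TinyLSH:
    """Toy (K, N)-parameterized LSH index over R^d (sketch of Sec. 2.2)."""
    def __init__(self, d, K=4, N=6, W=4.0):
        # N meta hash functions g_n, each built from K random functions h_k
        self.A = rng.normal(size=(N, K, d))      # hyperplanes a
        self.b = rng.uniform(0, W, size=(N, K))  # offsets in [0, W)
        self.W = W
        self.tables = [defaultdict(list) for _ in range(N)]
        self.data = []

    def _g(self, n, x):   # g_n(x) = (h_1(x), ..., h_K(x))
        return tuple(np.floor((self.A[n] @ x + self.b[n]) / self.W).astype(int))

    def insert(self, x):
        idx = len(self.data)
        self.data.append(x)
        for n, table in enumerate(self.tables):
            table[self._g(n, x)].append(idx)

    def query(self, q, k=1):
        # union of colliding buckets across the N tables, then exact re-rank
        cand = {i for n, t in enumerate(self.tables)
                for i in t.get(self._g(n, q), [])}
        ranked = sorted(cand, key=lambda i: np.sum((self.data[i] - q) ** 2))
        return ranked[:k]

lsh = TinyLSH(d=5)
pts = [rng.normal(size=5) for _ in range(20)]
for p in pts:
    lsh.insert(p)
assert lsh.query(pts[7], k=1) == [7]   # an indexed point is its own 1-NN
```

Each of the N tables maps a bucket key to object ids; a query unions the N colliding buckets and re-ranks the candidates exactly.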
In addition to kNN queries, LSH is commonly used to
speed up data analytics techniques [40] for which similar-
ity search plays a critical role. Examples include distance or
density-based outliers [4, 6, 25, 37], density-based or spec-
tral clustering [16, 30]. In these applications, exact results
of similarity search that would require significantly more
retrieval time are not necessarily better than approximate
results with an accuracy guarantee as produced by LSH.
2.3 Problem Statement
We target the scenario where new similarity examples (e.g., user labels) continuously become available, requiring an update to the similarity metric and the associated index.

Definition 2.1. Continuously Adaptive Similarity Search. Given the transform matrix M_t used in the current distance metric D_{M_t} and the current LSH index H_t produced by LSH family F_t w.r.t. distance metric D_{M_t}, after receiving a new similarity example E at time step t, our goal is that:
Figure 2: The OASIS System
(1) the transform matrix M_t be updated to M_{t+1} such that the new similarity example E is satisfied by the new distance metric D_{M_{t+1}} using M_{t+1}; and

(2) the LSH index H_t be updated to H_{t+1} so as to be equivalent to re-hashing the data using a new LSH family F' which is locality-preserving w.r.t. the new distance metric D_{M_{t+1}}.
3 THE OVERALL OASIS APPROACH
As shown in Fig. 2, given a similarity example supplied by a
particular application, the continuous metric learning com-
ponent of OASIS first converts it into one or multiple dis-
tance constraints in a standard format (Sec. 4.1). Then, our
incremental learning strategy updates the distance metric to
satisfy these constraints (Sec. 4.2). Taking the new distance
metric as input, the LSH update component determines if the
LSH index has to be updated based on our invariance obser-
vation established in Lemma 5.2 (Sec. 5). If necessary, OASIS
invokes our two-level LSH-based update strategy (Sec. 6) to
update the LSH index according to the new distance metric.
Applications. OASIS can be used to support a broad
range of applications, including but not limited to kNN classification [11] used in text categorization and credit rating [19, 22], distance-based outlier detection [7, 8] to detect network intrusion or financial fraud [15], and information retrieval supported by kNN queries to retrieve documents, images, or products that are most relevant to the given object.
LSH indexes are often used in these applications to speed up
the neighbor search process as discussed in Sec. 2.2.
In these applications, the new similarity examples might
correspond to the objects mis-classified by the kNN classifier,
the outliers that are captured recently and believed to be
different from the existing outliers, or a retrieved object not
relevant to the query object. OASIS then adapts the existing
distance metric to avoid these newly observed classification
errors, capture the new type of outliers, or ensure that only
relevant objects are returned. Next, the LSH index will be
updated to effectively support the new queries subsequently
submitted by the users or to sequentially process new data,
as shown in Fig. 2.
4 CONTINUOUS DISTANCE METRIC LEARNING
In this section, we show how different types of similarity ex-
amples in various similarity search-based applications can be
encoded into the distance constraints in a quadruple format
(Sec. 4.1). In Sec. 4.2, we introduce an incremental distance
metric learning strategy using the distance constraints as input. This general method serves diverse applications.
4.1 Modeling the Similarity Examples
In OASIS, a similarity example is converted into one or multiple distance constraints in the quadruple format C = (u, v, dt, b) that place constraints on the distance between two objects. More specifically, in C, dt corresponds to the target distance between objects u and v. When the binary flag b is −1, the distance between examples u and v cannot be larger than dt. When b is 1, the distance between u and v should be larger than dt.
Next, we show how to convert different types of similarity
examples in different analytics tasks into distance constraints.
To provide some representative examples, we illustrate the
conversion principles for kNN classifiers, kNN queries, and
distance-based outliers.
4.1.1 Distance Constraints in kNN Classifier
In a kNN classifier, given a new example q, the new constraints can be formed based on q's nearest neighbors. The intuition is that the nearest neighbors of q should be the objects that belong to the same class as q. This will improve the classification accuracy of the kNN classifier.
First, we define the concept of target neighbor.
Definition 4.1. Given an object q, its target neighbors, denoted as N_t(q), correspond to the k nearest neighbors, determined by the current learned distance metric D_M, that share the same label y_q with q in the labeled example dataset.

The distance constraints are formed based on the nearest neighbor of q and its target neighbors, as shown in Def. 4.2.

Definition 4.2. Given the new labeled object q and its nearest neighbor with a class label that is distinct from q's label, denoted as N_n(q), ∀ target neighbors of q denoted as N_ti(q) ∈ N_t(q), a new constraint C_i is formed as: C_i = (q, N_ti(q), D_M(q, N_n(q)), −1), where D_M denotes the distance computed using the current learned distance metric.
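Def. 4.2 might be realized as follows. This is a sketch; the helper names and the brute-force neighbor search are our own illustrative assumptions:

```python
import numpy as np

def d_M(M, u, v):
    """Mahalanobis distance of Eq. 1 (squared form)."""
    diff = u - v
    return float(diff @ M @ diff)

def knn_constraints(M, q, y_q, X, y, k=3):
    """Form the quadruples of Def. 4.2 for a newly labeled object q.
    Target neighbors: the k nearest same-label examples under the current
    metric; target distance: distance from q to its nearest differently
    labeled example. (Hypothetical helper, brute-force for clarity.)"""
    dists = [d_M(M, q, x) for x in X]
    same = sorted((i for i in range(len(X)) if y[i] == y_q),
                  key=lambda i: dists[i])[:k]
    diff_label = min((i for i in range(len(X)) if y[i] != y_q),
                     key=lambda i: dists[i])
    dt = dists[diff_label]
    # (u, v, target distance, flag): pull each target neighbor closer than dt
    return [(q, X[i], dt, -1) for i in same]

M = np.eye(2)
X = [np.array(p) for p in ([0.1, 0], [0.2, 0], [5.0, 0], [0.3, 0], [1.0, 0])]
y = [0, 0, 0, 0, 1]
q = np.zeros(2)
cs = knn_constraints(M, q, 0, X, y, k=3)
assert len(cs) == 3 and all(c[3] == -1 for c in cs)
assert np.isclose(cs[0][2], 1.0)   # distance to the nearest label-1 example
```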
In Lemma 4.3, we show that the new distance metric learned from these constraints is able to improve the classification accuracy of the kNN classifier.

Lemma 4.3. Given an object q previously mis-classified by the kNN classifier, q will no longer be mis-classified if the new distance metric parameterized by transform matrix M_{t+1} satisfies the new constraints C = {C_1, C_2, ..., C_k} defined by Def. 4.2.

Proof. By the new constraints C, the distance between the mis-classified object q (as u_t) and any of its target neighbors N_ti(q) (as v_t) has to be smaller than the distance between q and its nearest neighbor with non-matching label N_n(q) (the target distance dt_t). The target neighbors of q are therefore guaranteed to be the k nearest neighbors of q in the labeled example set. By the definition of the kNN classifier, q belongs to the class of its target neighbors, which share the same class with q. Therefore, q is now guaranteed to be correctly classified. Lemma 4.3 is proven. □
Apply to k Nearest Neighbor (kNN) Queries. The above constraint definition can be equally applied to support kNN queries. As in the case of kNN classification, in a kNN query a small set of labeled examples is collected to learn a distance metric that makes sure an example object q is in the same class as its nearest neighbors found in the example set. The key difference between kNN queries and kNN classification is that in kNN queries the similarity search is conducted on the large input dataset instead of the labeled example set, although the distance metric is learned based on the example set.
4.1.2 Distance Constraints in Distance-based Outliers
As described in Sec. 2.1, in the classical distance-based outlier definition [25], outliers are defined as the objects that have fewer than k neighbors within a distance range r. Similar to kNN classification, OASIS forms the distance constraints based on the k nearest neighbors of the newly received example q, as shown in Def. 4.4.

Definition 4.4. Given the new labeled object q and its k nearest neighbors denoted as KNN(q), ∀ object o_i ∈ KNN(q), a new constraint C_i = (q, o_i, r, 1) is formed if q is an outlier. Otherwise, C_i = (q, o_i, r, −1).
Using the distance metric learned from the constraints formed by Def. 4.4, the distance-threshold outliers can be effectively detected in the newly transformed space. Intuitively, given an object q which is an inlier but was mis-classified as an outlier by the outlier detector using the old distance metric in the old space, it will no longer be mis-classified in the new space. This is because in the new space the distances between q and its k nearest neighbors are no larger than r. Therefore, q now has at least k neighbors. On the other hand, if q is an outlier but was missed by the outlier detector in the old space, it tends to be captured in the new metric space. This is because now the distances between q and its k nearest neighbors are larger than r. Therefore, it is unlikely that q has k neighbors.
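A corresponding sketch of Def. 4.4 (the helper names are our own; the neighbor search is brute force for clarity):

```python
import numpy as np

def d_M(M, u, v):
    """Mahalanobis distance of Eq. 1 (squared form)."""
    diff = u - v
    return float(diff @ M @ diff)

def outlier_constraints(M, q, is_outlier, X, r, k=3):
    """Form the quadruples of Def. 4.4: push an outlier's k nearest
    neighbors beyond range r (flag 1), or pull an inlier's within r
    (flag -1). Hypothetical helper for illustration only."""
    knn = sorted(range(len(X)), key=lambda i: d_M(M, q, X[i]))[:k]
    flag = 1 if is_outlier else -1
    return [(q, X[i], r, flag) for i in knn]

M = np.eye(2)
X = [np.array([1.0, 0]), np.array([2.0, 0]), np.array([3.0, 0])]
q = np.zeros(2)
cs = outlier_constraints(M, q, is_outlier=True, X=X, r=0.5, k=2)
assert len(cs) == 2 and all(c[3] == 1 for c in cs)
assert np.allclose(cs[0][1], [1.0, 0])   # the closest neighbor appears first
```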
4.2 Incremental Learning Strategy
The incremental learning strategy in OASIS takes one distance constraint C_t = (u_t, v_t, dt_t, b_t) as input. First, it computes the distance d̂t_t = D_{M_t}(u_t, v_t) based on the current distance metric using the transform matrix M_t. It incurs a loss l(d̂t_t, dt*_t), where dt*_t = dt_t(1 + b_t γ) corresponds to the target distance dt_t weighted by γ ∈ (0, 1). If b_t is −1, then the weighted target distance dt*_t is smaller than the target distance dt_t specified in C_t. Otherwise, dt*_t is larger than dt_t. Therefore, l(d̂t_t, dt*_t) measures the difference between the current distance and the expected distance between the examples specified in C_t.
Then OASIS updates the matrix from M_t to M_{t+1} under the optimization goal of minimizing the total loss of all distance constraints received so far, i.e.,

Loss = Σ_t l(d̂t_t, dt*_t)    (4)

The goal is to ensure that the constraints seen previously are still satisfied. OASIS uses a typical model for the loss function, namely the squared loss [19]: l(d̂t_t, dt*_t) = (1/2)(d̂t_t − dt*_t)^2, though others could be plugged in.
Leveraging the state-of-the-art online learning strategy [13, 41], OASIS solves the problem of minimizing Eq. 4 by learning an M_{t+1} that minimizes a regularized loss:

M_{t+1} = argmin_M D_ld(M, M_t) + η l(ȳ_t, dt*_t)    (5)

In Eq. 5, D_ld(M_{t+1}, M_t) is a regularization function given by D_ld(M_{t+1}, M_t) = tr(M_{t+1} M_t^{-1}) − logdet(M_{t+1} M_t^{-1}) − d. Here the LogDet divergence [10] is used as the regularization function due to its success in metric learning [13]. ȳ_t = δ_t^T M_{t+1} δ_t corresponds to the distance between u_t and v_t computed using the new transform matrix M_{t+1}, where δ_t = u_t − v_t. η > 0 denotes the regularization parameter, and d corresponds to the data dimension.

By [38], Eq. 5 can be minimized if:

M_{t+1} = M_t − η(ȳ_t − dt*_t) M_t δ_t δ_t^T M_t / (1 + η(ȳ_t − dt*_t) δ_t^T M_t δ_t)    (6)

The matrix M_{t+1} resulting from Eq. 6 is guaranteed to be positive definite [24]. This satisfies the requirement of distance metric learning.
Next, we show how to compute M_{t+1} using Eq. 6. First, we have to compute ȳ_t. Since ȳ_t is a function of M_{t+1}, to compute ȳ_t we first have to eliminate M_{t+1} from its computation. Multiplying Eq. 6 by δ_t^T on the left and δ_t on the right, we get:

δ_t^T M_{t+1} δ_t = δ_t^T M_t δ_t − η(ȳ_t − dt*_t)(δ_t^T M_t δ_t)^2 / (1 + η(ȳ_t − dt*_t) δ_t^T M_t δ_t)

Since ȳ_t = δ_t^T M_{t+1} δ_t and d̂t_t = δ_t^T M_t δ_t, we get:

ȳ_t = (η dt*_t d̂t_t − 1 + sqrt((η dt*_t d̂t_t − 1)^2 + 4η d̂t_t^2)) / (2η d̂t_t)    (7)

Using Eq. 7, we can directly compute ȳ_t. Then M_{t+1} can be computed by plugging Eq. 7 into Eq. 6.
Algorithm 1 Incremental Distance Metric Learning
1: function UpdateMetric(current metric M_t, C_t = (u_t, v_t, dt_t, b_t), γ)
2:   d̂t_t = D_{M_t}(u_t, v_t);
3:   dt*_t = dt_t(1 + b_t γ);
4:   compute ȳ_t using Eq. 7;
5:   compute M_{t+1} using Eq. 6;

The main steps of the incremental distance metric learning are shown in Alg. 1.
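Alg. 1 can be sketched directly from Eqs. 6 and 7. This is a minimal numpy version; the function name and the default γ/η values are our own assumptions:

```python
import numpy as np

def update_metric(M_t, u, v, dt, b, gamma=0.5, eta=0.1):
    """One incremental metric update following Eqs. 6-7 (sketch)."""
    delta = u - v
    dt_hat = float(delta @ M_t @ delta)   # current distance d̂t_t (Eq. 1)
    dt_star = dt * (1.0 + b * gamma)      # weighted target dt*_t
    # Eq. 7: closed form for ȳ_t = δ_tᵀ M_{t+1} δ_t
    s = eta * dt_star * dt_hat - 1.0
    y_bar = (s + np.sqrt(s * s + 4.0 * eta * dt_hat ** 2)) / (2.0 * eta * dt_hat)
    # Eq. 6: rank-one update of the transform matrix
    coef = eta * (y_bar - dt_star)
    num = coef * (M_t @ np.outer(delta, delta) @ M_t)
    return M_t - num / (1.0 + coef * dt_hat)

M0 = np.eye(2)
u, v = np.array([1.0, 0.0]), np.zeros(2)
M1 = update_metric(M0, u, v, dt=0.5, b=-1)     # pull u and v closer
new_dist = float((u - v) @ M1 @ (u - v))
assert new_dist < 1.0                          # moved toward the smaller target
assert np.allclose(M1, M1.T)
assert np.all(np.linalg.eigvalsh(M1) > 0)      # still positive definite
```

The cost per update is dominated by the matrix-vector products, matching the quadratic-in-d complexity discussed below.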
Time Complexity Analysis. The time complexity of this
method is determined by the product operation between a d-dimensional vector and a d × d matrix. It is quadratic in the number of dimensions d, but does not depend on the number of distance constraints received so far, nor on the number of example objects involved in these constraints.
5 LOCALITY SENSITIVE HASHING: INVARIANCE OBSERVATION
In this section, we will establish the invariance observation for LSH, stating that in many cases it is not necessary to update the LSH index at all when the distance metric changes. That is, we will show that an LSH index built on the Euclidean distance is sufficient to support similarity search under the learned distance metric as long as the eigenvalues of its transform matrix M fall into a certain range.
For this, we first show that the original LSH index H built using an LSH family F can be considered as an LSH index H' with respect to the learned distance metric D_M, built using a different family of hash functions F'.

Lemma 5.1. Given an LSH index H w.r.t. Euclidean distance built using a family of locality-preserving hash functions:

h(u) = ⌊(a u + b) / W⌋    (8)

then H can be considered to be an LSH index w.r.t. the learned distance metric D_M with transform matrix M = L^T L, built using the following family of hash functions:

h'(u') = ⌊(a' u' + b) / W⌋    (9)

where a' = a L^{-1} and u' = L u.¹

¹Here we consider a' and a as row vectors for ease of presentation.
Proof. Given a transform matrix $M$ in the learned distance metric, we get an $L$, where $M = L^T L$. Replacing $\vec{a}$ with $\vec{a}L^{-1}L$ ($\vec{a} = \vec{a}L^{-1}L$) in Eq. 8, we get: $f(\vec{u}) = \lfloor (\vec{a}L^{-1}L\vec{u} + b)/W \rfloor$. Considering $\vec{a}L^{-1}$ as a new hyperplane $\vec{a}'$, we get $f(\vec{u}) = \lfloor (\vec{a}'L\vec{u} + b)/W \rfloor$. By the definition of the Mahalanobis distance (Sec. 2), $L\vec{u}$ corresponds to the object in the new space transformed from $\vec{u}$. Replacing $L\vec{u}$ with $\vec{u}'$, we get $f'(\vec{u}') = \lfloor (\vec{a}' \cdot \vec{u}' + b)/W \rfloor$. It corresponds to a hash function in the transformed space, where $\vec{u}' = L\vec{u}$ is an input object and $\vec{a}' = \vec{a}L^{-1}$ the new hyperplane. Lemma 5.1 is proven. □
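Since the proof is purely algebraic, the invariance is easy to check numerically. The sketch below builds a random positive definite $M$, factors it as $M = L^T L$, and confirms that the hash arguments of Eq. 8 and Eq. 9 coincide; the use of a Cholesky factor and the concrete sizes are our own illustrative choices (any $L$ with $M = L^T L$ works):

```python
import numpy as np

rng = np.random.default_rng(1)
d, W, b = 4, 4.0, 0.5

a = rng.normal(size=d)              # hyperplane a, treated as a row vector
u = rng.normal(size=d)              # input object
B = rng.normal(size=(d, d))
M = B.T @ B + d * np.eye(d)         # a symmetric positive definite metric
L = np.linalg.cholesky(M).T         # one factor satisfying M = L^T L

x1 = (a @ u + b) / W                                  # argument of Eq. 8
x2 = ((a @ np.linalg.inv(L)) @ (L @ u) + b) / W       # argument of Eq. 9
print(np.isclose(x1, x2))           # same argument, hence the same bucket
```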
Locality-Preserving. By the definition of LSH (Sec. 2.2), if the hash functions are locality-preserving, then the LSH index $H$ is effective in hashing the nearby objects into the same buckets based on the learned distance metric $D_M$. Therefore, if $f'()$ (Eq. 9) is locality-preserving, then the LSH index $H$ built on the Euclidean distance is still effective in supporting the similarity search under $D_M$.
Next, we show the entries in the hyperplane $\vec{a}'$ of $f'()$ (Eq. 9) correspond to a normal distribution $N(0, I)$ as long as the eigenvalues of the transform matrix $M$ satisfy certain properties. By [12], this assures that the hash function $f'()$ using $\vec{a}'$ as the hyperplane is locality-preserving.
Given a hash table produced by $h_n$ hash functions, we use matrix $A$ to represent the hyperplanes of all hash functions. Therefore, $A$ is an $h_n \times d$ matrix, where $d$ denotes the dimension of the data. Each row corresponds to the hyperplane (hash vector) $\vec{a}$ of one hash function.
Since a valid transform matrix $M$ is always symmetric and positive definite as discussed in Sec. 2.1, it can be decomposed into $M = Q^T \Lambda Q$, where $Q^T Q = I$ and $\Lambda$ is a diagonal matrix formed by the eigenvalues of $M$. Since $M = L^T L$, we get $L = \Lambda^{1/2} Q$. Therefore:

$$A' = A L^{-1} = A Q^{-1} \Lambda^{-1/2} \qquad (10)$$

We use $[\vec{g}_1, \vec{g}_2, ..., \vec{g}_d]$ to represent $A Q^{-1}$, where $\vec{g}_i$ represents the $i$th column of $A Q^{-1}$. We denote the elements on the main diagonal of $\Lambda^{-1/2}$ as $[s_1, s_2, ..., s_d]$. We then have $A Q^{-1} \Lambda^{-1/2} = [s_1\vec{g}_1, s_2\vec{g}_2, ..., s_d\vec{g}_d]$.
Next, we show each row vector in $A Q^{-1} \Lambda^{-1/2}$, which represents the hyperplane $\vec{a}'$ of one hash function $f'()$, corresponds to a vector drawn from normal distribution $N(0, I)$ as long as the eigenvalues of the transform matrix $M$ satisfy certain properties. By [12], this assures that the hash function $f'()$ using $\vec{a}'$ as the hyperplane is locality-preserving.
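The decomposition above maps directly onto NumPy. The sketch below recovers $Q$, $\Lambda$, and $L = \Lambda^{1/2}Q$ from a symmetric positive definite $M$ and verifies both $M = L^T L$ and the two sides of Eq. 10; the matrix sizes and the random $M$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, h_n = 6, 3

B = rng.normal(size=(d, d))
M = B.T @ B + np.eye(d)             # symmetric positive definite metric

eigvals, V = np.linalg.eigh(M)      # M = V diag(eigvals) V^T
Q = V.T                             # so M = Q^T diag(eigvals) Q with Q Q^T = I
L = np.diag(np.sqrt(eigvals)) @ Q   # L = Lambda^{1/2} Q

A = rng.normal(size=(h_n, d))       # h_n hyperplanes, rows drawn from N(0, I)
A_prime = A @ np.linalg.inv(L)

# Both factorizations of Eq. 10 agree: A L^{-1} = A Q^{-1} Lambda^{-1/2}
alt = A @ np.linalg.inv(Q) @ np.diag(1.0 / np.sqrt(eigvals))
print(np.allclose(M, L.T @ L), np.allclose(A_prime, alt))
```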
Lemma 5.2. Statistically the hyperplane $\vec{a}'$ in Eq. 9 corresponds to a vector drawn from a normal distribution $N(0, I)$ as long as, $\forall s_i$ in the main diagonal of $\Lambda^{-1/2}$: $0.05 < P(\chi^2_{h_n} < h_n s_i^2) < 0.95$, where the p-value is set as 0.1.
Proof. We first show each row vector in $A Q^{-1}$ obeys a normal distribution $N(0, I)$. Since each row vector in $A$ obeys a normal distribution $N(0, I)$, where $I$ is the identity matrix, each row vector in $A Q^{-1}$ obeys a multivariate normal distribution $N(0, \Sigma)$ [18], where $\Sigma = Q^{-1}(Q^{-1})^T$ is the covariance matrix. Since $Q^T Q = I$, it is easy to see that $\Sigma = Q^{-1}(Q^{-1})^T = I$. Thus each row vector in $A Q^{-1}$ can be regarded as a vector drawn from a normal distribution $N(0, I)$. Note this indicates that each element in $A Q^{-1}$ can be regarded as a sample drawn from a normal distribution $N(0, 1)$. Therefore, each column vector $\vec{g}_i$ in $A Q^{-1}$ can also be regarded as a vector drawn from a normal distribution $N(0, I)$.
Next, we show each column vector $s_i \vec{g}_i$ in $A Q^{-1} \Lambda^{-1/2}$ can also be regarded as a vector drawn from a normal distribution. This will establish our conclusion that each row vector in $A Q^{-1} \Lambda^{-1/2}$, which represents the hyperplane $\vec{a}'$ of one hash function $f'()$, can be regarded as a vector drawn from a normal distribution [18].
We use the $\chi^2$ test to determine if $\forall s_i \vec{g}_i \in [s_1\vec{g}_1, s_2\vec{g}_2, ..., s_d\vec{g}_d]$ statistically corresponds to a vector drawn from normal distribution $N(0, I)$ [34]. Given an $h_n$-dimensional vector drawn from normal distribution $N(0, I)$, the sum of the squares of its $h_n$ elements obeys a $\chi^2$ distribution with $h_n$ degrees of freedom [28]. We already know that $\vec{g}_i$ is a vector drawn from a normal distribution $N(0, I)$. Therefore, as long as the sum of the squares of the elements in $s_i \vec{g}_i$ also obeys a $\chi^2$ distribution with $h_n$ degrees of freedom, statistically $s_i \vec{g}_i$ corresponds to a vector drawn from a normal distribution $N(0, I)$ [34].
First, we calculate the expected value of the sum of the squares of the elements in $s_i \vec{g}_i$. Since $\vec{g}_i$ corresponds to a vector drawn from normal distribution $N(0, I)$, $E(\sum_{j=1}^{h_n} g_{ji}^2) = h_n$, where $g_{ji}$ denotes the $j$th element in $\vec{g}_i$. Since $s_i$ is a constant, we get $E(\sum_{j=1}^{h_n} (s_i g_{ji})^2) = s_i^2\, E(\sum_{j=1}^{h_n} g_{ji}^2) = h_n s_i^2$.
We set the p-value as 0.1. By [34], if $0.05 < P(\chi^2_{h_n} < h_n s_i^2) < 0.95$, we cannot reject the hypothesis that $s_i \vec{g}_i$ is a vector drawn from a normal distribution. This immediately shows that each row vector $\vec{a}'$ in $A Q^{-1} \Lambda^{-1/2}$ statistically corresponds to a vector drawn from a normal distribution $N(0, I)$. □
Testing Method. Testing whether Lemma 5.2 holds is straightforward. That is, we first compute the eigenvalues $eg_i$ of the transform matrix $M$. Since $\Lambda$ is a diagonal matrix formed by the eigenvalues of $M$, the element $s_i$ at the main diagonal of $\Lambda^{-1/2}$ corresponds to $eg_i^{-1/2}$. We then use the $\chi^2$ table [34] to test if $0.05 < P(\chi^2_{h_n} < h_n s_i^2) < 0.95$. This in fact corresponds to $0.05 < P(\chi^2_{h_n} < h_n\, eg_i^{-1}) < 0.95$. The eigenvalue computation and $\chi^2$ test can be easily done using Python libraries like NumPy and SciPy [35]. As an example, given
SIGMOD β20, June 14β19, 2020, Portland, OR, USA Huayi Zhang, Lei Cao, Yizhou Yan, Samuel Madden, and Elke A. Rundensteiner
an LSH index with 3 hash functions, by [34], β eigenvalue
πππ ofπ , if 0.35 < 3 Γ egβ1i < 7 .81, that is 0.38 < egi < 8.57 ,then we do not have to update the LSH index.
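A minimal version of this test, using SciPy's `chi2` quantiles ($h_n$ is the number of hash functions; the function name and the example matrices are ours):

```python
import numpy as np
from scipy.stats import chi2

def lsh_update_needed(M, h_n, p_value=0.1):
    """Test of Lemma 5.2: the LSH index can be kept as long as, for every
    eigenvalue eg_i of M, h_n / eg_i falls between the p/2 and 1 - p/2
    quantiles of the chi-square distribution with h_n degrees of freedom."""
    lo = chi2.ppf(p_value / 2, df=h_n)        # 0.05 quantile for p = 0.1
    hi = chi2.ppf(1 - p_value / 2, df=h_n)    # 0.95 quantile
    eigvals = np.linalg.eigvalsh(M)           # M is symmetric
    return not all(lo < h_n / eg < hi for eg in eigvals)

# With h_n = 3 the bounds are roughly 0.35 < 3/eg_i < 7.81, i.e.
# 0.38 < eg_i < 8.57, matching the example above.
print(lsh_update_needed(np.eye(4), h_n=3))                    # all eigenvalues 1: keep index
print(lsh_update_needed(np.diag([1., 1., 1., 20.]), h_n=3))   # eigenvalue 20 is out of range
```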
Analysis. Now we analyze why Lemma 5.2 is often met. By Eq. 5, $M_{t+1} = \arg\min D_{ld}(M_{t+1}, M_t) + \eta\, l(d_{t+1}, d^*_t)$, where $l(d_{t+1}, d^*_t)$ represents the difference between the targeted distance and the distance computed using $M_{t+1}$. Since many solutions exist that can make $l(d_{t+1}, d^*_t)$ close to 0 [38], the solution of Eq. 5 is mainly determined by $D_{ld}(M_{t+1}, M_t)$. Initially, $M_t = I$ when $t = 0$. Accordingly, $D_{ld}(M_{t+1}, M_t) = D_{ld}(M_{t+1}, I) = tr(M_{t+1}) - \log\det(M_{t+1}) - d$. Note $\det(M_{t+1}) = \prod_{i=1}^{d} eg_i$ and $tr(M_{t+1}) = \sum_{i=1}^{d} eg_i$, where $eg_i$ denotes the eigenvalues of $M_{t+1}$. By [45], $D_{ld}$ is minimized when $eg_1 = eg_2 = \dots = eg_d = \sqrt[d]{\det(M_{t+1})}$. Since in distance metric learning $d$ usually is not small [47], $\sqrt[d]{\det(M_{t+1})}$ is close to 1 [45]. In a similar manner, we can show that $M_{t+1}$ has similar eigenvalues to $M_t$ when $t > 0$. By induction, the eigenvalues of $M_{t+1}$ tend to be close to 1, and thus tend to satisfy the requirement in Lemma 5.2.
6 THE UPDATE OF LSH
The LSH index should be updated when the distance metric changes and Lemma 5.2 is not satisfied. To assure efficiency of similarity search, we now propose a fast LSH
update mechanism. It only re-hashes the objects that have
a high probability of being moved to a different bucket due
to an update of the distance metric. It includes two key in-
novations: (1) mapping the problem of finding the objects
that are strongly impacted by the update of the distance
metric to an inner product similarity search problem; and
(2) quickly locating such objects using an auxiliary inner
product similarity-based LSH index.
Measure the Influence of Distance Metric Change. Given a learned distance metric $D_M$ with transform matrix $M = L^T L$, by Sec. 2.1, computing the distance using the learned distance metric is equivalent to computing Euclidean distance in an input space transformed by $L$. Therefore, building an LSH index that effectively serves the similarity search using the learned distance metric $D_M$ corresponds to hashing the input objects in the transformed space.
Suppose we have two distance metrics w.r.t. transform matrices $M_{old} = L^T_{old} L_{old}$ and $M_{new} = L^T_{new} L_{new}$; then an LSH index corresponding to each of these two metrics respectively can be built using the following hash functions:

$$f(\vec{u})_{old} = \left\lfloor \frac{\vec{a} \cdot L_{old}\vec{u} + b}{W} \right\rfloor, \quad f(\vec{u})_{new} = \left\lfloor \frac{\vec{a} \cdot L_{new}\vec{u} + b}{W} \right\rfloor$$

Then given an object $\vec{u}$ as input, the difference between the outputs produced by the two hash functions can be computed as:

$$f(\vec{u})_{diff} = \left\lfloor \frac{\vec{a} \cdot (L_{diff}\,\vec{u})}{W} \right\rfloor \qquad (11)$$
where $L_{diff} = L_{old} - L_{new}$ represents the delta matrix between $L_{old}$ and $L_{new}$. $f(\vec{u})_{diff}$ represents how much the change of the distance metric impacts $\vec{u}$ in the LSH index.
Obviously, it is not practical to find the objects that are strongly impacted by the change of the distance metric by computing $f(\vec{u})_{diff}$ for each input object, since it would be as expensive as re-hashing all objects in the index.
An Inner Product Similarity Search Problem. As shown in Eq. 11, the value of $f(\vec{u})_{diff}$ is determined by $\vec{a} \cdot (L_{diff}\,\vec{u})$. Since $\vec{a} \cdot (L_{diff}\,\vec{u}) = \vec{a}^T L_{diff} \cdot \vec{u}$, the value of $f(\vec{u})_{diff}$ is thus determined by $\vec{a}^T L_{diff} \cdot \vec{u}$, that is, the inner product between $\vec{a}^T L_{diff}$ and $\vec{u}$. The larger the inner product is, the larger $f(\vec{u})_{diff}$ is. Therefore, the problem of finding the objects $\vec{u}$ that should be re-hashed can be converted into the problem of finding the objects that have a large inner product with a query object $\vec{q} = \vec{a}^T L_{diff}$, where $\vec{q}$ is constructed from the hyperplane $\vec{a}$ used in the hash function and the delta matrix $L_{diff}$.
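The reduction rests on the identity $\vec{a} \cdot (L_{diff}\,\vec{u}) = (\vec{a}^T L_{diff}) \cdot \vec{u}$, which can be checked directly; the dimensions and random matrices below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
a = rng.normal(size=d)              # hash hyperplane, treated as a row vector
L_old = rng.normal(size=(d, d))
L_new = rng.normal(size=(d, d))
L_diff = L_old - L_new              # delta matrix between the two metrics

q = a @ L_diff                      # query object, built once per hyperplane
u = rng.normal(size=d)              # any input object

# The per-object score of Eq. 11 is just an inner product with q.
print(np.isclose(a @ (L_diff @ u), q @ u))
```

Building `q` once turns the per-object matrix-vector product into a single inner product, which is what makes the LSH-based search below applicable.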
By [44], finding the objects $\vec{u}$ that have a large inner product with the object $\vec{q}$ corresponds to an inner product similarity search problem. This problem can be efficiently solved if we have an LSH index that effectively supports inner product similarity search. Whenever the LSH has to be updated, to find the objects that have to be re-hashed, we thus only need to construct the query object $\vec{q}$ and then submit a similarity search query to the inner product-based LSH index.
Next, we first present a data augmentation-based LSH
mechanism to efficiently support inner product similarity
search. We then introduce a multi-probe based strategy that
effectively leverages the output of the inner product similar-
ity search to find the objects that have to be updated.
6.1 Inner Product Similarity LSH
The key insight of the LSH mechanism is that by augmenting the query object $\vec{q}$ and the input object $\vec{u}$ with one extra dimension, the inner product similarity search can be efficiently supported by leveraging any classical cosine similarity-based LSH.
First, consider the cosine similarity defined in [43] as:

$$cos(\vec{q}, \vec{u}) = \frac{\vec{q} \cdot \vec{u}}{\|\vec{q}\|\,\|\vec{u}\|} \qquad (12)$$

The inner product similarity [36, 42] can be expressed as:

$$inner(\vec{q}, \vec{u}) = \vec{q} \cdot \vec{u} = \frac{\vec{q} \cdot \vec{u}}{\|\vec{q}\|\,\|\vec{u}\|} \times \|\vec{q}\| \times \|\vec{u}\| \qquad (13)$$
By Eq. 12 and 13, we get:

$$inner(\vec{q}, \vec{u}) = cos(\vec{q}, \vec{u}) \times \|\vec{q}\| \times \|\vec{u}\| \qquad (14)$$
In our OASIS, the inner product similarity search is used to search for the neighbors of the query object $\vec{q}$. During one similarity search, Eq. 14 is repeatedly used to compute the distance between $\vec{q}$ and different input objects $\vec{u}$. Therefore, for our similarity search purpose, $\|\vec{q}\|$ can be safely ignored in Eq. 14, because it does not influence the relative affinity from different $\vec{u}$ to $\vec{q}$.
However, the search results are influenced by $\|\vec{u}\|$. Even if an object $\vec{u}$ has a high cosine similarity with the query object $\vec{q}$, it does not necessarily mean that $\vec{u}$ has a high inner product similarity with $\vec{q}$, because $\|\vec{u}\|$ varies across different objects $\vec{u}$.
ββπ’ .We now propose to fill this gap between the inner prod-
uct similarity and cosine similarity by appending one extra
dimension to the query objectββπ and the input object
ββπ’ as
shown below:
ββπ β² =[ββπ , 0] ,ββπ’ β² =
[ββπ’ ,
β1 β
ββπ’ 2]Next, we show that after this data augmentation, the inner
product similarity πππππ (ββπ ,ββπ’ ) can be computed based on
the cosine similarity between the augmentedββπ β²
andββπ’ β²
.
Lemma 6.1.

$$inner(\vec{q}, \vec{u}) = \vec{q} \cdot \vec{u} = cos(\vec{q}\,', \vec{u}\,') \times \|\vec{q}\| \qquad (15)$$
Proof. Using $\vec{q}\,'$ and $\vec{u}\,'$ in the computation of cosine similarity, we get $cos(\vec{q}\,', \vec{u}\,') = \frac{\vec{q} \cdot \vec{u}}{\|\vec{q}\|\,\|\vec{u}\,'\|}$. Since $\|\vec{u}\,'\| = \|[\vec{u}, \sqrt{1 - \|\vec{u}\|^2}]\| = 1$, we get $cos(\vec{q}\,', \vec{u}\,') = \frac{\vec{q} \cdot \vec{u}}{\|\vec{q}\|}$. In turn, we get $inner(\vec{q}, \vec{u}) = \vec{q} \cdot \vec{u} = cos(\vec{q}\,', \vec{u}\,') \times \|\vec{q}\|$. □
By Lemma 6.1, if we build a cosine similarity LSH index for
the augmented objects, then the original objects that fall into
the same bucket will have a high inner product similarity.
Therefore, our data augmentation strategy ensures that
any cosine similarity LSH index can be effectively used to
support inner product similarity search. In this work, we
adopt the method introduced in [2].
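A small numeric check of Lemma 6.1. Note the augmentation assumes $\|\vec{u}\| \le 1$ so that $\sqrt{1 - \|\vec{u}\|^2}$ is well defined; we presume the data is first scaled into the unit ball (e.g. by its maximum norm, as done in ALSH-style methods [42]), and the scaling factor below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 10
q = rng.normal(size=d)              # query object (no norm restriction)
u = rng.normal(size=d)
u = u / (np.linalg.norm(u) * 1.5)   # scale the input object into the unit ball

q_aug = np.append(q, 0.0)                                    # q' = [q, 0]
u_aug = np.append(u, np.sqrt(1.0 - np.linalg.norm(u) ** 2))  # u' = [u, sqrt(1 - ||u||^2)]

cos = (q_aug @ u_aug) / (np.linalg.norm(q_aug) * np.linalg.norm(u_aug))
# Lemma 6.1: inner(q, u) = cos(q', u') * ||q||
print(np.isclose(q @ u, cos * np.linalg.norm(q)))
```

Because $\|\vec{u}\,'\| = 1$ by construction, ranking by cosine similarity over the augmented vectors ranks by inner product over the originals, which is exactly what the update search needs.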
6.2 Cosine Similarity LSH-based Update
Accuracy vs. Efficiency. OASIS re-hashes the objects that have a large inner product with the query object $\vec{q}$. These objects are produced using a similarity query conducted on the inner product LSH introduced above. Therefore, the cost of adapting the LSH to fit the new distance metric depends on the number of objects returned by the similarity query. If too many objects were returned, the update would be expensive. On the other hand, returning too few objects tends to miss the objects that should have changed their buckets in the new LSH. Hence it risks sacrificing accuracy in supporting similarity search under the new distance metric.
In this work, we tackle this delicate trade-off by designing a
multi-probe LSH [31] based approach. This approach returns
as few objects as possible that could potentially change their
bucket membership in the new LSH, while still ensuring the
accuracy of the new LSH in supporting similarity search.
Multi-probe LSH. As shown in Fig. 3, compared to the traditional LSH solution, a multi-probe LSH [31] divides each large bucket into $m$ small buckets. Then at query time, instead of probing the one single bucket $b$ that the query object falls into, it probes the $m$ small buckets. Multi-probe tends to have a higher accuracy for the nearest neighbor search task than the traditional LSH [31].
As an example, in Fig. 3, the traditional LSH returns $o_1$ and $o_3$ as the $k$ nearest neighbors of $\vec{q}$ ($k$ = 2 in this case). However, in fact, $o_2$ is closer to $\vec{q}$ than $o_3$. In this case, $o_2$ is missed, because it falls into a nearby bucket. Multi-probe LSH (in this case $m$ = 3) instead correctly returns $o_1$ and $o_2$ as the nearest neighbors of $\vec{q}$, because it also probes the two neighboring buckets. As can be observed, the larger $m$ is, the higher the accuracy of the nearest neighbor search becomes. However, a larger $m$ also increases the search cost.
Figure 3: Multi-probe vs. traditional LSH

Figure 4: Avoidance of Additional Error in kNN Search
Avoidance of Additional Error in kNN Search. The observation here is that using multi-probe, even if we do not re-hash an object that should be moved to a different bucket from the old LSH to the new LSH w.r.t. the newly learned distance metric, often it does not impact the accuracy of kNN search using the new distance metric, as shown below.
Given one input object $\vec{u}$, assume it is hashed into bucket $b_{old}$ in the old LSH index $H$, while it should have been hashed
into bucket $b_{new}$ in the new LSH $H'$. Therefore, $\vec{u}$ moves $e = |b_{old} - b_{new}|$ buckets due to the change of the distance metric, where $e$ can be computed using Eq. 11. For example, in Fig. 4, object $\vec{u}$ moves two buckets. Suppose $\vec{u}$ is a nearest neighbor of query $\vec{q}$. $\vec{q}$ should be hashed into bucket $b_{new}$ in $H'$ per the property of LSH. Suppose $\vec{u}$ is not re-hashed and thus remains in bucket $b_{old}$ even in the new LSH $H'$. In this case, if $m \ge 2 \times e + 1$, then the multi-probe strategy is still able to find $\vec{u}$ as the nearest neighbor of $\vec{q}$. As shown in Fig. 4, if we set $m$ as $5 = 2 \times 2 + 1$, then $\vec{u}$ will still be correctly captured as the nearest neighbor of $\vec{q}$, because multi-probe also considers the objects in the 2 nearby buckets left of $b_{new}$ as candidates.
Search the To-be-updated Objects. Based on this observation, we design a search strategy that, by only examining the objects in a small number of buckets, efficiently locates the objects that could introduce additional error in the approximate kNN search if not re-hashed in the new LSH.
As shown in Alg. 2, given the delta matrix $L_{diff} = L_{new} - L_{old}$ and the parameter $m$ of multi-probe, we first construct the query object $\vec{q}$ as $\vec{a}^T L_{diff}$ (Line 2). Then we search for the to-be-re-hashed objects starting from the bucket $b$ of the inner product LSH that $\vec{q}$ falls into. For each object $\vec{u}$ in bucket $b$, using Eq. 11 we compute the number of buckets that $\vec{u}$ moves, denoted as $e$ (Lines 7-9). $\vec{u}$ will be re-hashed if $e$ is larger than $\frac{m-1}{2}$, since otherwise additional error may be introduced in the kNN search (Lines 10-11). If the $e$ averaged over all objects in bucket $b$, denoted as $avg(e)$, is larger than $\frac{m-1}{2}$, the buckets $b_i$ nearby $b$ are searched iteratively until the $avg(e)$ of a bucket $b_i$ is under $\frac{m-1}{2}$ (Lines 13-17). These objects are then re-hashed using $f(\vec{u})_{new} = \lfloor (\vec{a} \cdot L_{new}\vec{u} + b)/W \rfloor$.
Algorithm 2 Locate to-be-updated Objects
1: function locateUpdate(Delta matrix $L_{diff}$, numberOfProbe $m$, Hash vector $\vec{a}$, Hash const $W$)
2:   $\vec{q} = \vec{a}^T \cdot L_{diff}$;
3:   list[] candidates = null; i = 1;
4:   list[] buckets = buckets.add($H(\vec{q})$);
5:   while ($e \le \frac{m-1}{2}$) do
6:     eSum = 0; count = 0;
7:     for each bucket $b$ in buckets do
8:       for each object $\vec{u}$ in bucket $b$ do
9:         $e = \lfloor \vec{a} \cdot (L_{diff}\,\vec{u}) / W \rfloor$;
10:        if $e > \frac{m-1}{2}$ then
11:          candidates.add($\vec{u}$);
12:        eSum += e; count++;
13:    avg(e) = eSum / count;
14:    if $avg(e) > \frac{m-1}{2}$ then
15:      buckets.addNeighbor($H(\vec{q})$, i); i++;
16:    else
17:      return candidates;
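The probing loop of Alg. 2 can be sketched in Python as follows. This is a simplification, not the paper's implementation: the bucket layout (`buckets`, `bucket_ids`, `start`) and the symmetric probing order are hypothetical stand-ins for OASIS's actual multi-probe index structure:

```python
import numpy as np

def locate_update(buckets, bucket_ids, start, a, L_diff, m, W):
    """Collect objects whose bucket shift e exceeds (m - 1) / 2.

    `buckets` maps a bucket id to its list of objects, `bucket_ids` is the
    ordered list of ids, and `start` is the bucket the query q falls into.
    """
    threshold = (m - 1) / 2
    candidates = []
    pos = bucket_ids.index(start)
    for step in range(len(bucket_ids)):
        # Visit buckets in the order start, start+1, start-1, start+2, ...
        offset = (step + 1) // 2 * (1 if step % 2 else -1)
        idx = pos + offset
        if not 0 <= idx < len(bucket_ids):
            continue
        objs = buckets[bucket_ids[idx]]
        # Bucket shift of each object under the metric change (Eq. 11).
        shifts = [abs(np.floor(a @ (L_diff @ u) / W)) for u in objs]
        candidates += [u for u, e in zip(objs, shifts) if e > threshold]
        if objs and np.mean(shifts) <= threshold:
            break          # objects here barely move: nearby buckets are safe
    return candidates
```

For instance, when the metric does not change ($L_{diff} = 0$) every shift is zero and the search stops at the first bucket with no candidates to re-hash.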
7 EXPERIMENTAL EVALUATION
7.1 Experimental Setup and Methodologies
Experimental Setup. All experiments are conducted on a Linux server with 12 CPU cores and 512 GB RAM. All code is developed in Python 2.7.5.
Datasets. We evaluate our OASIS using both real datasets
and synthetic datasets.
Real Datasets. We use four real, publicly available datasets with various dimensions and sizes: PageBlock [33], KDD99 [39], CIFAR10 [26], and HEPMASS, to evaluate how OASIS performs on different analytics tasks.
PageBlock contains information about different types of blocks in document pages. There are 10 attributes in each instance. PageBlock contains 5,473 instances, with the 560 instances that are not text blocks labeled as class 1 and the remaining text instances labeled as class 2. PageBlock is used to evaluate the performance of OASIS for kNN classification.
CIFAR10 [26] is a popular benchmark dataset for image classification. It includes 10 classes and 60,000 images, all of size 32×32. Following the typical practice [14], we first use unsupervised deep learning to extract a 4,096-dimensional feature vector to represent each raw image. We then perform PCA to further reduce the feature vectors to 100 dimensions. CIFAR10 is used in our labeling evaluation.
KDD99 corresponds to network intrusion detection data curated as a benchmark dataset for outlier detection studies. There are 24 classes of intrusion connections and normal connections. It contains in total 102,053 instances, with 3,965 intrusion connections labeled as outliers. Each instance has 35 attributes plus the class label. Given that KDD99 is an outlier benchmark, we use it to evaluate the effectiveness of OASIS in outlier detection.
HEPMASS was generated in high-energy physics experi-
ments for the task of searching for the signatures of exotic
particles. There are 10,500,000 instances in HEPMASS. Each
instance has 28 attributes. It is composed of two classes of
instances. 50% of the instances correspond to a signal process.
The other instances are a background process.
Synthetic Datasets. Since most public data sets with la-
bels are relatively small, we use synthetic datasets to evaluate
the scalability of OASIS, in particular the performance of our
LSH update strategy. We generate synthetic datasets using
scikit-learn 0.19.1. Each synthetic dataset has 50 attributes.
The largest dataset contains 100 million instances. The in-
stances are divided into 2 classes with 50 million instances
each. The data in each class is skewed and can be grouped
into multiple clusters.
Experimental Methodology & Metrics. We evaluate the effectiveness and efficiency of our OASIS approach in four
Figure 5: kNN Classification: Accuracy and CPU Time, varying K (PageBlock)
Figure 6: Outlier Detection: F1 Score and CPU Time, varying K (KDD99)
Figure 7: Labeling: Accuracy and CPU Time, varying K (CIFAR10)
Figure 8: kNN Query: Accuracy and CPU Time, varying K (HEPMASS)
different data analytics tasks, including labeling, kNN classification, outlier detection, and data exploration supported by kNN queries.
Effectiveness. When evaluating kNN classification, we measure the accuracy of the prediction. When evaluating outlier detection, we measure the F1 score. When evaluating labeling, we measure the accuracy of the automatically produced labels. When evaluating the kNN query, we measure the accuracy of the returned results.
We compare OASIS against three alternative approaches: (1) a baseline approach (No Update) that does not update the distance metric; (2) a static update approach (Static Update) that re-learns the transform matrix from scratch each time a new example is collected; it repeatedly applies the most popular distance metric learning methods with respect to the different tasks [47, 49, 51] and is expected to produce the distance metric best fitting the similarity examples seen over time; and (3) an online metric learning method, LEGO [24].
Efficiency. Efficiency is evaluated by the running time.
We measure the running time of our continuous metric learn-
ing method and our LSH update method separately.
As in the effectiveness evaluation, we compare the efficiency of our incremental metric learning method in OASIS against (1) No Update, (2) Static Update, and (3) LEGO.
For evaluating our LSH update method in OASIS, we mea-
sure the total running time of similarity searches plus the
update of the LSH index. We compare against two alternative
approaches: (1) similarity search by linear scan without the
support of LSH (Linear Scan); and (2) similarity search with
the LSH rebuilt from scratch whenever the distance metric
changes (Rebuild LSH ). We use the large HEPMASS dataset
and synthetic datasets to evaluate the LSH update method
of OASIS. The other three real datasets are small and thus
not appropriate for this scale measurement experiment.
Evaluation of the Proposed Inner Product LSH. We
also evaluate our proposed inner product LSH by measuring
the recall and query time.
7.2 Evaluation of kNN Classification
When supporting kNN classification, OASIS starts by using Euclidean distance as the initial distance metric. If a testing object is misclassified, it is used as a new example that triggers the update process of OASIS, namely updating the distance metric model and the LSH. Static Update re-learns the distance metric using LMNN [47], the most popular distance metric learning method used in kNN classifiers.
Evaluation of Classification Accuracy. We evaluate how OASIS improves the accuracy of kNN classification using the PageBlock dataset. Initially, the labeled training set contains 1,000 instances. We vary the parameter K, the most important parameter in the kNN classifier, from 3 to 10.
As shown in Fig. 5(b), OASIS outperforms No Update in all cases because of the progressively enriched labeled dataset and the continuous update of the distance metric model, while No Update always uses the initially labeled dataset and the fixed distance metric model, which becomes stale over time. Further, the accuracy of OASIS is very close to Static Update. Static Update re-learns the distance metric from scratch
every time when a new example is acquired, therefore its
performance is expected to be the best. This shows that the
distance metric model incrementally learned by OASIS is
close to the optimal model, while being much more efficient
than Static Update (see Fig. 5(a)). LEGO also incrementally up-
dates the distance metric model. However, it performs worse
than OASIS in all cases, although better than No Update.
This is as expected, because when forming the new distance
constraint for a new received similarity example, LEGO first
computes the pairwise distances among all similarity ex-
amples received so far and uses the 5th percentile distance
as the target distance. Using a target distance computed by
this hard-coded rule, given one particular similarity example,
its neighbors in the same class are not necessarily closer
than its non-matching neighbors, potentially introducing
classification errors in LEGO. In contrast, OASIS computes
a customized target distance for each example that ensures
this similarity example will not get mis-classified any more,
as described in Sec. 4.1.1.
Evaluation of Efficiency. As shown in Fig. 5(a), OASIS significantly outperforms Static Update, by up to 3 orders of magnitude, in updating the distance model. This is because the incremental update strategy has a time complexity that is quadratic in the dimensionality of the data and does not depend on the number of examples acquired so far. In contrast, Static Update has to re-learn the distance metric model from scratch. Apart from a time complexity that is quadratic in the size of the labeled dataset and in the number of data dimensions, learning the distance metric model from scratch also consumes a huge amount of memory. Therefore, it tends to fail due to an out-of-memory error when the number of examples is large. Our OASIS instead requires near-zero extra memory besides the transform matrix.
OASIS is up to 100x faster than LEGO in updating the distance metric. This is because LEGO has to be aware of all pairwise distances among all examples seen so far in each update, while OASIS only needs the kNNs of the new example to produce the distance constraints.
7.3 Evaluation of Outlier Detection
We use the KDD99 dataset to evaluate the performance of OASIS on supporting distance-based outlier detection. If an object is mis-classified, we use the strategy discussed in Sec. 4.1.2 to produce the distance constraints and then update the distance metric. For Static Update, we insert the newly acquired outlier or inlier examples into the training set and re-learn the distance metric using MELODY [51], the most recent distance metric learning-based outlier detection method. Initially the training set has 1,000 examples. Similar to our OASIS, LEGO also updates the distance metric online when an object gets mis-classified, using the constraints produced by its own strategy.
Evaluation of F1 Score. We evaluate how OASIS improves the accuracy of outlier detection by continuously updating the distance metric. The input parameter K is varied in this case. As shown in Fig. 6(b), OASIS outperforms Static Update in F1 score by about 10%. Since Static Update re-learns the distance metric from scratch each time a new example is acquired, we expected it to perform the best. However, this is not the case. This indicates that the general incremental update strategy used in OASIS is very effective, sometimes even better than a distance metric learning method customized to the analytics task. The F1 scores of No Update and LEGO are much lower. The reason LEGO performs poorly is that it uses the 5th percentile pairwise distance as the target distance. The distance metric learned using this small target distance tends to make the outliers close to their neighbors. This is not suitable for outlier detection, since based on the general principle of outlier detection, outliers correspond to the objects that have large distances to their neighbors [15].
Evaluation of Efficiency. As shown in Fig. 6(a), OASIS is
up to 5 orders of magnitude faster than Static Update and 3
orders of magnitude faster than the online algorithm LEGO.
7.4 Evaluation of Labeling
We use the CIFAR10 dataset to evaluate the effectiveness of OASIS at supporting labeling applications. We randomly select 100 images as an initial labeled data set to learn a distance metric using LMNN [47]. Then we randomly select 3,000 images as the labeled query set and consider the rest of CIFAR10 as an unlabeled set. We then propagate labels to the unlabeled set using kNN queries following the OASIS procedure described in Sec. 1.
Evaluation of Accuracy. The labeling accuracy is measured as the ratio of the number of correctly labeled objects to all objects labeled by propagation. As shown in Fig. 7(b), the accuracy of OASIS is comparable to the Static Update solution, and higher than No Update and LEGO. Note when k increases, the accuracy increases even further. This property is particularly useful in labeling, because a larger k indicates that more labels can be automatically produced.
Evaluation of Efficiency. As shown in Fig. 7(a), OASIS is 2 orders of magnitude faster than Static Update and LEGO. The processing times of Static Update and LEGO are close to each other.
7.5 Evaluation of kNN Queries
The results are shown in Fig. 8. Please refer to the extended version [3] of this manuscript for the analysis.
Figure 9: Recall for kNN query with LSH update, varying k (HEPMASS dataset)
Figure 10: Recall for kNN query with LSH update, varying k (Synthetic dataset)
Figure 11: CPU time of LSH update with varying dataset sizes
Figure 12: Inner Product LSH, varying K (HEPMASS)
7.6 Evaluation of LSH Updates
We use the largest real dataset, HEPMASS, and large synthetic datasets to evaluate both the effectiveness and efficiency of our LSH update method. For effectiveness, we measure the recall of the exact kNN among the candidates returned from LSH. For efficiency, we measure the overall processing time, including the time for updating the LSH and for kNN search. 100 distance metric updates are triggered in each case. Each LSH index uses 3 hash tables.
Effectiveness. The recall is measured for 100 queries. We vary the k parameter (k = 10, 50, and 100) in the kNN queries. We compare the LSH update method in OASIS against Rebuild, which re-hashes all objects whenever the distance metric changes.
Figs. 9 and 10 show how the recall of our LSH update method and Rebuild change over time during the update process. The two curves w.r.t. LSH update and Rebuild look similar. In addition, we also measure the average recall. As shown in Figs. 9(d) and 10(d), the average recall of LSH update is only slightly lower than that of Rebuild, confirming the effectiveness of our LSH update strategy.
Efficiency. We compare our LSH update with Linear Scan, without using indexing, and Rebuild LSH. We measure the total processing time of 1,000 kNN queries on each dataset.
Real HEPMASS dataset. We vary the sizes of the sub-
sets of HEPMASS used in the evaluation from 100k to 10M.
As shown in Fig. 11(a), our LSH update method consistently
outperforms Rebuild LSH by 3 to 4 orders of magnitude. This
is because our LSH update strategy avoids the update of the
LSH index in most cases. Further, in the rare case of an actual
update, only a small number of objects are re-hashed. Our
LSH update method also consistently outperforms Linear Scan by 2 to 3 orders of magnitude, because LSH reduces the time complexity of kNN search from linear to sub-linear. Rebuild LSH is even slower than Linear Scan, although it also uses the sub-linear-time approximate kNN search. This is due to the cost of having to re-hash all objects in each update. This update overhead outweighs the query time saved by the approximate kNN search.
SIGMOD β20, June 14β19, 2020, Portland, OR, USA Huayi Zhang, Lei Cao, Yizhou Yan, Samuel Madden, and Elke A. Rundensteiner
Synthetic Datasets. The results on the synthetic datasets are similar to those on the HEPMASS dataset, as shown in Fig. 11(b). When the dataset grows to 100M, our method outperforms the alternative methods by an even larger margin. This confirms that our method scales to large datasets.
7.7 Evaluation of Inner Product LSH

We compare the inner product LSH used in OASIS to quickly find the to-be-rehashed objects against two state-of-the-art inner product LSH methods, ALSH [42] and Hyperplane [36], using the real HEPMASS dataset.
Following the typical practice of evaluating approximate similarity search [29], we measure both recall and query processing time. For this, we configure all LSH methods to have a similar response time by first tuning the parameters on a small sample dataset (10k) and then evaluating the accuracy on larger datasets. As confirmed in Fig. 12(b), after parameter tuning, these methods indeed have a similar query time as the dataset sizes increase (with our OASIS slightly faster).
As shown in Fig. 12(a), the OASIS inner product LSH significantly outperforms the alternative methods ALSH and Hyperplane. When varying the parameter K, the recall of OASIS LSH is consistently higher than 0.85, while the recall of ALSH and Hyperplane remains below 0.2. This is because our data augmentation mechanism ensures that inner product similarity search can be supported using cosine similarity LSH, which is well studied in the literature and achieves high recall [2].
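A standard way to reduce inner-product search to cosine similarity, shown here as background and in the spirit of simple-LSH [36] (our own augmentation may differ in detail), is to rescale the data into the unit ball and append one coordinate that equalizes all norms; the cosine ranking in the augmented space then matches the inner-product ranking in the original space.

```python
import numpy as np

rng = np.random.default_rng(1)

def augment_data(X):
    """Scale into the unit ball, then append sqrt(1 - ||x||^2): all rows get norm 1."""
    X = X / np.linalg.norm(X, axis=1).max()
    extra = np.sqrt(np.maximum(0.0, 1.0 - (X ** 2).sum(axis=1)))
    return np.hstack([X, extra[:, None]])

def augment_query(q):
    """Normalize and append 0, so cosine(q', x') is proportional to q . x."""
    return np.append(q / np.linalg.norm(q), 0.0)

X = rng.standard_normal((500, 8))
q = rng.standard_normal(8)

Xa, qa = augment_data(X), augment_query(q)

# Since every augmented row has norm 1, Xa @ qa is a positive rescaling of
# X @ q, so the cosine-similarity ranking equals the inner-product ranking.
ip_rank = np.argsort(-(X @ q))
cos_rank = np.argsort(-(Xa @ qa) / np.linalg.norm(Xa, axis=1))
```

With all norms equalized, any cosine-similarity LSH family (e.g. random hyperplanes [2]) can answer maximum inner product queries.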
8 RELATED WORK

Online Learning. Some prior research [9, 24, 41] focuses on online metric learning. [24] solves the optimization function using the exact gradient. This avoids the problem that traditional online learning approaches can produce non-definite matrices [13] because of their approximate solution for the gradient of the loss function. However, [24] cannot ensure that previously mis-classified objects are no longer mis-classified in the new transformed space, because it uses the 5th percentile of the pairwise distances among all training examples to set the target distance. We solve this problem by forming new constraints on the optimization function that take into consideration the distance relationship between the matching neighbors of the new item and its non-matching neighbors.
[9] proposed a method that incrementally learns a good function to measure the similarity between images. [41], like [9], focuses on sparse data such as image datasets. Neither [9] nor [41] can guarantee that the learned transition matrix is positive semidefinite. Therefore, they cannot be directly applied to distance metric learning.
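Positive semidefiniteness matters because the Mahalanobis form d_A(x, y) = (x - y)^T A (x - y) is a valid (pseudo-)metric only when A is PSD. As an illustrative sketch (not our algorithm; all names and constants here are assumptions), a common remedy is to follow each online triplet update with a projection back onto the PSD cone by clipping negative eigenvalues:

```python
import numpy as np

def project_psd(A):
    """Project a symmetric matrix onto the PSD cone by clipping its eigenvalues."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.maximum(w, 0.0)) @ V.T

def triplet_step(A, x, match, non_match, lr=0.5, margin=0.1):
    """One hinge step toward: d_A(x, match) + margin <= d_A(x, non_match)."""
    dm, dn = x - match, x - non_match
    if dm @ A @ dm + margin <= dn @ A @ dn:
        return A                               # constraint already satisfied
    # Subgradient of the hinge loss w.r.t. A is dm dm^T - dn dn^T.
    A = A - lr * (np.outer(dm, dm) - np.outer(dn, dn))
    return project_psd((A + A.T) / 2)          # restore symmetry and PSD-ness

A = np.eye(3)
x = np.zeros(3)
match = np.array([1.0, 0.0, 0.0])       # should end up close to x
non_match = np.array([0.0, 0.2, 0.0])   # currently closer to x than the match

for _ in range(200):
    A = triplet_step(A, x, match, non_match)
```

Without the projection step, the subgradient update alone can drive eigenvalues of A negative, which is exactly the failure mode of [9, 41] noted above.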
Incremental LSH Update. In [24], a method is sketched to support the update of cosine similarity-based LSH in a linearly transformed space. It repeatedly samples some objects to re-hash whenever the transform changes and periodically rebuilds the LSH from scratch to ensure its effectiveness. In this work, we prove that the LSH built in the original input feature space is equally effective in supporting similarity search in the transformed input feature space as long as the transform matrix satisfies Lemma 5.2. Therefore, OASIS avoids updating LSH in most cases.
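One concrete instance of such an invariance, shown here only as background (the condition in Lemma 5.2 may be more general): for random-hyperplane LSH the collision probability of two points depends only on the angle between them (1 - theta/pi [2]), so any angle-preserving transform, such as an orthogonal matrix, leaves an existing index equally effective.

```python
import numpy as np

rng = np.random.default_rng(3)

def angle(x, y):
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

# A random orthogonal transform, obtained via QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

x, y = rng.standard_normal(8), rng.standard_normal(8)

# Angles, and hence hyperplane-LSH collision probabilities (1 - theta/pi),
# are unchanged, so an index built before the transform remains valid.
theta_before = angle(x, y)
theta_after = angle(Q @ x, Q @ y)
```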
LSH for Inner Product Similarity Search. Maximum inner product search (MIPS) has attracted much attention recently because of its applications in deep learning. Accordingly, inner product similarity-based LSH methods have been proposed to speed up MIPS [20, 36, 42]. To support MIPS, [42] extends the LSH framework to allow asymmetric hashing. [36] instead presents a simple symmetric LSH that has stronger guarantees and better empirical performance than the asymmetric LSH in supporting MIPS. However, the LSH in [36], which simply uses the hyperplane hash function, is shown to have a long running time, often as long as a full linear scan.
Active Distance Metric Learning. In [27, 50], active learning-based approaches were proposed that learn a distance metric by selectively sampling to-be-labeled objects from the existing dataset. Their focus is thus on reducing the labeling effort for the users. However, unlike our work, they do not account for continuously arriving, previously unseen objects. Therefore, once a distance metric is learned, it is never updated. We instead focus on the continuous update of the distance metric driven by dynamically acquired similarity constraints.
9 CONCLUSION

In this work, we propose OASIS, which consistently improves the accuracy and efficiency of similarity search-based data mining tasks. OASIS features an incremental distance metric learning strategy that adapts the distance metric whenever a new similarity example becomes available. OASIS also offers an update mechanism that updates the LSH index only when absolutely necessary and re-hashes only a small subset of objects when an update actually happens. Our experimental study confirms the 3 orders of magnitude speedup of our strategies over state-of-the-art methods.
REFERENCES

[1] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. VLDB, 2003.
[2] A. Andoni, P. Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt. Practical and optimal LSH for angular distance. In NIPS, pages 1225–1233, 2015.
[3] Anonymous. Continuously adaptive similarity search. https://drive.google.com/file/d/1hFsqVD6LlQPRm7SBydk1TRlzGuD1yvf2/view?usp=sharing, 2019.
[4] S. D. Bay and M. Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In KDD, pages 29–38, 2003.
[5] A. Bellet, A. Habrard, and M. Sebban. A survey on metric learning for feature vectors and structured data. CoRR, abs/1306.6709, 2013.
[6] M. M. Breunig, H. Kriegel, R. T. Ng, and J. Sander. LOF: identifying density-based local outliers. In SIGMOD, pages 93–104, 2000.
[7] L. Cao, J. Wang, and E. A. Rundensteiner. Sharing-aware outlier analytics over high-volume data streams. In SIGMOD, pages 527–540, 2016.
[8] L. Cao, D. Yang, Q. Wang, Y. Yu, J. Wang, and E. A. Rundensteiner. Scalable distance-based outlier detection over high-volume data streams. In ICDE, pages 76–87, 2014.
[9] G. Chechik, U. Shalit, V. Sharma, and S. Bengio. An online algorithm for large scale image similarity learning. In NIPS, pages 306–314, 2009.
[10] A. Cichocki, S. Cruces, and S. Amari. Log-determinant divergences revisited: Alpha-beta and gamma log-det divergences. Entropy, 17:2988–3034, 2015.
[11] T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Trans. Inf. Theor., 13(1):21–27.
[12] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG, pages 253–262. ACM, 2004.
[13] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. ICML, pages 209–216, New York, NY, USA, 2007. ACM.
[14] A. Dosovitskiy, J. T. Springenberg, M. A. Riedmiller, and T. Brox. Discriminative unsupervised feature learning with convolutional neural networks. In NIPS, pages 766–774, 2014.
[15] C. C. Aggarwal. Outlier Analysis. Second Edition. Springer, 2017.
[16] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD'96, pages 226–231, 1996.
[17] J. Gan, J. Feng, Q. Fang, and W. Ng. Locality-sensitive hashing scheme based on dynamic collision counting. SIGMOD, pages 541–552, 2012.
[18] N. Hansen. The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772, 2016.
[19] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
[20] Q. Huang, G. Ma, J. Feng, Q. Fang, and A. K. Tung. Accurate and fast asymmetric locality-sensitive hashing scheme for maximum inner product search. In KDD, pages 1561–1570. ACM, 2018.
[21] S. Idreos, O. Papaemmanouil, and S. Chaudhuri. Overview of data exploration techniques. In SIGMOD, pages 277–281, 2015.
[22] S. B. Imandoust and M. Bolandraftar. Application of k-nearest neighbor (kNN) approach for predicting economic events: Theoretical background. 2013.
[23] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. STOC, pages 604–613. ACM, 1998.
[24] P. Jain, B. Kulis, I. S. Dhillon, and K. Grauman. Online metric learning and fast similarity search. NIPS, pages 761–768, USA, 2008.
[25] E. M. Knorr and R. T. Ng. Algorithms for mining distance-based outliers in large datasets. In VLDB, pages 392–403, 1998.
[26] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. 2009.
[27] K. Kumaran, D. Papageorgiou, Y. Chang, M. Li, and M. Takáč. Active metric learning for supervised classification. CoRR, abs/1803.10647, 2018.
[28] H. O. Lancaster and E. Seneta. Chi-square distribution. Encyclopedia of Biostatistics, 2, 2005.
[29] W. Li, Y. Zhang, Y. Sun, W. Wang, M. Li, W. Zhang, and X. Lin. Approximate nearest neighbor search on high dimensional data: Experiments, analyses, and improvement. TKDE, 2019.
[30] U. Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, Dec. 2007.
[31] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: Efficient indexing for high-dimensional similarity search. PVLDB, pages 950–961, 2007.
[32] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Intelligent probing for locality sensitive hashing: Multi-probe LSH and beyond. Proc. VLDB Endow., 10(12):2021–2024, Aug. 2017.
[33] D. Malerba, F. Esposito, and G. Semeraro. A further comparison of simplification methods for decision-tree induction. In Learning from Data, pages 365–374. Springer, 1996.
[34] J. McDonald. Chi-square test of goodness-of-fit: Handbook of Biological Statistics. Sparky House Publishing, 2009.
[35] K. Millman and M. Aivazis. Python for scientists and engineers. Computing in Science & Engineering, 13(02):9–12, Mar. 2011.
[36] B. Neyshabur and N. Srebro. On symmetric and asymmetric LSHs for inner product search. ICML, pages 1926–1934. JMLR.org, 2015.
[37] S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. In ICDE, pages 315–326, 2003.
[38] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press, New York, NY, USA, 3rd edition, 2007.
[39] X. Qin, L. Cao, E. A. Rundensteiner, and S. Madden. Scalable kernel density estimation-based local outlier detection over large data streams. 2019.
[40] A. Rajaraman and J. D. Ullman. Mining of Massive Datasets. Cambridge University Press, New York, NY, USA, 2011.
[41] S. Shalev-Shwartz, Y. Singer, and A. Y. Ng. Online and batch learning of pseudo-metrics. In ICML, page 94, 2004.
[42] A. Shrivastava and P. Li. Improved asymmetric locality sensitive hashing (ALSH) for maximum inner product search (MIPS). pages 812–821, 2015.
[43] A. Singhal. Modern information retrieval: A brief overview. IEEE Data Eng. Bull., 24(4):35–43, 2001.
[44] R. Spring and A. Shrivastava. Scalable and sustainable deep learning via randomized hashing. In SIGKDD, pages 445–454, 2017.
[45] J. M. Steele. The Cauchy-Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities. Cambridge University Press, 2004.
[46] F. Wang and J. Sun. Survey on distance metric learning and dimensionality reduction in data mining. Data Min. Knowl. Discov., 29(2):534–564, Mar. 2015.
[47] K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, pages 1473–1480, 2006.
[48] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res., 10:207–244, June 2009.
[49] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In NIPS, pages 521–528, Cambridge, MA, USA, 2002. MIT Press.
[50] L. Yang, R. Jin, and R. Sukthankar. Bayesian active distance metric learning. In UAI, pages 442–449, 2007.
[51] G. Zheng, S. L. Brantley, T. Lauvaux, and Z. Li. Contextual spatial outlier detection with metric learning. In SIGKDD, pages 2161–2170, 2017.