Boosting the Power of Swift with Metadata Search - OpenStack

Presenters: Dean Hildebrand, Eran Rom, Nilesh Bhosale
Joint work with Paula Ta-Shma, Guy Hadash


Agenda

▪ What is Object Metadata?

▪ What is Metadata Search?

▪ Use Cases

▪ Demo

▪ Implementation Details

▪ Future Work


What is Metadata?

▪ User-defined metadata
  ▪ A distinguishing feature of object storage compared with most other storage systems

▪ Swift and S3 metadata are compatible through the Swift3 middleware

▪ Metadata is structured data about an unstructured object
  ▪ The who, what, when, where, and why of an account, container, or object

▪ Perfect for indexing and searching

Metadata Examples

▪ Biomedical: Age, Biomarkers, Developmental Stage, Cell Surface Markers, Cell Type/Cell Line, Disease State, Extract Molecule, Genetic Characteristics, Immunoprecipitation Antibody, Organism, Platform, Sex, Strain, Time Point, Tissue Type, Treatment Compound
▪ Astronomy & Astrophysics
▪ Geospatial
▪ Image
▪ Music

What Swift Metadata Exists and How Do I Use It?

▪ User metadata can be added to or removed from accounts, containers, and objects
  ▪ E.g., X-Container-Meta-{name}, X-Remove-Container-Meta-{name}

▪ System metadata also exists, and some of it can be set by the user
  ▪ E.g., Content-Type, Last-Modified

▪ Semantics
  ▪ PUT and POST metadata semantics:
    ▪ Account/container: new user metadata is added to the existing metadata
    ▪ Object: new user metadata overwrites all existing user metadata
  ▪ COPY retains existing metadata unless new metadata is specified
  ▪ HEAD returns metadata only
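The difference between account/container and object semantics is easy to get wrong, so here is a simplified in-memory model of a POST. It is a sketch, not Swift's code: `apply_post` is a hypothetical helper, headers arrive as a plain dict, names are lowercased because HTTP header names are case-insensitive, and only user metadata is modeled.

```python
def apply_post(kind: str, existing: dict[str, str], headers: dict[str, str]) -> dict[str, str]:
    """Return the user metadata left after a POST (simplified model).

    kind is "account", "container", or "object"; metadata dicts map
    lowercased header names (e.g. "x-container-meta-color") to values.
    """
    kind = kind.lower()
    prefix = f"x-{kind}-meta-"
    remove_prefix = f"x-remove-{kind}-meta-"
    headers = {k.lower(): v for k, v in headers.items()}
    new_meta = {k: v for k, v in headers.items() if k.startswith(prefix)}

    if kind == "object":
        # Object: the new user metadata replaces ALL existing user metadata.
        return new_meta

    # Account/container: new metadata is merged into the existing set...
    result = dict(existing)
    result.update(new_meta)
    # ...and X-Remove-{Kind}-Meta-{name} deletes a single key.
    for k in headers:
        if k.startswith(remove_prefix):
            result.pop(prefix + k[len(remove_prefix):], None)
    return result


print(apply_post("container", {"x-container-meta-color": "red"}, {"X-Container-Meta-Size": "L"}))
# {'x-container-meta-color': 'red', 'x-container-meta-size': 'L'}
print(apply_post("object", {"x-object-meta-color": "red"}, {"X-Object-Meta-Size": "L"}))
# {'x-object-meta-size': 'L'}
```

The practical consequence for objects: a client that wants to add one key must resend every key it wants to keep.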

What is Metadata Search?

▪ Automatically index and catalog Swift user and system metadata

▪ Provide a REST API for searching for objects based on their metadata

▪ Currently available in the IBM SoftLayer Swift object storage service

Why is Metadata Search Valuable?

▪ Imagine the Internet without Google

▪ Swiftly find needles in the OpenStack

▪ Help users and administrators perform data analytics

▪ Metadata can reside on the highest tier (SSD) while data resides on lower tiers (disk/tape)

General Use Cases

▪ Data mining

▪ Data warehousing

▪ Selective data retrieval, data backup, data archival, data migration

▪ Management/reporting

Sample Use Cases: Advanced Photo Album

▪ photo1.jpg: City: Rome, Time: Day
▪ photo2.jpg: City: Rome, Time: Night
▪ photo3.jpg: City: Haifa, Time: Day
▪ photo4.jpg: City: Tokyo, Time: Night

GET /MyPhotoSpace?query=city='Rome' AND time='Day'   (matches photo1.jpg)

GET /MyPhotoSpace?query=time='Night'   (matches photo2.jpg and photo4.jpg)

* Schematic, not complete syntax
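The two schematic queries can be read as AND-ed equality filters over each object's metadata. The brute-force, in-memory sketch below only illustrates what the queries mean (the service answers them from an index rather than by scanning); `PHOTOS` mirrors the slide and `search` is a hypothetical helper.

```python
# Sample data mirroring the slide: object name -> user metadata.
PHOTOS = {
    "photo1.jpg": {"City": "Rome", "Time": "Day"},
    "photo2.jpg": {"City": "Rome", "Time": "Night"},
    "photo3.jpg": {"City": "Haifa", "Time": "Day"},
    "photo4.jpg": {"City": "Tokyo", "Time": "Night"},
}


def search(objects: dict[str, dict[str, str]], **criteria: str) -> list[str]:
    """Return names of objects whose metadata equals every criterion (AND).

    Keys are compared case-insensitively, like HTTP header names.
    """
    wanted = {k.lower(): v for k, v in criteria.items()}
    hits = []
    for name, meta in objects.items():
        meta_lc = {k.lower(): v for k, v in meta.items()}
        if all(meta_lc.get(k) == v for k, v in wanted.items()):
            hits.append(name)
    return sorted(hits)


print(search(PHOTOS, city="Rome", time="Day"))  # ['photo1.jpg']
print(search(PHOTOS, time="Night"))             # ['photo2.jpg', 'photo4.jpg']
```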

Media Use Case: Complex Searches

Search query:

GET /MyPhotoSpace?query=(tags ~ 'John' OR tags ~ 'Bob' OR tags ~ 'Alice') AND date > 2/12/2012 AND date < 3/12/2013 AND num_views > 10000

What did we search for?

▪ Date range search

▪ Free-text matching

▪ Integer comparison
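The three kinds of criteria can be illustrated with a small in-memory filter. The sketch treats the three tag terms as one OR group joined by AND to the other conditions (under the usual precedence AND binds tighter than OR), reads the dates as month/day/year (the slide does not say which order is meant), and approximates free-text matching (`~`) as a case-insensitive word match. The sample objects are made up.

```python
from datetime import date

# Hypothetical objects with a free-text "tags" field, a date, and a view count.
VIDEOS = {
    "a.mp4": {"tags": "John at the beach", "date": date(2012, 6, 1), "num_views": 25000},
    "b.mp4": {"tags": "Alice birthday party", "date": date(2014, 1, 5), "num_views": 50000},
    "c.mp4": {"tags": "Bob hiking", "date": date(2013, 1, 20), "num_views": 900},
    "d.mp4": {"tags": "Carol cooking", "date": date(2012, 12, 24), "num_views": 12000},
}


def contains_word(text: str, word: str) -> bool:
    """Rough stand-in for free-text matching (~): case-insensitive word match."""
    return word.lower() in text.lower().split()


def matches(meta: dict) -> bool:
    # Free text: tags ~ 'John' OR tags ~ 'Bob' OR tags ~ 'Alice'
    text_ok = any(contains_word(meta["tags"], w) for w in ("John", "Bob", "Alice"))
    # Date range: 2/12/2012 < date < 3/12/2013, read as month/day/year
    date_ok = date(2012, 2, 12) < meta["date"] < date(2013, 3, 12)
    # Integer comparison
    views_ok = meta["num_views"] > 10000
    return text_ok and date_ok and views_ok


print([name for name, meta in VIDEOS.items() if matches(meta)])  # ['a.mp4']
```

Each object fails a different condition: b.mp4 is out of the date range, c.mp4 has too few views, and d.mp4 has no matching tag.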

Metadata Search with Enriched Metadata: Developed with RAI Italy

Metadata Enrichment

[Diagram: myvideo.mxf (data and metadata) is uploaded to the Swift object store; a storlet processes the data and stores enriched metadata with the object.]

Finding Objects by Their Metadata Values

[Diagram: a request to "get objects whose loudness is faulty" goes to the metadata search facility, which finds the faulty objects (e.g., myvideo.mxf) in the Swift object store.]
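The slides do not say how loudness was measured or what counts as faulty. Purely as an illustration, the sketch below computes a plain RMS level in dBFS from normalized PCM samples and turns it into two hypothetical metadata headers (`X-Object-Meta-Loudness-Dbfs`, `X-Object-Meta-Loudness-Faulty`) of the kind a storlet could attach at upload. The function names and the -10 dBFS threshold are assumptions, and decoding the MXF container is out of scope. Broadcast loudness is normally measured per ITU-R BS.1770 / EBU R 128, which adds frequency weighting and gating that this sketch omits.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level of normalized samples (-1.0..1.0) in dBFS; -inf for silence."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def enrich_loudness(samples: list[float], max_dbfs: float = -10.0) -> dict[str, str]:
    """Return hypothetical X-Object-Meta-* headers a storlet could attach.

    max_dbfs is an illustrative threshold, not a broadcast standard value.
    """
    level = rms_dbfs(samples)
    return {
        "X-Object-Meta-Loudness-Dbfs": f"{level:.2f}",
        "X-Object-Meta-Loudness-Faulty": "true" if level > max_dbfs else "false",
    }


print(enrich_loudness([0.5, -0.5] * 100))
# {'X-Object-Meta-Loudness-Dbfs': '-6.02', 'X-Object-Meta-Loudness-Faulty': 'true'}
```

Once such a header is indexed, finding faulty objects becomes an ordinary metadata query, schematically `query=X-Object-Meta-Loudness-Faulty='true'`.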

Analytics Use Case

Analyze IoT data efficiently and cost-effectively
– Treat Swift as a long-term store for semi-structured IoT data
– Store in Parquet format
– Queryable via Apache Spark SQL
– Optimized predicate pushdown
  - Implemented a custom Spark SQL external data source driver
  - Uses metadata indexes
  - Searches for Swift objects whose min/max values overlap the requested ranges

Get all data for morning traffic:
SELECT codigo, intensidad, velocidad FROM madridtraffic WHERE tf >= '08:00:00' AND tf <= '12:00:00'

▪ Brute-force method: 13245 Swift requests
▪ Optimized predicate pushdown: 616 Swift requests
▪ 21.5 times fewer Swift requests (13245 / 616 ≈ 21.5)
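The pushdown step boils down to an interval-overlap test: an object can contain rows with `'08:00:00' <= tf <= '12:00:00'` only if its stored [min, max] for `tf` overlaps that range, so every other object is skipped without a Swift GET. In this sketch the per-object min/max values are given as a plain list (in the described system they come from the metadata index); `ObjectStats` and the object names are hypothetical, and this is the pruning idea only, not the authors' Spark driver.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    """Per-object min/max of a column, as it might be stored in object metadata."""
    name: str
    tf_min: str  # zero-padded "HH:MM:SS" strings compare correctly as text
    tf_max: str


def overlaps(lo: str, hi: str, q_lo: str, q_hi: str) -> bool:
    """True if [lo, hi] and [q_lo, q_hi] share at least one value (inclusive)."""
    return lo <= q_hi and q_lo <= hi


def prune(objects: list[ObjectStats], q_lo: str, q_hi: str) -> list[str]:
    """Names of objects that may contain rows with q_lo <= tf <= q_hi.

    Only these need to be fetched from Swift; the rest are skipped entirely.
    """
    return [o.name for o in objects if overlaps(o.tf_min, o.tf_max, q_lo, q_hi)]


parts = [
    ObjectStats("part-0.parquet", "00:00:00", "07:59:59"),
    ObjectStats("part-1.parquet", "07:30:00", "09:15:00"),
    ObjectStats("part-2.parquet", "11:50:00", "13:00:00"),
    ObjectStats("part-3.parquet", "18:00:00", "23:59:59"),
]
print(prune(parts, "08:00:00", "12:00:00"))  # ['part-1.parquet', 'part-2.parquet']
```

The reported gain follows the same pattern: 616 candidate objects are read instead of 13245, and 13245 / 616 ≈ 21.5.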

IoT Analytics Use Case: Example Metadata

IoT Use Case: EMT Madrid Bus Service

▪ Search capability allows understanding traffic at a given time slot and helps plan better for future events

▪ Historical data about bus trips, generated by IoT devices mounted on the EMT buses

▪ Data ingested into the object store, along with relevant metadata

Data Collected from EMT Buses


Bus Data Continuously Uploaded to Object Store

▪ Kafka + Secor: groups data into objects and uploads them at regular intervals

▪ Storlets generate metadata:
  1. Storlet converts GPS coordinates from UTM to lat/long
  2. Storlet calculates the GPS bounding box and stores it as metadata
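Step 2 can be sketched as follows, assuming the UTM-to-lat/long conversion of step 1 has already produced (lat, lon) pairs. The header names reuse `X-Object-Meta-Top-Left-G` and `X-Object-Meta-Bottom-Right-G` from the query example in this talk; encoding each corner as a "lat,lon" string and the function name are assumptions, and the sample points are illustrative, not EMT data.

```python
def bounding_box(points: list[tuple[float, float]]) -> dict[str, str]:
    """Compute a lat/long bounding box and return it as hypothetical metadata.

    points are (lat, lon) pairs already converted from UTM. Top-left is the
    north-west corner (max lat, min lon); bottom-right is the south-east
    corner (min lat, max lon). Boxes crossing the antimeridian are not handled.
    """
    if not points:
        raise ValueError("no GPS points")
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return {
        "X-Object-Meta-Top-Left-G": f"{max(lats)},{min(lons)}",
        "X-Object-Meta-Bottom-Right-G": f"{min(lats)},{max(lons)}",
    }


# A few illustrative positions around central Madrid.
trip = [(40.4168, -3.7038), (40.4200, -3.6900), (40.4100, -3.7100)]
print(bounding_box(trip))
# {'X-Object-Meta-Top-Left-G': '40.42,-3.71', 'X-Object-Meta-Bottom-Right-G': '40.41,-3.69'}
```

Storing only two corners per object keeps the metadata small while still letting a geo query discard objects whose trips never entered an area of interest.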


Demo


Behind the Scenes of Metadata Search

▪ Metadata search involves two flows:
  ▪ Indexing objects' metadata
  ▪ Serving search queries

Indexing Objects' Metadata

[Diagram, built up over several slides: (1) the storage system's input data path; (2) an indexer taps that path; (3) the indexer places metadata updates on a queue, from which they are written to the index/search store; (4) in Swift, an indexer middleware in the proxy pipeline, in front of the Swift storage tier, publishes to RabbitMQ, and the updates are indexed in Elasticsearch.]
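The indexing flow decouples the client request from indexing through a queue. The sketch models it in-process: `queue.Queue` stands in for RabbitMQ and a dict stands in for the Elasticsearch index. `on_request` plays the indexer middleware and `drain_once` the indexer; both names and the `AUTH_test` paths are hypothetical. The sketch indexes only user metadata and treats every PUT/POST as replacing the indexed document, which matches object semantics (account and container POSTs would need a merge).

```python
import queue

# In-process stand-ins: queue.Queue for RabbitMQ, a dict for the Elasticsearch index.
events: "queue.Queue[dict]" = queue.Queue()
index: dict[str, dict[str, str]] = {}

META_PREFIXES = ("x-account-meta-", "x-container-meta-", "x-object-meta-")


def on_request(method: str, path: str, headers: dict[str, str], status: int) -> None:
    """Indexer-middleware side: publish successful metadata changes to the queue.

    Publishing is cheap, so the client's request does not wait for indexing.
    """
    if method in ("PUT", "POST", "DELETE") and 200 <= status < 300:
        meta = {k.lower(): v for k, v in headers.items()
                if k.lower().startswith(META_PREFIXES)}
        events.put({"op": method, "path": path, "meta": meta})


def drain_once() -> None:
    """Indexer side: apply every queued event to the index, in order."""
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return
        if event["op"] == "DELETE":
            index.pop(event["path"], None)
        else:
            # Simplification: replace the whole document (object PUT/POST semantics).
            index[event["path"]] = event["meta"]


on_request("PUT", "/v1/AUTH_test/photos/photo1.jpg", {"X-Object-Meta-City": "Rome"}, 201)
on_request("PUT", "/v1/AUTH_test/photos/photo2.jpg", {"X-Object-Meta-City": "Haifa"}, 201)
on_request("DELETE", "/v1/AUTH_test/photos/photo2.jpg", {}, 204)
drain_once()
print(index)  # {'/v1/AUTH_test/photos/photo1.jpg': {'x-object-meta-city': 'Rome'}}
```

Because events are applied in queue order, the later DELETE correctly wins over the earlier PUT of photo2.jpg.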

Serving Search Requests

[Diagram: a metadata search (MD Search) middleware in the Swift proxy pipeline answers search requests using Elasticsearch.]
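The serving flow can be sketched as a WSGI middleware in the proxy pipeline. It recognizes a GET carrying `X-Context: search`, passes the `query` parameter to a search backend, and returns the matching object names; every other request passes through unchanged. The class name, the `search_fn` hook, and the JSON response format are assumptions for illustration. A real Swift middleware would also be registered through a Paste Deploy filter factory in proxy-server.conf and would restrict results to what the caller may access; both are omitted here.

```python
import json
from urllib.parse import parse_qs


class MetadataSearchMiddleware:
    """Hypothetical WSGI middleware that answers GETs marked 'X-Context: search'.

    search_fn(container_path, query) is an assumed hook that would query the
    index (e.g. Elasticsearch) and return a list of object names.
    """

    def __init__(self, app, search_fn):
        self.app = app
        self.search_fn = search_fn

    def __call__(self, environ, start_response):
        # WSGI exposes the X-Context request header as HTTP_X_CONTEXT.
        is_search = (environ.get("REQUEST_METHOD") == "GET"
                     and environ.get("HTTP_X_CONTEXT", "").lower() == "search")
        if not is_search:
            return self.app(environ, start_response)  # normal Swift request
        query = parse_qs(environ.get("QUERY_STRING", "")).get("query", [""])[0]
        hits = self.search_fn(environ.get("PATH_INFO", ""), query)
        body = json.dumps(hits).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]


def swift_app(environ, start_response):  # stand-in for the rest of the pipeline
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"normal listing"]


mw = MetadataSearchMiddleware(swift_app, lambda path, q: ["photo1.jpg"])
captured = []
env = {"REQUEST_METHOD": "GET", "PATH_INFO": "/v1/AUTH_test/photos",
       "QUERY_STRING": "query=city%3D%27Rome%27", "HTTP_X_CONTEXT": "search"}
print(mw(env, lambda status, headers: captured.append(status)))  # [b'["photo1.jpg"]']
```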

Overall Architecture

[Diagram: HTTP Swift requests pass through a load balancer to the Swift proxy nodes; each proxy node runs the proxy service with indexer and search components alongside RabbitMQ; Swift storage nodes hold the objects; the indexers feed an Elasticsearch cluster, which serves the searches.]

Query API

Example:

GET http://iotserver.example.com/v1/AUTH_...2357c/busData?query=X-Object-Meta-Top-Left-G in [40.7,22.5],[39.9,22.1] AND X-Object-Meta-Bottom-Right-G in [40.7,22.5],[39.9,22.1]
X-Context: search
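Because the query contains spaces, quotes, and brackets, a client has to percent-encode it before sending. The sketch below builds, without sending, such a request with Python's standard `urllib`. The base URL and the `AUTH_test` account are hypothetical placeholders, `build_search_request` is a hypothetical helper, and the optional `X-Auth-Token` header reflects Swift's usual token authentication rather than anything stated on the slide.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def build_search_request(base_url: str, query: str, token: Optional[str] = None) -> Request:
    """Build (but do not send) a metadata search GET request.

    The query is percent-encoded; the X-Context header marks the GET as a search.
    """
    headers = {"X-Context": "search"}
    if token:
        headers["X-Auth-Token"] = token
    return Request(f"{base_url}?query={quote(query, safe='')}", headers=headers, method="GET")


query = ("X-Object-Meta-Top-Left-G in [40.7,22.5],[39.9,22.1] AND "
         "X-Object-Meta-Bottom-Right-G in [40.7,22.5],[39.9,22.1]")
req = build_search_request("https://swift.example.com/v1/AUTH_test/busData", query)
print(req.full_url)
# urllib.request.urlopen(req) would send it; that step is omitted here.
```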

▪ Query features:
  1. Multiple criteria possible
  2. Supports various operators
     • =, !=, <, <=, in, ~, ...
  3. Supports metadata data types
     • strings, integers, floats, dates, geo-points, free text
     • Allows comparisons and range searches
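To make the operator list concrete, here is a sketch of how an AND-only query in this syntax might be translated into Elasticsearch's query DSL (`term`, `match`, `range`, `bool`/`must_not`, and `geo_bounding_box`). It is an illustration under several assumptions, not the middleware's actual translator: index fields are assumed to be named after the metadata headers, geo fields are assumed to be mapped as `geo_point`, each `in` bracket pair is read as `[lat,lon]` with the top-left corner first, and OR, parentheses, and " AND " inside quoted values are not handled. `to_es_query` is a hypothetical name.

```python
import re

_CLAUSE = re.compile(r"^\s*([\w-]+)\s*(!=|<=|>=|=|<|>|~|in(?=\s))\s*(.+?)\s*$")
_POINT = re.compile(r"\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]")
_RANGE_OPS = {"<": "lt", "<=": "lte", ">": "gt", ">=": "gte"}


def _scalar(text: str):
    """Strip quotes from strings; turn bare numbers into int/float."""
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # e.g. a date such as 2/12/2012 stays a string


def to_es_query(query: str) -> dict:
    """Translate an AND-only metadata query into an Elasticsearch bool query."""
    must = []
    for clause in re.split(r"\s+AND\s+", query.strip()):
        m = _CLAUSE.match(clause)
        if not m:
            raise ValueError(f"cannot parse clause: {clause!r}")
        field, op, raw = m.groups()
        if op == "in":
            pts = [(float(a), float(b)) for a, b in _POINT.findall(raw)]
            if len(pts) != 2:
                raise ValueError(f"expected two [lat,lon] points: {raw!r}")
            (tl_lat, tl_lon), (br_lat, br_lon) = pts
            must.append({"geo_bounding_box": {field: {
                "top_left": {"lat": tl_lat, "lon": tl_lon},
                "bottom_right": {"lat": br_lat, "lon": br_lon}}}})
        elif op == "=":
            must.append({"term": {field: _scalar(raw)}})
        elif op == "!=":
            must.append({"bool": {"must_not": [{"term": {field: _scalar(raw)}}]}})
        elif op == "~":
            must.append({"match": {field: _scalar(raw)}})  # free-text match
        else:
            must.append({"range": {field: {_RANGE_OPS[op]: _scalar(raw)}}})
    return {"query": {"bool": {"must": must}}}


print(to_es_query("tf >= '08:00:00' AND tf <= '12:00:00'"))
```

Mapping each clause to its own entry under `must` mirrors the AND semantics of the query string; supporting OR would require a real parser that emits nested `should` clauses.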

Where Do We Go From Here?

▪ Extend to support file-based (NFS/SMB) attributes
▪ Standardize the search API
▪ Standardize back-end APIs to allow support for any queuing and/or database system
▪ Work on visualizing information through Kibana, etc.
▪ Collaborate with OpenStack community efforts
  ▪ Swift event notification mechanism
  ▪ OpenStack Searchlight
    ■ Also built on Elasticsearch and RabbitMQ
    ■ Work to standardize the search API

Spectrum Scale Object Store

[Architecture diagram: a load balancer distributes search and Swift requests to proxy services running the metadata search middleware; object services store data on IBM Spectrum Scale; additional services in the cluster include Keystone (authentication service), the metadata index DB (Elasticsearch, ES), and RabbitMQ (RMQ).]

Will be available with IBM Spectrum Scale - 4Q15

1. Pre-installed and configured virtual appliance

2. Roll-your-own solution
   ○ White paper to be released describing how to set up and configure it
   ○ Will include a source tarball
   ○ Fine-tune as per your requirements