Implementation of a Motion Detection System

WASET Nov. 2013, Malaga · Large Scale Systems
Architectures of Large Scale Systems (LSS)
Prof. Arne Koschel, Irina Astrova, Elena Deutschkämer, Jacob Ester and Johannes Feldmann


Agenda
• Introduction and definition
• LSS hardware and software
• Components
  – Distributed file systems
  – Databases
  – Hadoop / MapReduce
  – Caching
• Practical example
  – Facebook
• Conclusion

INTRODUCTION & DEFINITION

What does "large scale" mean?
• No single exact definition
• Criteria may be
  – amount of data processed
  – number of hardware elements
  – number of people involved
  – number of system purposes and processes
• Problems
  – performance, reliability, complexity, development process


Differences to traditional systems

Characteristic      | Traditional IT system                        | Large scale system
Governance          | Singular dominant influence                  | Multiple, conflicting influences
Duration of life    | Defined at the moment of designing           | Infinite
Flow of information | Well-understood internal flow, known sources | Changing flow of information, new sources
Complexity          | Optimized                                    | Highly complex, not optimized
Elements            | Services, components                         | Systems, services

Differences to traditional systems
[Figure: a traditional software system next to a large scale system; large scale system = system of systems]

Scalability
• Vertical scalability (scale up)
  – Replace the system with a more powerful system
• Horizontal scalability (scale out)
  – Add extra server(s) to the system
  – Of interest for large scale systems
[Figure: servers illustrating scale up and scale out]

HARD- AND SOFTWARE

LSS hardware and software
• Hardware
  – Research in specialized hardware
    • Open Compute Project
      – Instructions and specifications for constructing very efficient servers
  – On the market
    • Scalable servers by Intel and HP
• Software
  – Various software approaches for LSS (especially for web-based LSS)
    • Frameworks and algorithms designed for LSS
    • Lots of different databases and file systems
    • Caching mechanisms

COMPONENTS
Distributed File Systems

Distributed File Systems
• Provide data access to many clients
• Use a network and an access protocol
• May offer mechanisms for replication and fault tolerance
• Highly optimized in LSS
  – Google File System, Amazon S3, Facebook Haystack, etc.


Example: Google File System (GFS)
• Assumptions made
  – Commodity hardware that is expected to fail
  – Huge files (multi-GB)
  – Workload consists of ...
    • sequential or random reads
    • mostly sequential writes (streams), no random writes
    • practically no overwriting
  – Bandwidth is more important than low latency

GFS Architecture

GFS features and benefits
• Separation of data flow and control flow
• Metadata (addressing) is kept in memory on the master side
• Large chunks (64 MB)
• Relaxed consistency model
  – Atomic writes and appends, garbage collection
• High availability
  – Balancing, fast recovery, master and chunk replication, snapshots
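
A minimal sketch in Python (not taken from the slides) of what the fixed 64 MB chunk size means for addressing: a byte offset inside a file maps to a chunk index by integer division, and that index is what would be looked up in the master's in-memory metadata. The function names are hypothetical and chosen for illustration only.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as stated on the slide


def chunk_index(byte_offset):
    """Return the index of the chunk that contains the given byte offset."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return byte_offset // CHUNK_SIZE


def chunks_for_range(offset, length):
    """Return the chunk indices touched by reading `length` bytes from `offset`."""
    if length <= 0:
        return range(0)
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return range(first, last + 1)
```

A read that starts ten bytes before a chunk boundary and spans twenty bytes touches two chunks, so a client would need the locations of both.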

COMPONENTS
Databases

Databases
• Main types (core NoSQL)
  – Column store
    • Any key to any key-value pairs (column family)
  – Document store
    • Structured data collections like JSON
  – Key/value store
    • Schema of key and value (strings, hashes, sets, lists)
  – Graph DBs
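
The four main types listed on this slide can be pictured with plain Python data structures. This is only an illustrative sketch: the keys, values and the `neighbours` helper are invented for the example and do not correspond to any particular database product.

```python
# Column store: row key -> column family -> column key -> value
column_store = {
    "user:1": {
        "profile": {"name": "Ada", "city": "Malaga"},
        "stats": {"logins": 3},
    }
}

# Document store: one self-contained, JSON-like document per id
document_store = {"user:1": {"name": "Ada", "tags": ["admin", "dev"]}}

# Key/value store: key -> value, where the value is a string, hash, set or list
key_value_store = {
    "session:abc": "user:1",
    "user:1:friends": {"user:2", "user:3"},
}

# Graph DB: nodes plus labelled, directed edges
graph_nodes = {"user:1", "user:2"}
graph_edges = [("user:1", "FRIEND_OF", "user:2")]


def neighbours(node, label):
    """Follow all edges with the given label that start at `node`."""
    return [dst for src, lbl, dst in graph_edges if src == node and lbl == label]
```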


Example: BigTable
• Column store
• Column families = sets of column keys
• Tablet = row range
• 3 main components
  – Library, master server, tablet server
• Scalability
  – Tablet servers can be added dynamically
• Proprietary database by Google
  – Open-source solution: HBase
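
Because a tablet is a contiguous row range, finding the tablet server responsible for a row key amounts to a search over sorted range boundaries. The sketch below assumes a hypothetical layout with four tablets and three tablet servers; the names and the lookup function are illustrative and are not BigTable's actual API.

```python
import bisect

# Hypothetical layout: each tablet starts at its start key and runs up to
# (but not including) the next start key. The list must stay sorted.
TABLET_STARTS = ["", "g", "n", "t"]
TABLET_SERVERS = ["ts-1", "ts-2", "ts-1", "ts-3"]  # server holding each tablet


def locate_tablet_server(row_key):
    """Return the tablet server whose row range contains `row_key`."""
    # bisect_right finds the first start key greater than row_key;
    # the tablet just before that position covers the key.
    position = bisect.bisect_right(TABLET_STARTS, row_key) - 1
    return TABLET_SERVERS[position]
```

Adding a tablet server then only changes the entries of the assignment list; the row ranges themselves stay untouched.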

COMPONENTS
Hadoop / MapReduce

Hadoop
• Free, Java-based framework for LSS
• Inspired by Google’s MapReduce and Google File System (GFS)
• Main contents of the Hadoop framework
  – Hadoop Common
    • Provides access to the supported file systems
      – HDFS, Amazon S3, CloudStore, FTP file system, read-only HTTP and HTTPS file systems
    • Necessary JAR files and scripts
  – Hadoop Distributed File System (HDFS)
    • Distributed, scalable, portable file system
    • By default, data is stored on three nodes: two on the same rack and one on a different rack
  – Hadoop MapReduce
    • Consists of one job tracker (which receives the MapReduce jobs) and many task tracker nodes
      – The job tracker knows which node contains the data and which task tracker is nearby (keeping the work as close to the data as possible)
    • Uses Map() and Reduce() functions (see next slides)
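
The default placement described above (three copies, two on the same rack and one on a different rack) can be sketched as follows. This is a simplified illustration, not Hadoop's placement code: the function name, the dictionary layout and the random selection are assumptions made for the example.

```python
import random


def place_replicas(nodes_by_rack, rng=random):
    """Pick three distinct nodes: two on one rack, one on a different rack.

    `nodes_by_rack` maps a rack name to the list of node names on that rack.
    """
    # Only a rack with at least two nodes can hold the pair of replicas.
    pair_racks = [rack for rack, nodes in nodes_by_rack.items() if len(nodes) >= 2]
    if not pair_racks:
        raise ValueError("no rack has two nodes for the replica pair")
    pair_rack = rng.choice(pair_racks)

    # The third replica goes to any non-empty rack other than the pair's rack.
    other_racks = [
        rack for rack, nodes in nodes_by_rack.items() if rack != pair_rack and nodes
    ]
    if not other_racks:
        raise ValueError("a second rack is needed for the third replica")
    other_rack = rng.choice(other_racks)

    pair = rng.sample(nodes_by_rack[pair_rack], 2)
    single = rng.choice(nodes_by_rack[other_rack])
    return pair + [single]
```

With this layout a whole rack can fail without losing the data, while two of the three copies still share the cheaper rack-local network path.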


MapReduce algorithm
• Uses a list of key-value pairs to calculate a new list of key-value pairs
• Map function
  – The job tracker partitions the input into smaller sub-problems and distributes them to the task trackers
  – Writes the results into an intermediate storage
• Reduce function
  – Gets the results from the intermediate storage
  – Calculates the final result of the main problem

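MapReduce:
\[ (K \times V)^* \to (L \times W)^*, \qquad [(k_1, v_1), \ldots, (k_n, v_n)] \mapsto [(l_1, w_1), \ldots, (l_n, w_n)] \]

Map:
\[ K \times V \to (L \times W)^*, \qquad (k, v) \mapsto [(l_1, x_1), \ldots, (l_{r_k}, x_{r_k})] \]

Reduce:
\[ L \times W^* \to W^*, \qquad (l, [y_1, \ldots, y_{s_l}]) \mapsto [w_1, \ldots, w_{m_l}] \]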
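
As a rough, single-process illustration of the map and reduce functions described on this slide, the following Python sketch counts words. It runs everything in one loop instead of distributing work to task trackers, and the dictionary merely stands in for the intermediate storage; all function names are chosen for the example.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: one (document name, text) pair -> list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_fn(key, values):
    """Reduce: (word, list of counts) -> list with the total count."""
    return [sum(values)]


def map_reduce(pairs, map_fn, reduce_fn):
    """Apply map_fn to every input pair, group by output key, then reduce."""
    intermediate = defaultdict(list)  # stands in for the intermediate storage
    for key, value in pairs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)

    result = []
    for out_key in sorted(intermediate):
        for reduced in reduce_fn(out_key, intermediate[out_key]):
            result.append((out_key, reduced))
    return result
```

Calling `map_reduce([("d1", "a b a"), ("d2", "b c")], map_fn, reduce_fn)` yields one pair per distinct word with its total count.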


Map-Reduce
[Figure: data flow through five stages. Input data (Split0 to Split6) → Map stage (workers running map()) → Tmp storage (files) → Reduce stage (workers running reduce()) → Result (files)]


HDFS
• HDFS (Hadoop Distributed File System)
  – Consists of one name node and many data nodes
    • Name node stores the metadata of the HDFS
    • Data nodes store the data
  – Cluster of data nodes forms the HDFS cluster
  – Uses the TCP/IP layer and RPC to communicate
  – Replicates data across multiple hosts


HDFS Architecture
[Figure: one master and several slaves, drawn across a MapReduce layer and an HDFS layer. The master hosts the job tracker and the name node along with a task tracker and a data node; each slave hosts a task tracker and a data node.]

COMPONENTS
Caching

Caching
• Store frequently accessed data in fast memory near the target location
• Main goal in web applications: avoid database access
• Highly efficient when results are allowed to be slightly ‘out of date’
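
The points above can be sketched as a small read-through cache with a time-to-live: a value is served from memory until it is older than the allowed staleness, and only then is the (expensive) loader, standing in for a database query, called again. The class and parameter names are hypothetical and chosen for illustration.

```python
import time


class TTLCache:
    """In-memory cache whose entries may be at most `ttl_seconds` out of date."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries = {}           # key -> (value, expiry time)

    def get_or_load(self, key, load):
        """Return the cached value, or call `load(key)` if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]          # cache hit: no database access
        value = load(key)            # cache miss: go to the slow source
        self._entries[key] = (value, now + self._ttl)
        return value
```

The larger the tolerated staleness, the fewer calls reach the database, which is the trade-off the last bullet describes.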


Memcached
• "Short-term memory for applications"
• Main idea: provide one single cache used by several web servers instead of many independent server-related caches


Memcached – key features
• Free and open source (www.memcached.org)
• Implemented as a hash table, distributed across multiple machines
• Client / server architecture
• APIs and integration available for many languages
• Separates server and memory units
• Limitations
  – no permanent persistence
  – no complex queries
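
To illustrate the idea of a hash table distributed across multiple machines, the sketch below lets the client hash each key to choose one server. It is a simplification under stated assumptions: plain modulo hashing is used here for brevity (real client libraries may use other schemes such as consistent hashing), and the per-server dictionaries are in-process stand-ins rather than network connections to memcached.

```python
import hashlib


def pick_server(key, servers):
    """Choose the server responsible for `key` by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


class ShardedCache:
    """Client-side view of one logical cache spread over several servers."""

    def __init__(self, servers):
        self._servers = list(servers)
        # One dictionary per server stands in for that server's memory.
        self._stores = {server: {} for server in self._servers}

    def set(self, key, value):
        self._stores[pick_server(key, self._servers)][key] = value

    def get(self, key):
        # Returns None on a miss, mirroring the lack of persistence guarantees.
        return self._stores[pick_server(key, self._servers)].get(key)
```

Every web server that uses the same server list and hash function reaches the same entry for a given key, which is what makes the cache shared rather than per-server.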

PRACTICAL EXAMPLES
Facebook

Facebook
• Facts
  – 200 billion page views per month
  – 15,000 websites use Facebook Connect
  – 9.5% of worldwide internet traffic
  – 9 datacenters
• Scales across multiple datacenters
  – 60,000 servers (June 2010)
  – Based on LAMP (Linux, Apache, MySQL, PHP)
  – One of the world's biggest MySQL clusters
  – Some PHP functions are converted into faster C++
  – Main framework uses RPCs for additional services and extensions
    • Services use Hadoop, Cassandra, Hive, Scribe, …


But ...
– LAMP is not perfect at all
  • PHP is stateless
  • Data is remote
– Services
  • Store code closer to data
  • Compiled environment is more efficient
  • Facebook Messaging (Chat, Status Updates, Messages, E-Mail, SMS)
    – Some parts are written in Erlang
– Uses Hadoop/HDFS
  • More than 3200 jobs/day
  • 800,000 tasks (MapReduce)/day
  • Scans 55 TB of data per day
  • 15 TB of compressed output data to HDFS

High-level architecture
[Figure: LAMP + Services. PHP, Memcached and MySQL shown together with the services AdServer, Search, Network, Newsfeed, Blogfeed, CSSParser and Mobile, plus Thrift and Scribe]

• PHP
  – Good library support for web applications
  – Active developer community
  – Good for rapid iteration
• Memcached
  – Used to reduce database load
  – More than 25 TB in memory cache
  – Uses UDP
    • Reduces overhead from TCP connection buffers
    • Application-level flow control, sequencing…
• MySQL
  – Almost all data is identified by a GUID
  – Load balancer at physical node level
    • More than 500 physical DB nodes
  – Extended query engine for cross-datacenter replication

CONCLUSION

Conclusion
• Large scale systems …
  – exceed traditional applications in various dimensions
  – are systems of systems
  – combine various technologies into unique, adapted solutions
  – require a choice of technologies that depends on the requirements
• The architecture enables dynamic growth of the entire system