WASET Nov. 2013, Malaga · Large Scale Systems
Architectures of Large Scale Systems (LSS)
Prof. Arne Koschel, Irina Astrova, Elena Deutschkämer, Jacob Ester and Johannes Feldmann

Agenda
• Introduction and definition
• LSS hard- and software
• Components
  – Distributed file systems
  – Databases
  – Hadoop / MapReduce
  – Caching
• Practical example
  – Facebook
• Conclusion

What does "large scale" mean?
• No single exact definition
• Possible criteria:
  – amount of data processed
  – number of hardware elements
  – number of people involved
  – number of system purposes and processes
• Problems: performance, reliability, complexity, development process

Differences from traditional systems

Characteristic      | Traditional IT system                        | Large scale system
--------------------+----------------------------------------------+------------------------------------------
Governance          | Singular dominant influence                  | Multiple, conflicting influences
Duration of life    | Defined at design time                       | Indefinite
Flow of information | Well-understood internal flow, known sources | Changing flow of information, new sources
Complexity          | Optimized                                    | Highly complex, not optimized
Elements            | Services, components                         | Systems, services

Differences from traditional systems (continued)
[Figure: a traditional software system compared with a large scale system, which is a system of systems]

Scalability
• Vertical scalability (scale up)
  – Replace the system with a more powerful one
• Horizontal scalability (scale out)
  – Add extra servers to the system
  – Of interest for large scale systems
[Figure: scaling out by adding servers]

LSS hard- and software
• Hardware
  – Research in specialized hardware
    • Open Compute Project: instructions and specifications for building very efficient servers
  – On the market
    • Scalable servers by Intel and HP
• Software
  – Various software approaches for LSS (especially web-based LSS)
    • Frameworks and algorithms designed for LSS
    • Many different databases and file systems
    • Caching mechanisms

Distributed file systems
• Provide data access to many clients
• Use a network and an access protocol
• May offer mechanisms for replication and fault tolerance
• Highly optimized in LSS
  – Google File System, Amazon S3, Facebook Haystack, etc.

Example: Google File System (GFS)
• Assumptions made
  – Commodity hardware that is expected to fail
  – Huge files (multi-GB)
  – The workload consists of:
    • sequential or random reads
    • mostly sequential writes (streams), random writes are rare
    • practically no overwriting
  – Bandwidth is more important than low latency

GFS features and benefits
• Separation of data flow and control flow
• Metadata (addressing) is held in memory on the master
• Large chunks (64 MB)
• Relaxed consistency model
  – Atomic writes and appends, garbage collection
• High availability
  – Load balancing, fast recovery, master and chunk replication, snapshots
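
To make the separation of control flow and data flow concrete, here is a minimal Python sketch of how a GFS-style client could locate data. It assumes the 64 MB chunk size from the slide; the classes, chunk handles and chunkserver names are hypothetical, and details of the real protocol (leases, client-side caching of locations, version numbers) are left out.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as on the slide


@dataclass
class Master:
    """Hypothetical master holding all metadata in memory."""

    # file name -> ordered chunk handles (chunk 0, chunk 1, ...)
    files: dict[str, list[str]] = field(default_factory=dict)
    # chunk handle -> chunkservers holding a replica
    locations: dict[str, list[str]] = field(default_factory=dict)

    def lookup(self, filename: str, index: int) -> tuple[str, list[str]]:
        # Control flow only: the master never sends file contents.
        handle = self.files[filename][index]
        return handle, self.locations[handle]


def chunk_index(offset: int) -> int:
    """Translate a byte offset within a file into a chunk index."""
    return offset // CHUNK_SIZE


def locate(master: Master, filename: str, offset: int) -> tuple[str, list[str], int]:
    """Return (chunk handle, replica servers, offset inside that chunk)."""
    handle, replicas = master.lookup(filename, chunk_index(offset))
    # The client would now read directly from one of the replicas (data flow).
    return handle, replicas, offset % CHUNK_SIZE


master = Master(
    files={"/logs/web.log": ["c1", "c2"]},
    locations={"c1": ["cs-a", "cs-b", "cs-c"], "c2": ["cs-b", "cs-d", "cs-e"]},
)
print(locate(master, "/logs/web.log", 70 * 1024 * 1024))
```

A read at byte 70 MB falls into the second chunk (index 1), 6 MB into it. Because the master only answers this small lookup, it can keep all metadata in memory while the bulk data bypasses it.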

Databases
• Main types (core NoSQL)
  – Column store
    • Any key to any key-value pairs (column family)
  – Document store
    • Structured data collections, e.g. JSON
  – Key/value store
    • Schema of key and value (strings, hashes, sets, lists)
  – Graph DBs

Example: BigTable
• Column store
• Column family = set of column keys
• Tablet = row range
• 3 main components
  – client library, master server, tablet servers
• Scalability
  – dynamic adding of tablet servers
• Proprietary database by Google
  – Open-source counterpart: HBase
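
Because a tablet is a contiguous row range and rows are kept sorted by key, finding the tablet server for a row is a search over sorted range boundaries. The Python sketch below assumes a single in-memory directory (BigTable itself stores tablet locations in a hierarchy of METADATA tablets, which clients cache); the class and server names are hypothetical. Splitting a tablet and handing half of it to a new server hints at how adding tablet servers spreads the load.

```python
import bisect


class TabletDirectory:
    """Maps row keys to tablet servers (hypothetical, in-memory sketch).

    Tablet i covers row keys in (end_keys[i-1], end_keys[i]]; one final,
    unbounded tablet covers every key above the last end key.
    """

    def __init__(self, end_keys: list[str], servers: list[str]) -> None:
        if end_keys != sorted(end_keys) or len(servers) != len(end_keys) + 1:
            raise ValueError("need sorted end keys and one server per tablet")
        self.end_keys = list(end_keys)
        self.servers = list(servers)

    def server_for(self, row_key: str) -> str:
        # First tablet whose end key is >= row_key (or the unbounded last one).
        return self.servers[bisect.bisect_left(self.end_keys, row_key)]

    def split(self, split_key: str, new_server: str) -> None:
        """Split the tablet containing split_key; its lower half moves to new_server."""
        i = bisect.bisect_left(self.end_keys, split_key)
        if i < len(self.end_keys) and self.end_keys[i] == split_key:
            raise ValueError("split key is already a tablet boundary")
        self.end_keys.insert(i, split_key)
        self.servers.insert(i, new_server)


tablets = TabletDirectory(["g", "p"], ["ts1", "ts2", "ts3"])
print(tablets.server_for("com.cnn.www"))  # ts1
tablets.split("m", "ts4")  # a new tablet server takes over rows in (g, m]
print(tablets.server_for("home"), tablets.server_for("net"))  # ts4 ts2
```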

Hadoop
• Free, Java-based framework for LSS
• Inspired by Google's MapReduce and Google File System (GFS)
• Main components of the Hadoop framework
  – Hadoop Common
    • Provides access to the supported file systems: HDFS, Amazon S3, CloudStore, FTP file system, read-only HTTP and HTTPS file systems
    • Necessary JAR files and scripts
  – Hadoop Distributed File System (HDFS)
    • Distributed, scalable, portable file system
    • By default, data is stored on three nodes: two on the same rack and one on a different rack
  – Hadoop MapReduce
    • Consists of one JobTracker (receives the MapReduce jobs) and many TaskTracker nodes
    • The JobTracker knows which node holds the data and which TaskTracker is nearby, so work is kept as close to the data as possible
    • Uses Map() and Reduce() functions (see next slides)
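
The locality rule on this slide (keep the work as close to the data as possible) can be sketched as a three-level preference: a TaskTracker on a node that stores the input block, then one on the same rack, then any tracker with a free slot. This is a simplified Python model, not Hadoop's actual scheduler code; the `TaskTracker` record, host names and rack names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskTracker:
    host: str
    rack: str
    free_slots: int


def pick_tracker(
    block_hosts: set[str], rack_of: dict[str, str], trackers: list[TaskTracker]
) -> Optional[tuple[TaskTracker, str]]:
    """Choose a TaskTracker for a map task whose input block is on block_hosts."""
    free = [t for t in trackers if t.free_slots > 0]
    block_racks = {rack_of[h] for h in block_hosts}
    for t in free:  # 1. node-local: the block is on this very machine
        if t.host in block_hosts:
            return t, "node-local"
    for t in free:  # 2. rack-local: data only crosses the rack switch
        if t.rack in block_racks:
            return t, "rack-local"
    if free:  # 3. off-rack: data crosses the network between racks
        return free[0], "off-rack"
    return None  # no free slot: the task has to wait


rack_of = {"h1": "r1", "h2": "r1", "h3": "r2", "h4": "r2"}
trackers = [TaskTracker("h1", "r1", 0), TaskTracker("h2", "r1", 2), TaskTracker("h3", "r2", 1)]
tracker, kind = pick_tracker({"h1"}, rack_of, trackers)
print(tracker.host, kind)  # h2 rack-local (h1 holds the block but is busy)
```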

MapReduce algorithm
• Uses a list of key-value pairs to calculate a new list of key-value pairs
• Map function
  – The JobTracker partitions the input into smaller sub-problems and distributes them to the TaskTrackers
  – Writes the results into intermediate storage
• Reduce function
  – Gets the results from the intermediate storage
  – Calculates the final result of the main problem
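$$\text{MapReduce}: (K \times V)^* \to (L \times W)^*,\quad [(k_1, v_1), \ldots, (k_n, v_n)] \mapsto [(l_1, w_1), \ldots, (l_m, w_m)]$$
$$\text{map}: K \times V \to (L \times W)^*,\quad (k, v) \mapsto [(l_1, x_1), \ldots, (l_{r_k}, x_{r_k})]$$
$$\text{reduce}: L \times W^* \to W^*,\quad (l, [y_1, \ldots, y_{s_l}]) \mapsto [w_1, \ldots, w_{m_l}]$$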
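
The map and reduce functions described on this slide can be exercised with the classic word-count example. The following Python sketch runs all stages sequentially in one process, so it only illustrates the data flow (map, grouping by intermediate key in place of the temporary storage, reduce), not the distribution across TaskTrackers; `map_reduce`, `wc_map` and `wc_reduce` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Iterable, TypeVar

K = TypeVar("K")  # input key type
V = TypeVar("V")  # input value type
L = TypeVar("L")  # intermediate/output key type
W = TypeVar("W")  # intermediate/output value type


def map_reduce(
    records: Iterable[tuple[K, V]],
    mapper: Callable[[K, V], Iterable[tuple[L, W]]],
    reducer: Callable[[L, list[W]], Iterable[W]],
) -> list[tuple[L, W]]:
    # Map stage + shuffle: collect every intermediate value under its key.
    groups: defaultdict = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce stage: one reducer call per intermediate key (keys must be sortable here).
    return [
        (out_key, result)
        for out_key in sorted(groups)
        for result in reducer(out_key, groups[out_key])
    ]


def wc_map(doc_id: str, text: str) -> list[tuple[str, int]]:
    return [(word, 1) for word in text.lower().split()]


def wc_reduce(word: str, counts: list[int]) -> list[int]:
    return [sum(counts)]


docs = [("d1", "the cat"), ("d2", "The dog saw the cat")]
print(map_reduce(docs, wc_map, wc_reduce))
# [('cat', 2), ('dog', 1), ('saw', 1), ('the', 3)]
```

In Hadoop the grouping step is the part that moves data between machines: map outputs are partitioned by key so that all values for one key reach the same reducer.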

Map-Reduce
[Figure: MapReduce data flow in five stages (input data, map stage, tmp storage, reduce stage, result). Input splits Split0 to Split6 are read by map() workers, which write intermediate files to temporary storage; reduce() workers read these files and write the result files.]

HDFS
• HDFS (Hadoop Distributed File System)
  – Consists of one name node and many data nodes
    • The name node stores the metadata of the HDFS
    • The data nodes store the data
  – The data nodes together form the HDFS cluster
  – Uses TCP/IP and RPC to communicate
  – Replicates data across multiple hosts
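
The default replication described on the Hadoop slide (three copies, two sharing one rack and one on another rack) can be sketched as follows. This Python model loosely follows the placement described in the HDFS documentation: the first replica goes to the writer's own data node if it has one, the other two to two different nodes of one remote rack. Node and rack names are hypothetical, and real HDFS also weighs free space and load when choosing nodes.

```python
import random
from typing import Optional


def place_replicas(
    topology: dict[str, list[str]], writer: Optional[str], rng: random.Random
) -> list[str]:
    """Pick three data nodes for a new block, given a rack -> nodes topology."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the writer's own data node if it is one, else a random node.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))
    # Replicas 2 and 3: two different nodes on one other rack, so losing a whole
    # rack never loses the block, while only one transfer crosses between racks.
    remote_racks = [
        r for r in sorted(topology) if r != rack_of[first] and len(topology[r]) >= 2
    ]
    if not remote_racks:
        raise ValueError("need another rack with at least two data nodes")
    second, third = rng.sample(topology[rng.choice(remote_racks)], 2)
    return [first, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4", "dn5"]}
print(place_replicas(topology, "dn1", random.Random(42)))
```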

HDFS Architecture
[Figure: master/slave architecture with a MapReduce layer on top of an HDFS layer. The master runs the JobTracker and the name node; each slave runs a TaskTracker and a data node.]

Caching
• Store frequently accessed data in fast memory near the location where it is needed
• Main goal in web applications: avoid database access
• Highly efficient when results are allowed to be slightly out of date
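
A common way to use such a cache is the cache-aside pattern: look in the cache first, query the database only on a miss, and store the result with an expiry time. Tolerating slightly out-of-date results is what makes the expiry time acceptable. The Python sketch uses a hypothetical in-process `TTLCache` as a stand-in for a cache server and a caller-supplied loader in place of the database.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-process stand-in for a cache server (hypothetical sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # expired: behave like a miss
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def cached_query(cache: TTLCache, key: str, ttl: float, load: Callable[[str], Any]) -> Any:
    """Cache-aside read: the database is only hit on a miss or after expiry."""
    value = cache.get(key)
    if value is None:  # note: a stored None is indistinguishable from a miss
        value = load(key)
        cache.set(key, value, ttl)
    return value
```

Within the TTL, repeated calls return the cached (possibly stale) value without touching the database; the TTL is the knob that trades freshness against database load.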

Memcached
• "Short-term memory for applications"
• Main idea: provide one single cache used by several web servers instead of many independent per-server caches

Memcached: key features
• Free and open source (www.memcached.org)
• Implemented as a hash table distributed across multiple machines
• Client/server architecture
• APIs and integrations available for many languages
• Separates server and memory units
• Limitations
  – No permanent persistence
  – No complex queries
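
Memcached servers do not talk to each other; the client library decides which server holds a key by hashing the key. Many clients use consistent hashing (for example the Ketama scheme) rather than `hash(key) % number_of_servers`, because adding a server then remaps only a small share of the keys. The Python sketch below contrasts the two; `HashRing`, the 100 virtual nodes per server and the server names are illustrative assumptions, not memcached's own code.

```python
import bisect
import hashlib


def _hash(text: str) -> int:
    # First 8 bytes of an MD5 digest as an integer: well mixed, not for security.
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


def modulo_server(key: str, servers: list[str]) -> str:
    """Naive placement: most keys move when len(servers) changes."""
    return servers[_hash(key) % len(servers)]


class HashRing:
    """Consistent hashing with virtual nodes on a 64-bit ring."""

    def __init__(self, servers: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: list[int] = []  # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def server_for(self, key: str) -> str:
        # First ring position at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


keys = [f"user:{n}" for n in range(10_000)]
ring = HashRing(["mc1", "mc2", "mc3"])
before = {k: ring.server_for(k) for k in keys}
ring.add("mc4")
moved_ring = sum(before[k] != ring.server_for(k) for k in keys)
moved_mod = sum(
    modulo_server(k, ["mc1", "mc2", "mc3"]) != modulo_server(k, ["mc1", "mc2", "mc3", "mc4"])
    for k in keys
)
print(f"consistent hashing moved {moved_ring} keys, modulo hashing moved {moved_mod}")
```

With the ring, only keys whose segment is taken over by the new server move (roughly a quarter when going from three to four servers), and all of them move to the new server. With the modulo rule about three quarters of the keys move, which for a cache means a burst of misses that falls through to the database.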

Facebook
• Facts
  – 200 billion page calls per month
  – 15,000 websites use Facebook Connect
  – 9.5% of worldwide internet traffic
  – 9 data centers
• Scales across multiple data centers
  – 60,000 servers (June 2010)
  – Based on LAMP (Linux, Apache, MySQL, PHP)
  – One of the world's biggest MySQL clusters
  – Some PHP functions are converted into faster C++
  – Main framework uses RPCs for additional services and extensions
    • Services use Hadoop, Cassandra, Hive, Scribe, …
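
To put the first figure into perspective, a rough average request rate, assuming a 30-day month (peak rates are higher than this average):

$$\frac{200 \times 10^{9}\ \text{page calls}}{30 \times 86\,400\ \text{s}} = \frac{2 \times 10^{11}}{2.592 \times 10^{6}\ \text{s}} \approx 7.7 \times 10^{4}\ \text{page calls per second}$$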

But ...
• LAMP is not perfect at all
  – PHP is stateless
  – Data is remote
• Services
  – Store code closer to data
  – A compiled environment is more efficient
• Facebook Messaging (chat, status updates, messages, e-mail, SMS)
  – Some parts are written in Erlang
  – Uses Hadoop/HDFS
    • More than 3,200 jobs/day
    • 800,000 tasks (MapReduce)/day
    • Scans 55 TB of data per day
    • Writes 15 TB of compressed output data to HDFS

High-level architecture
[Figure: LAMP + services. Core stack: PHP, Memcached, MySQL. Services: AdServer, Search, Network, Newsfeed, Blogfeed, CSSParser, Mobile, … Also shown: Thrift, Scribe.]
• PHP
  – Good library support for web applications
  – Active developer community
  – Good for rapid iteration
• Memcached
  – Used to reduce database load
  – More than 25 TB of in-memory cache
  – Uses UDP
    • Reduces overhead from TCP connection buffers
    • Application-level flow control, sequencing, …
• MySQL
  – Almost all data is identified by a GUID
  – Load balancer at physical node level
    • More than 500 physical DB nodes
  – Extended query engine for cross-datacenter replication
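
The Memcached bullet notes that UDP moves flow control and sequencing into the application. In memcached's UDP protocol each datagram starts with an 8-byte frame header of four 16-bit big-endian fields: request ID, sequence number, total number of datagrams in the message, and a reserved field that must be 0. The Python sketch below shows how a client could frame datagrams and reassemble a multi-datagram response that arrived out of order; `frame` and `reassemble` are hypothetical helpers, not Facebook's client code, and retrying lost requests is left out.

```python
import struct
from typing import Optional

# request ID, sequence number, total datagrams, reserved (network byte order)
FRAME_HEADER = struct.Struct("!HHHH")


def frame(request_id: int, seq: int, total: int, payload: bytes) -> bytes:
    return FRAME_HEADER.pack(request_id, seq, total, 0) + payload


def reassemble(request_id: int, datagrams: list[bytes]) -> Optional[bytes]:
    """Rebuild one response from datagrams received in any order.

    Returns None if a datagram is missing; UDP will not retransmit it, so the
    application has to retry (or treat the lookup as a cache miss).
    """
    parts: dict[int, bytes] = {}
    total = None
    for dgram in datagrams:
        rid, seq, count, _reserved = FRAME_HEADER.unpack_from(dgram)
        if rid != request_id:
            continue  # belongs to a different outstanding request
        total = count
        parts[seq] = dgram[FRAME_HEADER.size:]
    if total is None or any(i not in parts for i in range(total)):
        return None
    return b"".join(parts[i] for i in range(total))


response = b"VALUE greeting 0 5\r\nhello\r\nEND\r\n"
chunks = [response[:12], response[12:24], response[24:]]
datagrams = [frame(7, i, len(chunks), c) for i, c in enumerate(chunks)]
print(reassemble(7, datagrams[::-1]) == response)  # True
```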

Conclusion
• Large scale systems …
  – exceed traditional applications in various dimensions
  – are systems of systems
  – combine various technologies into unique, adapted solutions
  – choice of technologies depends on requirements
• Architecture enables dynamic growth of the entire system