Hadoop Basic Concepts and HDFS
In this chapter you will learn
What Hadoop is
What features the Hadoop Distributed File System (HDFS) provides
Hadoop Basic Concepts and HDFS
The Hadoop Project and Hadoop Components
!! The Hadoop Distributed File System (HDFS)
!! Hands-On Exercise: Using HDFS
!! Conclusion
Core Components: HDFS and MapReduce
! HDFS (Hadoop Distributed File System)
– Stores data on the cluster
! MapReduce
– Processes data on the cluster
A Simple Hadoop Cluster
! A Hadoop cluster: a group of machines working together to store and process data
! Any number of ‘slave’ or ‘worker’ nodes
– HDFS to store data
– MapReduce to process data
! Two ‘master’ nodes
– NameNode: manages HDFS
– JobTracker: manages MapReduce
HDFS Basic Concepts
! HDFS is a filesystem written in Java
– Based on Google’s GFS
! Sits on top of a native filesystem
– Such as ext3, ext4 or xfs
! Provides redundant storage for massive amounts of data
– Using readily-available, industry-standard computers
How Files Are Stored
! Data files are split into blocks and distributed at load time
! Each block is replicated on multiple data nodes (default 3x)
! NameNode stores metadata
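The storage scheme above can be sketched in a few lines of Python. This is an illustration only, not how HDFS is implemented: the 128 MB block size, the node names, and the round-robin placement are assumptions made for the example (the slides give only the 3x default replication, and real HDFS uses its own placement policy).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes; not stated in the slides
REPLICATION = 3                 # default replication factor given in the slides


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, data_nodes, replication=REPLICATION):
    """Map each block index to `replication` distinct data nodes.

    Round-robin placement is a simplification chosen for this sketch.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            data_nodes[(block + i) % len(data_nodes)] for i in range(replication)
        ]
    return placement


# A hypothetical 300 MB file on a hypothetical four-node cluster.
blocks = split_into_blocks(300 * 1024 * 1024)
# This dictionary plays the role of the metadata the NameNode keeps:
# which data nodes hold a copy of each block.
metadata = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])

for block, nodes in metadata.items():
    print(f"block {block}: {blocks[block]} bytes on {nodes}")
```

Under these assumptions the file becomes two full blocks plus one smaller final block, and each block is held by three different nodes, so losing any single node leaves two copies of every block.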
HDFS NameNode Availability
! The NameNode daemon must be running at all times
– If the NameNode stops, the cluster becomes inaccessible
! High Availability mode (in CDH4 and later)
– Two NameNodes: Active and Standby
! Classic mode
– One NameNode
– One “helper” node called SecondaryNameNode
– Bookkeeping, not backup
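A toy model may help make the difference between the two modes concrete. It is a deliberately simplified sketch: the daemon names are hypothetical, and it assumes the Standby takes over as soon as the Active NameNode stops, ignoring everything else a real failover involves.

```python
def cluster_accessible(daemons):
    """Return True if at least one NameNode daemon is running.

    `daemons` maps a daemon name to a (role, running) pair. A
    SecondaryNameNode never counts: it does bookkeeping, not backup.
    """
    return any(role == "NameNode" and running for role, running in daemons.values())


# Classic mode: one NameNode plus a SecondaryNameNode helper.
classic = {
    "nn1": ("NameNode", False),           # the only NameNode has stopped
    "snn": ("SecondaryNameNode", True),   # still running, but cannot serve HDFS
}

# High Availability mode: Active and Standby NameNodes.
high_availability = {
    "nn1": ("NameNode", False),           # the Active NameNode has stopped
    "nn2": ("NameNode", True),            # the Standby is assumed to take over
}

print("classic mode accessible:", cluster_accessible(classic))
print("HA mode accessible:", cluster_accessible(high_availability))
```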
Options for Accessing HDFS
! FsShell command line: hadoop fs
! Java API
! Ecosystem Projects
– Flume: collects data from network sources (e.g., system logs)
– Sqoop: transfers data between HDFS and RDBMS
– Hue: Web-based interactive UI; can browse, upload, download, and view files
Hands-on Exercise: Using HDFS
! In this Hands-On Exercise you will begin to get acquainted with the Hadoop tools. You will manipulate files in HDFS, the Hadoop Distributed File System
! Please refer to the Hands-On Exercise Manual
Key Points
! The core components of Hadoop
– Data storage: Hadoop Distributed File System (HDFS)
– Data processing: MapReduce
! How HDFS works
– Files are divided into blocks
– Blocks are replicated across nodes
! Command line access to HDFS
– FsShell: hadoop fs
– Subcommands: -get, -put, -ls, -cat, etc.
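As a rough illustration of the FsShell subcommands listed above, the following Python sketch builds the corresponding hadoop fs command lines. The file names and HDFS paths are hypothetical, and the helper function is invented for this example; the script only prints the commands, so it can be read without a Hadoop installation.

```python
import subprocess


def hadoop_fs(subcommand, *args):
    """Build the argument list for `hadoop fs <subcommand> <args...>`."""
    return ["hadoop", "fs", subcommand, *args]


def run_all(command_list):
    """Execute each command in order; requires a working Hadoop client."""
    for cmd in command_list:
        subprocess.run(cmd, check=True)


# Hypothetical local file and HDFS directory, chosen for illustration.
commands = [
    hadoop_fs("-put", "local.txt", "/user/training/local.txt"),  # copy a local file into HDFS
    hadoop_fs("-ls", "/user/training"),                          # list an HDFS directory
    hadoop_fs("-cat", "/user/training/local.txt"),               # print a file's contents
    hadoop_fs("-get", "/user/training/local.txt", "copy.txt"),   # copy a file from HDFS to local disk
]

for cmd in commands:
    print(" ".join(cmd))
```

On a machine with a configured Hadoop client, calling run_all(commands) would execute the same four steps in order.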