Hydra FS High Throughput FS
HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System
10/28/22
Authors
• Cristian Ungureanu
• Benjamin Atkin
• Akshat Aranya
• Salil Gokhale
• Stephen Rago
• Grzegorz Calkowski
• Cezary Dubnicki
• Aniruddha Bohar
Contents
• HYDRAstor Introduction
• Content-Addressable Storage (CAS)
• Problems and Solutions for CAS
• Challenges for the new implementation
• Filesystem layout and software architecture
• Read/write processing, metadata cleaning, deletion
• Conclusion
HYDRAstor Introduction
• Multi-node CAS system
• Stores blocks at configurable redundancy levels
• Supports high-throughput R/W for large blocks
• Used for:
  • Backup solutions
  • DR solutions
  • Data archive
NEC HYDRAstor
• HYDRAstor architecture introduced in March 2007
• NEC conducts R&D in the USA, Japan, Germany, and China
• High-performance, capacity-optimized, and highly available storage for enterprise backup, long-term data archive, and DR solutions
Competitors
• EMC Corporation
  • Centera, DataDomain
• Hewlett-Packard
  • Information Access Platform (RISS)
• Hitachi Data Systems
  • Content Archive Platform (HCAP)
• IBM
  • DR550
Content-Addressable Storage
• A Content-Addressable Storage (CAS) system:
  • Eliminates duplicate blocks
  • Gives high throughput for streaming access
• Once an object is saved, it cannot be deleted until its retention period expires
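Duplicate elimination follows from addressing each block by a hash of its content. The sketch below is a toy in-memory illustration only: the class name `ContentAddressableStore`, its methods, and the choice of SHA-256 are assumptions for the example, not the HYDRAstor API.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS: each block is keyed by the SHA-256 of its content."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # setdefault keeps the existing copy, so a duplicate write stores nothing new.
        self._blocks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def block_count(self) -> int:
        return len(self._blocks)


store = ContentAddressableStore()
a1 = store.put(b"backup block")
a2 = store.put(b"backup block")   # same content, same address
a3 = store.put(b"another block")
```

Because the address is derived from the content, a block cannot be modified in place: changed data hashes to a different address and is stored as a new block.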
Problems with CAS System
• The absence of a standardized API is a barrier to using CAS with existing applications
• Applications have to be rewritten to use the CAS-specific API
• Applications have to deal with the unique characteristics of CAS:
  • Immutability of blocks
  • High latency of operations
Solution
• Build HydraFS, a file system on top of the CAS system, to provide a standard interface
• Supports distributed CAS systems
Solution cont…
• HydraFS presents a standard file system interface, so applications can use the CAS system effectively without modification
• Gains storage performance by mapping the applications' access patterns onto those the CAS system handles best
• HydraFS achieves 82–100% of the HYDRAstor R/W throughput
[Diagram: App 1, App 2, and App 3 access the CAS system through HydraFS]
Challenges
• Immutable blocks
• High latency
• Chunking algorithm to determine block boundaries
Filesystem Design Achievements
• High throughput for sequential R/W
• Minimized dependent I/O
• Guarantees data availability
• Supports both local and remote FS access
Filesystem Layout
• Designed as a DAG
• The root is a searchable block that holds the superblock
• The superblock holds the imap and the current FS version
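The layout can be illustrated with a small sketch, under stated assumptions: the dictionaries `blocks` and `searchable`, the `"name/version"` search key, the JSON encoding, and every function name here are hypothetical and chosen only to show a versioned root pointing at an imap. They are not taken from HydraFS itself.

```python
import hashlib
import json

blocks = {}       # content address -> bytes (regular immutable blocks)
searchable = {}   # search key -> bytes (blocks that can be looked up by key)


def put(obj) -> str:
    """Store a JSON-serializable object as an immutable block; return its address."""
    data = json.dumps(obj, sort_keys=True).encode()
    address = hashlib.sha256(data).hexdigest()
    blocks[address] = data
    return address


def get(address: str):
    return json.loads(blocks[address])


def commit(fs_name: str, version: int, imap: dict) -> None:
    """Write the imap, then a superblock for this version under a searchable key."""
    superblock = {"version": version, "imap": put(imap)}
    searchable[f"{fs_name}/{version}"] = json.dumps(superblock).encode()


def latest_superblock(fs_name: str):
    """Probe version keys in order and return the newest superblock found."""
    version, found = 1, None
    while f"{fs_name}/{version}" in searchable:
        found = json.loads(searchable[f"{fs_name}/{version}"])
        version += 1
    return found


commit("fs", 1, {"1": put({"data": "hello"})})
commit("fs", 2, {"1": put({"data": "hello v2"})})
sb = latest_superblock("fs")
current_file = get(get(sb["imap"])["1"])
```

Since blocks are immutable, each commit writes a new superblock rather than updating the old one, and the earlier version stays readable until it is deleted.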
Software Architecture
• Implemented as user-level processes
• The fileserver manages the FS interface
• The commit server generates new FS versions
Write Processing
• The fileserver buffers file write data
• Applies a content-defined chunking algorithm
• Creates new variable-size blocks, marked "dirty"
• Writes dirty blocks to disk
• The uncommitted block table tracks modified FS metadata
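Content-defined chunking places block boundaries where a hash over the most recent bytes matches a pattern, so boundaries depend on the data and not on fixed offsets. The slides do not say which algorithm HydraFS uses; the sketch below assumes a simple gear-style rolling hash, and the function name `chunk`, the table `GEAR`, and the size parameters are all hypothetical.

```python
import random

# Fixed pseudo-random table mapping each byte value to a 32-bit number (assumed scheme).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes, min_size=64, avg_size=256, max_size=1024):
    """Split data into variable-size chunks; avg_size must be a power of two."""
    mask = avg_size - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # The shift ages out old bytes, so h depends only on the last 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= min_size and (h & mask) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])   # trailing partial chunk
    return chunks


sample = bytes(random.Random(1).getrandbits(8) for _ in range(4096))
pieces = chunk(sample)
```

The minimum and maximum sizes bound the block size; between them, the cut point is chosen by content, which helps identical data produce identical blocks even when it appears at a different offset.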
Metadata Cleaning
• The commit server creates a new FS superblock
• After that, the fileserver can clean its dirty metadata
• This generates a new version of the FS
Admission Control
• Mechanism to control memory usage
• Memory is budgeted for objects, e.g. data blocks and inodes
• A new event is admitted when memory is available; otherwise the event blocks
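One way to picture this mechanism is a shared memory budget guarded by a condition variable. This is a minimal sketch under assumptions: the class `AdmissionController`, its methods, and the byte-based budget are hypothetical and not the HydraFS implementation.

```python
import threading


class AdmissionController:
    """Admit an event only when enough of the memory budget is free."""

    def __init__(self, budget_bytes: int):
        self._available = budget_bytes
        self._cond = threading.Condition()

    def admit(self, nbytes: int, timeout=None) -> bool:
        """Block until nbytes can be reserved; return False if the timeout expires."""
        with self._cond:
            ok = self._cond.wait_for(lambda: self._available >= nbytes, timeout)
            if ok:
                self._available -= nbytes
            return ok

    def release(self, nbytes: int) -> None:
        """Return memory to the budget and wake any blocked events."""
        with self._cond:
            self._available += nbytes
            self._cond.notify_all()


ctl = AdmissionController(budget_bytes=100)
first = ctl.admit(60)                  # fits in the budget
second = ctl.admit(60, timeout=0.01)   # would exceed the budget, so it times out
ctl.release(60)
third = ctl.admit(60, timeout=0.01)    # fits again after the release
```

The timeout exists only to keep the example single-threaded; with `timeout=None` the caller simply blocks until another thread releases memory.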
Read Processing
• Read-ahead fetches metadata into memory
• The content addresses for the requested range are retrieved from the inode's B-tree
Deletion
• Deleting a file removes the pointer to its data blocks from the namespace; the storage space is not freed at that point
• After a new FS version is created, the old version is marked for deletion
• The number of retained FS versions is configurable
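The retention rule can be sketched as follows, assuming each FS version is represented by the set of block addresses it references. The function `split_versions` and this representation are hypothetical and only illustrate why a block becomes reclaimable once no retained version points to it.

```python
def split_versions(version_blocks: dict, retain: int):
    """Return (versions marked for deletion, blocks no retained version references)."""
    if retain < 1:
        raise ValueError("retain must be at least 1")
    ordered = sorted(version_blocks)
    kept = ordered[-retain:]       # the newest `retain` versions
    dropped = ordered[:-retain]    # everything older is marked for deletion
    live = set()
    for version in kept:
        live |= version_blocks[version]
    reclaimable = set()
    for version in dropped:
        reclaimable |= version_blocks[version] - live
    return dropped, reclaimable


history = {
    1: {"a", "b"},   # block "b" is referenced only by version 1
    2: {"a", "c"},
    3: {"a", "d"},
}
dropped, reclaimable = split_versions(history, retain=2)
```

In this example block "a" survives because retained versions still reference it, while "b" is referenced only by the dropped version.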
Conclusion
• HydraFS outcome:
  • High-throughput R/W access
  • High duplicate elimination rate
• Evaluation results:
  • Efficient: achieves up to 82% of the HYDRAstor throughput for reads and up to 100% for writes
• Best suited for:
  • Backup applications and repositories