When Ceph Meets SPDK
R&D Core
• Team A: from top-tier Internet companies; major contributors to the Ceph community in China
• Team B: storage product R&D engineers from leading IT vendors
About XSKY | 星辰天合
• Founded in May 2015; headquartered in Beijing, with an R&D center in Shenzhen
• Backed by Northern Light Venture Capital and Redpoint Ventures; pre-Series-A funding of ¥72 million
• 70+ employees, ~50 in R&D and services
• Vision: provide enterprise-ready distributed software-defined storage products that help
customers transform their data center architecture.
• Products: X-EBS distributed block storage, X-CBS cloud computing backend storage,
X-EOS distributed object storage, etc.
About Us
Future Ready SDS
About Ceph
What is Ceph?
• Object, block, and file storage in a single cluster
• All components scale horizontally
• No single point of failure
• Hardware agnostic, commodity hardware
• Self-managing whenever possible
• Open source (LGPL)
• “Ceph is a distributed object store and file system designed
to provide excellent performance, reliability and scalability.”
Background
• Low performance of Ceph’s storage service
• Ceph’s architecture was designed around slow storage devices (millisecond-level latency)
• There are more and more fast devices in both network and storage
• Network: 10G/25G/40G/100G (low performance -> high performance)
• Storage: HDD -> SATA SSD -> PCIe SSD -> NVDIMM (high latency -> low latency)
• Challenge: Ceph’s software design and implementation becomes the bottleneck
• Equipped with these fast devices, the software needs to be reworked to exploit the full
capability of the hardware.
Potential solutions
Invent a new ObjectStore/FileStore design and implementation along the following lines:
• API change: synchronous APIs -> asynchronous APIs (POSIX -> non-POSIX)
• Benefit: gain performance by keeping many requests in flight instead of completing one at a time (see the libaio sketch after this list)
• I/O stack optimization: replace kernel I/O stacks with user-space stacks (for both network I/O and storage I/O)
• Benefit: no context switches, no data copies between kernel and user space, and a move from a locked
to a lock-free architecture
• SPDK (Storage Performance Development Kit, https://www.spdk.io/) provides a set of libraries that address these issues.
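To make the API-change point concrete, below is a minimal sketch (illustrative only; the file name, block size, and batch depth are assumptions) contrasting one blocking POSIX read with a batch of asynchronous reads submitted through Linux libaio, where the caller queues several requests and reaps completions later. Build with -laio.

    /* Illustrative sketch: blocking POSIX read vs. batched async reads (libaio). */
    #define _GNU_SOURCE            /* for O_DIRECT */
    #include <libaio.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BATCH 8
    #define BLK   4096

    int main(void)
    {
        int fd = open("data.img", O_RDONLY | O_DIRECT);   /* assumed test file */

        /* Synchronous POSIX style: one request at a time, the caller blocks. */
        char onebuf[BLK] __attribute__((aligned(BLK)));
        pread(fd, onebuf, BLK, 0);

        /* Asynchronous style: queue a batch, then reap all completions. */
        io_context_t ctx;
        memset(&ctx, 0, sizeof(ctx));
        io_setup(BATCH, &ctx);

        struct iocb cbs[BATCH], *cbp[BATCH];
        void *bufs[BATCH];
        for (int i = 0; i < BATCH; i++) {
            posix_memalign(&bufs[i], BLK, BLK);
            io_prep_pread(&cbs[i], fd, bufs[i], BLK, (long long)i * BLK);
            cbp[i] = &cbs[i];
        }
        io_submit(ctx, BATCH, cbp);                       /* submit 8 reads in one call */

        struct io_event events[BATCH];
        io_getevents(ctx, BATCH, BATCH, events, NULL);    /* wait for all 8 to finish */

        io_destroy(ctx);
        close(fd);
        return 0;
    }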
SPDK Introduction
• Built on Intel® Data Plane Development Kit (DPDK): software infrastructure to accelerate packet input/output to Intel CPUs
• User space Network Services (UNS): TCP/IP stack implemented as a polling, lock-light library, bypassing kernel bottlenecks and enabling scalability
• User space NVMe, Intel® Xeon®/Intel® Atom™ processor DMA, and Linux* AIO drivers: optimize back-end driver performance and prevent kernel bottlenecks from forming at the back end of the I/O chain
*Other names and brands may be claimed as the property of others.
SPDK architecture Overview
Extends Data Plane Development Kit concepts into an end-to-end storage context
• Optimized, user-space lockless polling in the NIC driver, TCP/IP stack, iSCSI target, and NVMe driver
• iSCSI and NVMe over Fabrics targets integrated
Exposes the performance potential of current and next-generation storage media
• Media latencies are moving from low-μsec to nsec; storage software architectures must keep up
• Permissive open-source license for user-space media drivers: NVMe & CBDMA drivers are on github.com
• Media drivers support both Linux* and FreeBSD*
NVMf application and protocol library:
• Provisioning, fabric interface processing, memory allocation, fabric connection handling, RDMA data transfer
• Discovery, subsystems, logical controller, capsule processing, management interface with the NVMe driver library
*Other names and brands may be claimed as the property of others.
https://software.intel.com/en-us/articles/introduction-to-the-storage-performance-development-kit-spdk
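To show what the user-space NVMe driver means in practice, here is a minimal sketch based on SPDK's public NVMe API (spdk_nvme_probe, spdk_nvme_ns_cmd_read, spdk_nvme_qpair_process_completions); exact signatures and environment setup differ between SPDK releases, so treat it as a shape rather than a drop-in program. The application claims the NVMe controller from user space, submits an asynchronous read, and polls for the completion with no interrupts or syscalls on the I/O path.

    /* Sketch: read one block through SPDK's userspace NVMe driver. */
    #include <stdbool.h>
    #include <stdio.h>
    #include "spdk/env.h"
    #include "spdk/nvme.h"

    static struct spdk_nvme_ctrlr *g_ctrlr;
    static struct spdk_nvme_ns *g_ns;

    static bool probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
                         struct spdk_nvme_ctrlr_opts *opts)
    {
        return true;                              /* attach to every controller found */
    }

    static void attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
                          struct spdk_nvme_ctrlr *ctrlr,
                          const struct spdk_nvme_ctrlr_opts *opts)
    {
        g_ctrlr = ctrlr;
        g_ns = spdk_nvme_ctrlr_get_ns(ctrlr, 1);  /* namespace IDs start at 1 */
    }

    static void read_done(void *arg, const struct spdk_nvme_cpl *cpl)
    {
        *(bool *)arg = true;
    }

    int main(void)
    {
        struct spdk_env_opts opts;
        spdk_env_opts_init(&opts);
        opts.name = "spdk_read_demo";
        if (spdk_env_init(&opts) < 0)
            return 1;

        /* Enumerate PCIe NVMe controllers and take them over from the kernel. */
        if (spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL) != 0 || !g_ns)
            return 1;

        struct spdk_nvme_qpair *qp = spdk_nvme_ctrlr_alloc_io_qpair(g_ctrlr, NULL, 0);
        void *buf = spdk_zmalloc(4096, 4096, NULL, SPDK_ENV_SOCKET_ID_ANY,
                                 SPDK_MALLOC_DMA);

        bool done = false;
        /* Asynchronous 4 KB read of LBA 0 (assumes a 4 KB sector size). */
        spdk_nvme_ns_cmd_read(g_ns, qp, buf, 0, 1, read_done, &done, 0);

        /* No interrupts, no syscalls: poll the queue pair for completions. */
        while (!done)
            spdk_nvme_qpair_process_completions(qp, 0);

        printf("read completed\n");
        spdk_free(buf);
        spdk_nvme_ctrlr_free_io_qpair(qp);
        spdk_nvme_detach(g_ctrlr);
        return 0;
    }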
4 KB Random Read Performance: 4 x NVMe Drives, Single-Core Intel® Xeon® Processor
SPDK NVMe driver delivers up to 6x performance improvement
vs. the kernel NVMe driver on a single-core Intel® Xeon® processor
Disclaimer: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and
MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary.
You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when
combined with other products. For more information go to http://www.intel.com/performance.
From 10/28 spdk meetup
4 KB Random Read Performance: 1-4 NVMe Drives, Single-Core Intel® Xeon® Processor
SPDK NVMe driver scales linearly in performance
from 1 to 4 NVMe drives on a single-core Intel® Xeon® processor
From 10/28 spdk meetup
What can SPDK do to improve Ceph?
• Accelerate back-end I/O in the Ceph OSD (Object Storage Daemon)
• Key solution: replace the kernel NVMe driver with the user-space NVMe driver
provided by SPDK to accelerate I/O on NVMe SSDs.
• Accelerate network performance (TCP/IP) on Ceph’s internal
network.
• Key solution: replace the existing kernel network stack on each
OSD node with DPDK plus a user-space TCP/IP stack (e.g., libUNS, Seastar,
mTCP).
BlueStore
• BlueStore = Block + NewStore
• Consumes raw block device(s)
• Key/value database (RocksDB) for metadata
• Data written directly to the block device
• Pluggable block allocator
[Architecture diagram: ObjectStore -> BlueStore; data is written directly to the BlockDevice, metadata goes through RocksDB -> BlueRocksEnv -> BlueFS; a pluggable Allocator manages space; the Device underneath is accessed through either the kernel driver or the userspace NVMe driver.]
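For orientation, a hedged ceph.conf sketch of how a BlueStore OSD selects the kernel or userspace device path is below. osd_objectstore and bluestore_block_path are real BlueStore options, but the exact spdk: identifier format has changed across Ceph releases (controller serial number in the 2016-era code, PCIe address later), so the address shown is an assumption.

    [osd]
    osd_objectstore = bluestore
    # Kernel driver path: hand BlueStore a raw block device
    #bluestore_block_path = /dev/nvme0n1
    # Userspace path: the SPDK NVMe driver claims the device directly
    # (identifier format is version dependent: serial number or PCIe address)
    bluestore_block_path = spdk:0000:01:00.0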
DPDK-Messenger Plugin
[Architecture diagram: AsyncMessenger -> AsyncConnection -> NetworkStack; the NetworkStack is either a PosixStack (Posix workers on the kernel stack) or a DPDKStack (DPDK workers running a userspace TCP/IP/ARP stack on top of RTE and the DPDK poll-mode driver).]
DPDK-Messenger Design
• TCP, IP, ARP, DPDKDevice:
• hardware feature offloads
• ported from the Seastar TCP/IP stack
• integrated with Ceph’s libraries
• Event-driven:
• userspace EventCenter (like epoll)
• NetworkStack API:
• basic network interface with zero-copy or non-zero-copy paths
• keeps PosixStack <-> DPDKStack compatible
• AsyncMessenger
DPDK-Messenger Open Source
https://github.com/ceph/ceph/pull/10748
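For completeness, a hedged ceph.conf sketch of switching the messenger to the DPDK stack is below; ms_type = async+dpdk and the ms_dpdk_* options come from this DPDKStack work, but the option set has evolved across releases, and the core mask and addresses here are assumptions for illustration.

    [global]
    ms_type = async+dpdk
    # Cores dedicated to the DPDK pollers (example mask)
    ms_dpdk_coremask = 0x3
    # The userspace TCP/IP stack bypasses the kernel, so it needs its own addressing
    ms_dpdk_host_ipv4_addr = 192.168.1.11
    ms_dpdk_gateway_ipv4_addr = 192.168.1.1
    ms_dpdk_netmask_ipv4_addr = 255.255.255.0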
NVMe Device
• Status:
• Userspace NVMe library (SPDK)
• Already in the Ceph master branch
• DPDK integrated: I/O data flows from the NIC (DPDK mbuf) to the device
• Details:
[Architecture diagram: AsyncMessenger / AsyncConnection on a NetworkStack (PosixStack with Posix workers or DPDKStack with DPDK workers); requests flow into the PGs and down to BlueStore, which accesses the Device layer through either a KernelDevice or an NVMeDevice backed by the userspace NVMe driver.]
Userspace NVMe Driver Open Source
https://github.com/ceph/ceph/pull/7145
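Both pieces are compile-time options in the Ceph tree; as a hedged note, recent CMake-based builds expose them roughly as below (flag names may postdate the PRs referenced in these slides):

    # Build Ceph with the userspace NVMe driver and the DPDK messenger enabled
    ./do_cmake.sh -DWITH_SPDK=ON -DWITH_DPDK=ON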
Summary
• There are performance issues in Ceph with the emerging fast network and storage
devices.
Storage systems need to be refactored to catch up with the hardware.
Ceph is expected to move toward a shared-nothing implementation.
• We mainly introduce SPDK and BlueStore to address the current issues in Ceph.
SPDK: libraries (e.g., the user-space NVMe driver) that can be used for performance
acceleration.
BlueStore: a new store that implements a lockless, asynchronous, high-performance
storage service.
• Lots of details still need work (coming soon).