Persistent memory

Persistent Memory, Dr. Benoit Hudzia, @blopeur


Page 1: Persistent memory

Persistent Memory

Dr. Benoit Hudzia

@blopeur

Page 2: Persistent memory

Agenda

NVM Evolution

Persistent Memory Linux Software Stack

Using and Emulating PMEM on Linux

Remote PMEM

Micro Storage Architecture

Page 3: Persistent memory

NVM Evolution

Page 4: Persistent memory

Persistent Memory

Yesterday: battery-backed RAM

Today: NVDIMMs combining RAM + flash

On power-down, RAM contents are copied to flash; on power-up, they are copied back to RAM

Emerging NVDIMMs: PCM, 3D XPoint, memristor, etc.

Claimed to be 1000x faster than NAND, i.e. much closer to RAM

Characteristics as seen by software: synchronous model

Accessed with load/store memory instructions

No paging

Latency is low enough that stalling the CPU on an access is reasonable
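The synchronous load/store model can be illustrated with a memory-mapped file. The sketch below assumes a file on a DAX-mounted filesystem. Here a temporary file stands in for it, so the page cache is used instead of direct access. The 64-bit counter, its offset and the function names are hypothetical.

```python
import mmap
import os
import struct
import tempfile

COUNTER_OFFSET = 128          # hypothetical location of a 64-bit counter
REGION_SIZE = mmap.PAGESIZE   # map one page


def bump_counter(mm: mmap.mmap) -> int:
    """Increment a 64-bit counter in place with a plain load and store.

    No read()/write() system call or block request is issued: the
    process loads 8 bytes from the mapping, adds one and stores them back.
    """
    (value,) = struct.unpack_from("<Q", mm, COUNTER_OFFSET)  # load
    struct.pack_into("<Q", mm, COUNTER_OFFSET, value + 1)     # store
    return value + 1


def counter_demo() -> int:
    # A temporary file stands in for a file on a DAX mount (e.g. /tmp/dax).
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, REGION_SIZE)
        with mmap.mmap(fd, REGION_SIZE) as mm:
            bump_counter(mm)
            return bump_counter(mm)
    finally:
        os.close(fd)
        os.unlink(path)
```

On a real DAX mapping, the same loads and stores reach PMEM without going through the page cache. Making them durable still requires flushing the CPU caches.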

Page 5: Persistent memory

With new-generation hardware, the NVM itself is no longer the bottleneck

But performance is still limited by block-stack latency and the asynchronous I/O model

Page 6: Persistent memory

Asynchronous Model: NVMe

“When Poll is Better than Interrupt”, Yang et al., USENIX FAST 2012 https://www.usenix.org/legacy/events/fast12/tech/full_papers/Yang.pdf

● Active polling (sync) gives lower latency than MSI-X interrupts (async), at the expense of CPU time

● Polling is used in Intel SPDK
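Polling an NVMe completion queue can be sketched as a consumer that checks the phase tag of the next entry instead of sleeping until an MSI-X interrupt fires. In NVMe, the controller inverts the phase tag each time it wraps around the queue. The host therefore treats an entry as new when its tag matches the phase it currently expects.

This is a simplified single-threaded model. The class names, the `controller_post` helper and the queue depth are hypothetical. A real driver also rings the CQ head doorbell after consuming entries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompletionEntry:
    command_id: int = 0
    phase: int = 0  # queue memory starts zeroed, so phase 0 means "empty"


@dataclass
class CompletionQueue:
    depth: int
    entries: List[CompletionEntry] = field(default_factory=list)
    tail: int = 0              # next slot the (simulated) controller writes
    controller_phase: int = 1  # controller writes phase=1 on the first pass
    head: int = 0              # next slot the host reads
    expected_phase: int = 1    # host expects phase=1 on the first pass

    def __post_init__(self) -> None:
        self.entries = [CompletionEntry() for _ in range(self.depth)]

    def controller_post(self, command_id: int) -> None:
        """Hypothetical stand-in for the device writing a completion."""
        self.entries[self.tail] = CompletionEntry(command_id, self.controller_phase)
        self.tail += 1
        if self.tail == self.depth:  # wrap: controller inverts its phase tag
            self.tail = 0
            self.controller_phase ^= 1

    def poll(self) -> List[int]:
        """Consume all new completions without blocking."""
        done = []
        while self.entries[self.head].phase == self.expected_phase:
            done.append(self.entries[self.head].command_id)
            self.head += 1
            if self.head == self.depth:  # wrap: host flips the phase it expects
                self.head = 0
                self.expected_phase ^= 1
        return done
```

A polling driver calls `poll()` in a tight loop. This burns CPU but avoids interrupt and context-switch latency. An interrupt-driven driver would run the same scan only after MSI-X fires.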

Page 7: Persistent memory

Enter Persistent Memory

Figure (source: Intel): 4KB read vs. 64B read

Page 8: Persistent memory

Moving away from Block I/O

Figure: latency and access

Page 9: Persistent memory

This leads to a new tiered software stack

Page 10: Persistent memory

Challenge: Durability

Page 11: Persistent memory

PMEM Linux Software Stack

Page 12: Persistent memory

Linux kernel (>4.2) subsystem

Page 13: Persistent memory

NVDIMM Software Architecture

http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf

Page 14: Persistent memory

BTT vs DAX

BTT: Block Translation Table

Provides atomic sector-update semantics for persistent memory devices

Applications that rely on sector writes not being torn can continue to do so

Intended for legacy applications

DAX: stands for Direct Access

Allows mapping a pmem range directly into userspace via mmap

If the application is aware of persistent, byte-addressable memory and can use it to its advantage, DAX is the best path for it

If the application relies on atomic sector-update semantics, it must use the BTT

Note that PMEM pages are (so far) not backed by struct page, only by a PFN (page frame number)
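The BTT idea can be shown with a toy model. A sector is never overwritten in place. New data goes to a free internal block, and a single map entry is then switched, so a crash leaves either the old or the new sector, never a torn mix.

This is a hypothetical in-memory sketch under those assumptions, not the kernel's on-media BTT format. The real format also keeps a flog to recover interrupted updates.

```python
from typing import Dict, List


class ToyBTT:
    """Toy block translation table: external LBA -> internal block index."""

    def __init__(self, nblocks: int, block_size: int = 512) -> None:
        self.block_size = block_size
        # One spare internal block beyond the exported capacity.
        self.media: List[bytes] = [bytes(block_size)] * (nblocks + 1)
        self.map: Dict[int, int] = {lba: lba for lba in range(nblocks)}
        self.free = nblocks  # index of the current spare block

    def read(self, lba: int) -> bytes:
        return self.media[self.map[lba]]

    def write(self, lba: int, data: bytes, crash_before_map_update: bool = False) -> None:
        if len(data) != self.block_size:
            raise ValueError("BTT writes whole sectors")
        target = self.free
        self.media[target] = data  # 1. write the new data out of place
        if crash_before_map_update:
            return                 # simulated power loss: old block is still mapped
        old = self.map[lba]
        self.map[lba] = target     # 2. models the single atomic map-entry update
        self.free = old            # 3. the old block becomes the new spare
```

If power fails between steps 1 and 2, readers still see the previous sector contents. This is the "sector writes are not torn" guarantee that legacy applications rely on.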

Page 15: Persistent memory

Using and Emulating PMEM on Linux

Page 16: Persistent memory

Kernel Config (> 4.2)

Enable NVDIMM dynamic debug before you start playing with NVDIMMs. Add to the kernel command line:

libnvdimm.dyndbg nfit.dyndbg nd_pmem.dyndbg nd_blk.dyndbg ignore_loglevel
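Before booting, it can help to confirm that the NVDIMM-related options are enabled in the kernel `.config`. The option list below is an assumption based on Linux 4.x Kconfig names. Check it against your own kernel's Kconfig, since options are added and removed across versions.

```python
from typing import Dict, List

# Assumed set of options for NVDIMM/PMEM support on a 4.x x86 kernel.
WANTED = (
    "CONFIG_LIBNVDIMM",
    "CONFIG_BLK_DEV_PMEM",
    "CONFIG_ND_BLK",
    "CONFIG_BTT",
    "CONFIG_ACPI_NFIT",
    "CONFIG_X86_PMEM_LEGACY",  # legacy/e820 PMEM, used by memmap=nn!ss emulation
    "CONFIG_FS_DAX",
)


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map CONFIG_* names to their value ('y', 'm', ...) from .config text."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            values[name] = value
        # "# CONFIG_FOO is not set" lines are left out of the map.
    return values


def missing_options(text: str) -> List[str]:
    """Return the wanted options that are neither built in nor modular."""
    values = parse_kconfig(text)
    return [name for name in WANTED if values.get(name) not in ("y", "m")]
```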

Page 17: Persistent memory

Pick your PMEM

Use ACPI 6.0-compatible NVDIMM hardware or legacy NVDIMMs

Use virtual NVDIMMs provided by a hypervisor

Use RAM as persistent memory

PCMSIM: NVM disk emulation

Page 18: Persistent memory

Emulation: RAM as PMEM

Bare metal:

Adding 'memmap=16G!16G' to the kernel boot parameters reserves 16G of memory, starting at 16G.

cat /proc/cmdline:

BOOT_IMAGE=/boot/vmlinuz-4.3.0-1-default root=UUID=39635fd6-64ee-4538-9964-7de6bb181181 resume=/dev/sda1 splash=silent quiet showopts memmap=1G!5G memmap=1G!7G

BTT works on this emulated PMEM
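The `memmap=nn!ss` arithmetic reserves nn bytes starting at physical address ss. It can be checked with a short parser. The sketch assumes only the K/M/G/T size suffixes and one range per `memmap=` parameter. The function name is hypothetical.

```python
import re
from typing import List, Tuple

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
MEMMAP_PMEM = re.compile(r"^memmap=(\d+)([KMGT]?)!(\d+)([KMGT]?)$", re.IGNORECASE)


def pmem_ranges(cmdline: str) -> List[Tuple[int, int]]:
    """Return the [start, end) physical ranges reserved as PMEM by memmap=nn!ss."""
    ranges = []
    for token in cmdline.split():
        m = MEMMAP_PMEM.match(token)
        if m:
            size = int(m.group(1)) * UNITS[m.group(2).upper()]
            start = int(m.group(3)) * UNITS[m.group(4).upper()]
            ranges.append((start, start + size))
    return ranges
```

For the `/proc/cmdline` above, this yields two 1G ranges: 5G up to 6G and 7G up to 8G. Separate ranges like these typically appear as separate pmem devices.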

Page 19: Persistent memory

QEMU NVDIMM

QEMU:

qemu-system-x86_64 -object memory-backend-file,share,id=mem1,mem-path=/dax/D1 -device nvdimm,memdev=mem1,reserve-label-data,id=nv1 -m 2048,maxmem=100G,slots=10 ….

Not yet in upstream QEMU:

https://github.com/xiaogr/qemu/tree/nvdimm-v9

SeaBIOS integration:

http://www.seabios.org/pipermail/seabios/2015-September/009770.html

Still missing some features, and some operations have high overhead

Supports PMEM only, which is good for NFIT development

Page 20: Persistent memory

Playing with DAX

Only ext2, ext4 and XFS currently support DAX

Note that the filesystem block size should match the page size (4096 bytes on x86)

mkfs.ext4 -b 4096 /dev/pmem1

mount -t ext4 -o dax /dev/pmem1 /tmp/dax/
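After the mount, two of the prerequisites above can be checked: the block size against the page size, and whether the `dax` option is active. This is a sketch with hypothetical function names. It assumes `/proc/mounts` formatting and accepts `dax=always`, which newer kernels may report instead of plain `dax`.

```python
import os
from typing import Tuple


def block_and_page_size(path: str) -> Tuple[int, int]:
    """Return (filesystem block size, CPU page size) for the fs holding path."""
    return os.statvfs(path).f_bsize, os.sysconf("SC_PAGE_SIZE")


def mounted_with_dax(mounts_text: str, mountpoint: str) -> bool:
    """Check /proc/mounts-style text for a DAX option on mountpoint."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = fields[3].split(",")
            if "dax" in options or "dax=always" in options:
                return True
    return False
```

Typical use: read `/proc/mounts` and pass its text together with `/tmp/dax`, then compare the two values returned by `block_and_page_size("/tmp/dax")`.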

Page 21: Persistent memory

Playing with DAX - Cont

Then you just have to mmap it!

But remember: CPU cache flushes (CLFLUSH, etc.) are still needed for durability
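From Python, the closest portable stand-in for CLFLUSH is `msync()`, which `mmap.flush()` calls. On a DAX mapping, the kernel is expected to write back the affected CPU cache lines on `msync()`.

The sketch below assumes that behaviour. Because `mmap.flush()` requires a page-aligned offset, the hypothetical helper rounds the written range out to whole pages. It therefore flushes more than a cache-line-granular loop would, such as the one behind libpmem's `pmem_persist()` in C.

```python
import mmap
import os
import tempfile
from typing import Tuple


def page_span(offset: int, length: int, page: int = mmap.PAGESIZE) -> Tuple[int, int]:
    """Round [offset, offset + length) out to whole pages: (start, size)."""
    start = (offset // page) * page
    end = -(-(offset + length) // page) * page  # ceiling division
    return start, end - start


def store_and_persist(mm: mmap.mmap, offset: int, data: bytes) -> None:
    """Store data into the mapping, then msync() the pages it touched.

    Assumes the mapping length is a multiple of the page size, so the
    rounded-up range never runs past the end of the mapping.
    """
    mm[offset:offset + len(data)] = data  # plain stores
    start, size = page_span(offset, len(data))
    mm.flush(start, size)                 # msync() on the covering pages


def persist_demo() -> bytes:
    # A temporary file stands in for a file on a DAX mount.
    size = 2 * mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size) as mm:
            store_and_persist(mm, mmap.PAGESIZE - 3, b"durable")  # straddles two pages
        with open(path, "rb") as f:
            f.seek(mmap.PAGESIZE - 3)
            return f.read(7)
    finally:
        os.close(fd)
        os.unlink(path)
```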

Page 22: Persistent memory

NVML: let somebody else do the heavy lifting

http://pmem.io/

libpmem – basic persistence handling

libvmmalloc – transparently converts all dynamic memory allocations into persistent memory allocations

libpmemblk – block access to pmem

libpmemlog – log file on pmem (append-mostly)

libpmemobj – transactional object store on pmem

Many more: pynvm, C++ bindings, etc.
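The ordering problem that libpmemlog handles for append-mostly logs can be sketched without the library. First append the record and persist it. Only then update and persist a committed-length header, so a crash between the two steps leaves the new record invisible rather than half-visible.

This is a hypothetical Python model, not libpmemlog's API or on-media format. `ToyPmem` stands in for PMEM behind a CPU cache: stores land in `cache`, `persist()` copies a range to `media`, and `crash()` drops whatever was not persisted.

```python
import struct

HEADER = struct.Struct("<Q")  # committed length of the log body


class ToyPmem:
    """PMEM behind a CPU cache: only persisted ranges survive crash()."""

    def __init__(self, size: int) -> None:
        self.cache = bytearray(size)
        self.media = bytearray(size)

    def persist(self, offset: int, length: int) -> None:
        self.media[offset:offset + length] = self.cache[offset:offset + length]

    def crash(self) -> None:
        self.cache = bytearray(self.media)  # unpersisted stores are lost


class ToyLog:
    def __init__(self, pmem: ToyPmem) -> None:
        self.pmem = pmem

    def _committed(self) -> int:
        return HEADER.unpack_from(self.pmem.cache, 0)[0]

    def append(self, record: bytes, crash_after_data: bool = False) -> None:
        end = HEADER.size + self._committed()
        if end + len(record) > len(self.pmem.cache):
            raise ValueError("log full")
        self.pmem.cache[end:end + len(record)] = record  # 1. store the record
        self.pmem.persist(end, len(record))              # 2. persist the record
        if crash_after_data:
            self.pmem.crash()                            # simulated power loss
            return
        HEADER.pack_into(self.pmem.cache, 0, self._committed() + len(record))
        self.pmem.persist(0, HEADER.size)                # 3. persist the new length

    def contents(self) -> bytes:
        return bytes(self.pmem.cache[HEADER.size:HEADER.size + self._committed()])
```

Suppose step 3 were persisted before step 2. A crash could then expose a committed length that covers bytes which never reached the media.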

Page 23: Persistent memory

Remote PMEM

Page 24: Persistent memory

Remote NVMe: using RDMA to transfer NVMe commands and data

http://blog.pmcs.com/flash-memory-summit-2015-special-nvm-express-rdma-awesome/

Page 25: Persistent memory

Transitioning from Indirect to Direct Flow

● Project Donard (PMC / Microsemi)

● Patch for struct page-backed PMEM (I/O memory is normally accessed via PFN only)

Page 26: Persistent memory

This comes with a challenge: durability vs. visibility

http://www.snia.org/sites/default/files/SDC15_presentations/persistant_mem/ChetDouglas_RDMA_with_PM.pdf

Page 27: Persistent memory

RDMA + DDIO

Page 28: Persistent memory

RDMA + non-allocating writes

Page 29: Persistent memory

Peer-to-peer: bypassing the CPU and software bottleneck

● NVM hardware exposes a BAR address

● March 2016: RFC patchset for DAX allowing DMA to I/O memory

● CCIX fabric

● Use cases:

○ Pre-process in the data path

○ Avoid RAM buffers (HMM style)

○ Software only fetches what is necessary

Page 30: Persistent memory

Future Hyperscale Architecture

NVMe gravy train for the next 3-5 years

Transition to PMEM-optimised apps, and a natural evolution from Ethernet-connected drives to fabric-connected PMEM

Durable Array of Wimpy Nodes

Direct PMEM

Low-power, high-performance K/V storage

Uses pluggable front ends

Can be rearranged based on needs

Page 31: Persistent memory

Links

Driver specs: http://pmem.io/documents/

NVDIMM Namespace Specification: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf

NVDIMM Driver Writers Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

NVDIMM DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf

Linux docs: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/nvdimm/nvdimm.txt

QEMU: https://github.com/xiaogr/qemu/tree/nvdimm-v9

SeaBIOS: http://www.seabios.org/pipermail/seabios/2015-September/009770.html

Libraries:

https://github.com/pmem/nvml/

https://github.com/perone/pynvm

http://opennvm.github.io/index.html

https://github.com/spdk/spdk

Projects:

PMFS: https://github.com/linux-pmfs/pmfs

NOVA (NOn-Volatile memory Accelerated log-structured file system): https://github.com/NVSL/NOVA

PCMSIM: https://code.google.com/p/pcmsim/

Patches:

Donard, a PCIe peer-to-peer kernel patch: https://github.com/sbates130272/donard; it adds struct page backing for I/O memory and as such allows I/O memory to be used as a DMA target:

http://www.spinics.net/lists/linux-mm/msg103990.html

Page 32: Persistent memory

Thank You!

Questions?

Page 33: Persistent memory

NVDIMM block I/O path