Integrating Concurrency Control and Energy Management in ...

105
Integrating Concurrency Control and Energy Management in Device Drivers Kevin Klues,Vlado Handziski, Chenyang Lu, Adam Wolisz, David Culler, David Gay, and Philip Levis

Transcript of Integrating Concurrency Control and Energy Management in ...

Integrating Concurrency Control and Energy Management in Device

Drivers

Kevin Klues, Vlado Handziski, Chenyang Lu, Adam Wolisz,

David Culler, David Gay, and Philip Levis

Overview• Concurrency Control:

• Concurrency of I/O operations alone, not of threads in general• Synchronous vs. Asynchronous I/O

• Energy Management:• Power state of devices needed to perform I/O operations• Determined by pending I/O requests using Asynchronous I/O

2

Overview• Concurrency Control:

• Concurrency of I/O operations alone, not of threads in general• Synchronous vs. Asynchronous I/O

• Energy Management:• Power state of devices needed to perform I/O operations• Determined by pending I/O requests using Asynchronous I/O

3

OSFlash Driver Physical Flash

read()

write()

setPowerState()

Application

read()write()read()read()

Overview

The more workload information an application can give the OS, the more energy it

can save when scheduling that workload

• Concurrency Control:• Concurrency of I/O operations alone, not of threads in general• Synchronous vs. Asynchronous I/O

• Energy Management:• Power state of devices needed to perform I/O operations• Determined by pending I/O requests using Asynchronous I/O

4

Outline• Background Information• Platform and Application• Driver architecture• Evaluation• Conclusion

5

Motivation• Difficult to manage energy in traditional OSs

• Hard to tell OS about future application workloads• All logic pushed out to the application• API extensions for hints?

6

Existing OS Approaches• Dynamic CPU Voltage Scaling

• Vertigo - Application workload classes• Grace OS - Explicit real-time deadlines

• Disk Spin Down• Coop-IO - Application specified timeouts

7

• Dynamic CPU Voltage Scaling• Vertigo - Application workload classes• Grace OS - Explicit real-time deadlines

• Disk Spin Down• Coop-IO - Application specified timeouts

Existing OS Approaches

8

• Dynamic CPU Voltage Scaling• Vertigo - Application workload classes• Grace OS - Explicit real-time deadlines

• Disk Spin Down• Coop-IO - Application specified timeouts

Existing OS Approaches

Saving energy is a complex process

8

• Dynamic CPU Voltage Scaling• Vertigo - Application workload classes• Grace OS - Explicit real-time deadlines

• Disk Spin Down• Coop-IO - Application specified timeouts

Existing OS Approaches

Saving energy is a complex process

8

A little application knowledge can help us alot

• Domain in need of unique solution to this problem• Harsh energy requirements• Very small source of power (2 AA batteries)‏• Must run unattended from months to years

• First generation sensornet OSes (TinyOS, Contiki, Mantis, ...)‏• Push all energy management to the application• Optimal energy savings at cost of application complexity

Sensor Networks

9

ICEM: Integrated Concurrency and Energy Management

• A device driver architecture that automatically manages energy• Implemented in TinyOS 2.0 -- all drivers follow it• Introduces Power Locks, split-phase locks with integrated energy and

configuration management• Defines three classes of drivers: dedicated, shared, virtualized• Provides a component library for building drivers

• Advantages of using ICEM• Energy efficient – At least 98.4% as hand-tuned implementation• Reduces code complexity – 400 vs. 68 lines of code• Enables natural decomposition of applications

10

Outline• Introduction and Motivation• Platform and Application• ICEM architecture• Evaluation• Conclusion

11

• Six major I/O devices• Possible Concurrency

• I2C, SPI, ADC• Energy Management

• Turn peripherals on only when needed • Turn off otherwise

The Tmote Platform

12

Representative Logging Application

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Representative Logging Application

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Representative Logging Application

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Representative Logging Application

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Representative Logging Application

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Representative Logging Application

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Representative Logging Application

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Representative Logging Application

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Representative Logging Application

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Code Complexity

Every 5 minutes: Log prior readings sample humidity sample total solar sample photo active sample temperature

ICEM ApplicationEvery 5 minutes: Turn on SPI bus Turn on flash chip Turn on voltage reference Turn on I2C bus Log prior readings Start humidity sample Wait 5ms for log Turn off flash chip Turn off SPI bus Wait 12ms for vref Turn on ADC Start total solar sample Wait 2ms for total solar Start photo active sample Wait 2ms for photo active Turn off ADC Turn off voltage reference Wait 34ms for humidity Start temperature sample Wait 220ms for temperature Turn off I2C bus

Hand-Tuned Application

Outline• Introduction and Motivation• Platform and Application• ICEM architecture• Evaluation• Conclusion

23

Split-Phase I/O Operations• Split-phase I/O operations

• Implemented within a single thread of control• Application notified of I/O completion through direct upcall• Driver given workload information before returning control• Example: read() readDone()‏

24

Application

Driver

read() ‏ readDone() ‏

I/O request I/O interrupt

void readDone(uint16_t val) { next_val = val; read();}

ICEM Architecture• Defines three classes of drivers

• Virtualized – provide only functional interface• Dedicated – provide functional and power interface• Shared – provide functional and lock interface

25

Virtualized Device Drivers• Provide only a Functional interface

• Assume multiple users• Implicit concurrency control through buffering requests• Implicit energy management based on pending requests• Implemented for higher-level services that can tolerate longer latencies

26

Energy: ImplicitConcurrency: Implicit

Virtualized

Dedicated Device Drivers• Provide Functional and Power Control interfaces

• Assume a single user• No concurrency control

• Explicit energy management • Low-level hardware and bottom-level abstractions have a dedicated driver

27

Energy: ImplicitConcurrency: None

Dedicated

Shared Device Drivers• Provide Functional and Lock interfaces

• Assume multiple users• Explicit concurrency control through Lock request• Implicit energy management based on pending requests• Used by users with stringent timing requirements

28

Energy: ImplicitConcurrency: Explicit

Shared

ICEM Architecture• Defines three classes of drivers

• Virtualized – provide only functional interface• Dedicated – provide functional and power interface• Shared – provide functional and lock interface

• Power Locks split-phase locks with integrated energy and configuration management

29

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

123

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

Lock ---

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

---

Power Locks

DedicatedDriver

Power Locks

PowerControl

HW-SpecificConfiguration

Lock

Functional

---

ICEM Architecture• Defines three classes of drivers

• Virtualized – provide only functional interface• Dedicated – provide functional and power interface• Shared – provide functional and lock interface

• Power Locks: split-phase locks with integrated energy and configuration management

• Component library• Arbiters – manage I/O concurrency• Configurators – setup device specific configurations• Power Managers – provide automatic power management

43

Component Library

Lock

Power Locks

PowerControl

HW-SpecificConfiguration

Component Library

Lock

PowerControl

HW-SpecificConfiguration

Arbiter

ArbiterConfigure

DefaultOwner

Configurator Power Manager

Component Library

PowerControl

HW-SpecificConfiguration

Arbiter

ArbiterConfigure

DefaultOwner

Configurator Power Manager

Lock

n Lock interface for concurrency control (FCFS, Round-Robin) ‏n ArbiterConfigure interface automatic hardware configuration

n DefaultOwner interface for automatic power management

Component Library

PowerControl

HW-SpecificConfiguration

Arbiter

ArbiterConfigure

DefaultOwner

Configurator Power Manager

Lock

n Lock interface for concurrency control (FCFS, Round-Robin) ‏n ArbiterConfigure interface automatic hardware configuration

n DefaultOwner interface for automatic power management

---

Component Library

PowerControl

HW-SpecificConfiguration

Arbiter

ArbiterConfigure

DefaultOwner

Configurator Power Manager

Lock

n Implement ArbiterConfigure interface

n Call hardware specific configuration from dedicated driver

Component Library

PowerControl

HW-SpecificConfiguration

Arbiter

ArbiterConfigure

DefaultOwner

Configurator Power Manager

Lock

n Implement DefaultOwner interface

n Power down device when device falls idle

n Power up device when new lock request comes in

n Currently provide Immediate and Deferred policies

Shared Driver Example• Msp430 USART (Serial Controller)

• Three modes of operation – SPI, I2C, UART

50

Shared Driver Example

Msp430 USART

Power Control

Functional Configuration

Shared Driver Example

Arbiter

ImmediatePower Manager

SPI User

Msp430 USART

SPIConfigurator

Power Control

Functional Configuration

Lock

UartConfigurator

Shared Driver Example

Arbiter

ImmediatePower Manager

SPI User

Msp430 USART

SPIConfigurator

Uart User

I2C UserI2C

Configurator

Power Control

Functional Configuration

Lock

Virtualized Driver Example

Arbiter

ImmediatePower ManagerSPI User

Log User

Flash Driver

Log Virtualizer

Lock

Power Control

Functional

Virtualized Driver Example

Arbiter

ImmediatePower ManagerSPI User

Log User

Flash Driver

Log Virtualizer

Lock

Power Control

Functional

Virtualized Driver Example

Arbiter

ImmediatePower ManagerSPI User

Log User

Flash Driver

Log Virtualizer

Lock

Power Control

Functional

Virtualized Driver Example

Arbiter

ImmediatePower ManagerSPI User

Log User

Flash Driver

Log Virtualizer

Lock

Power Control

Functional

Virtualized Driver Example

Arbiter

ImmediatePower ManagerSPI User

Log User

Flash Driver

Log Virtualizer

Lock

Power Control

Functional

---

Virtualized Driver Example

Arbiter

ImmediatePower ManagerSPI User

Log User

Flash Driver

Log Virtualizer

Lock

Power Control

Functional

Virtualized Driver Example

Arbiter

ImmediatePower ManagerSPI User

Log User

Flash Driver

Log Virtualizer

Lock

Power Control

Block Virtualizer

Block User

Virtualized Driver Example

Arbiter

ImmediatePower ManagerSPI User

Flash Driver Lock

Power Control

Log User

Log VirtualizerBlock Virtualizer

Block User

Outline• Introduction and Motivation• Platform and Application• ICEM architecture• Evaluation• Conclusion

62

Applications

• Hand Tuned – Most energy efficient• ICEM – All concurrent operations• Serial + – Optimal serial ordering • Serial - – Worst case serial ordering

Every 12 hours: For all new entries: Send current sample Read next sample

Flash

Consumer

Sensors Radio

Every 5 minutes: Write prior samples Sample photo active Sample total solar Sample temperature Sample humidity

Producer

Tmote Energy Consumption

Average energy consumption for application operations

Tmote Energy Consumption

Average energy consumption for application operations

Application Energy Consumption

Application energy with 5 minute sampling interval and one send batch every 12 hours

0

24.000

48.000

72.000

96.000

120.000

288 Samples 2 Sends

E (m

As) Hand Tuned

ICEMSerial +Serial -

Application Energy Consumption

Application energy with 5 minute sampling interval and one send batch every 12 hours

0

24.000

48.000

72.000

96.000

120.000

288 Samples 2 Sends

E (m

As) Hand Tuned

ICEMSerial +Serial -

Application Energy Consumption

Application energy with 5 minute sampling interval and one send batch every 12 hours

0

24.000

48.000

72.000

96.000

120.000

288 Samples 2 Sends

E (m

As) Hand Tuned

ICEMSerial +Serial -

Sampling Power Trace

Overhead of ICEM to Hand-Tuned Implementation = ADC Timeout + Power Lock OverheadsWith 288 samples per day ≈ 2.9 mAs/day ≈ 1049 mAs/year

Insignificant compared to total 5.60% of total sampling energy 0.03% of total application energy

Expected Node Lifetimes

Expected Node Lifetimes

Expected Node Lifetimes

Evaluation Conclusions• Conclusions about the OS

• Small RAM/ROM overhead• Small computational overhead• Efficiently manages energy when given enough information

• Conclusions for the developer• Build drivers short power down timeouts• Submit I/O requests in parallel

73

Conclusion• ICEM: Integrated Concurrency and Energy Management

• Device driver architecture for low power devices• At least 98.4% as energy efficient as hand-tuned implementation of

representative application• Simplifies application and driver development• Questions the assumption that applications must be responsible for all energy management and cannot have a standardized OS with a simple API

74

83

Questions?

84

Questions?

• SourceForge TinyOS CVS repository:w http://sourceforge.net/cvs/?group_id=28656

• Library components and interfacesw tinyos-2.x/tos/interfacesw tinyos-2.x/tos/lib/powerw tinyos-2.x/tos/system

• Example Driversw Atmega128 ADC: tos/chips/atm128/adc w MTS300 Photo: tos/sensorboards/mts300 w MSP430 USART0: tos/chips/msp430/usart w Storage: tos/chips/stm25p w CC2420: tos/chips/cc2420

85

Hardware

86

Future Work

• Compile-time deadlock detection• Conditional I/O Operations

if(Temp.read() > 30) Humidity.read()‏• Improved OS scheduling

87

Disclaimers

• Omission of MCU power management discussion • Run time checks on arbiter operations• Implementing ICEM in threaded OS

88

Microbenchmarks: Overhead

• Per request MCU cycle overhead (locking, unlocking)‏

Worst Case = 371 cycles ≈ 93 µs on Tmote ≈ 0.168 mAms

Very small

89

Tmote Current Consumption

Average current consumption for application operations

90

Traditional Concurrency Control

Producer

Thread

Physical Device

Time

91

Traditional Concurrency Control

Producer

ThreadDriver

(Top Part)

I/O operation

Physical Device

I/O Request

Time

92

Traditional Concurrency Control

Producer

ThreadDriver

(Top Part)

I/O operation

Physical Device

I/O Request

Time

Scheduler

BlocksProducer

93

Traditional Concurrency Control

Producer

ThreadDriver

(Top Part)

I/O operation

Physical Device

I/O Request Interrupt

Time

Driver(Bottom Part)

Scheduler

BlocksProducer

94

Traditional Concurrency Control

Producer

ThreadDriver

(Top Part)

I/O operation

Physical Device

I/O Request Interrupt

Time

Driver(Bottom Part)

Produceron

ReadyQueue

Scheduler

BlocksProducer

95

Traditional Concurrency Control

Producer

ThreadDriver

(Top Part)

I/O operation

Physical Device

I/O Request Interrupt

Time

Driver(Bottom Part)

Produceron

ReadyQueue

Scheduler

BlocksProducer

Single Thread:

Humidity Sample

TemperatureSample

Total SolarSample

Photo ActiveSample

FlashWrite

No potential for concurrency

96

Traditional Concurrency Control

Producer

ThreadDriver

(Top Part)

I/O operation

Physical Device

I/O Request Interrupt

Time

Driver(Bottom Part)

Produceron

ReadyQueue

Scheduler

BlocksProducer

Single Thread:

Humidity Sample

TemperatureSample

Total SolarSample

Photo ActiveSample

FlashWrite

No potential for concurrencyOrdering Important

97

Traditional Concurrency Control

Producer

ThreadDriver

(Top Part)

I/O operation

Physical Device

I/O Request Interrupt

Time

Driver(Bottom Part)

Produceron

ReadyQueue

Scheduler

BlocksThread

Multi-thread:

Flash Write

TemperatureSample

Humidity Sample

Photo ActiveSample

Total SolarSample

Producer

Thread

Producer

Thread

Producer

ThreadFlashWrite

Driver(Top Part) Driver

(Top Part) Driver

(Top Part) Driver

(Top Part)

Physical DevicePhysical DevicePhysical DevicePhysical Device

Driver(Bottom Part)

Driver(Bottom Part)

Driver(Bottom Part)

Driver(Bottom Part)

Scheduler

BlocksThread

Scheduler

BlocksThread

Scheduler

BlocksThread

Scheduler

BlocksThread Producer

on ReadyQueue

Produceron

ReadyQueue

Produceron

ReadyQueue

Produceron

ReadyQueue

98

Traditional Concurrency Control

Flash Write

TemperatureSample

Humidity Sample

Photo ActiveSample

Total SolarSample

Concurrency can now be exploited

Producer

ThreadDriver

(Top Part)

I/O operation

Physical Device

I/O Request Interrupt

Time

Driver(Bottom Part)

Produceron

ReadyQueue

Scheduler

BlocksThread

Multi-thread:

Producer

Thread

Producer

Thread

Producer

ThreadFlashWrite

Driver(Top Part) Driver

(Top Part) Driver

(Top Part) Driver

(Top Part)

Physical DevicePhysical DevicePhysical DevicePhysical Device

Driver(Bottom Part)

Driver(Bottom Part)

Driver(Bottom Part)

Driver(Bottom Part)

Scheduler

BlocksThread

Scheduler

BlocksThread

Scheduler

BlocksThread

Scheduler

BlocksThread Producer

on ReadyQueue

Produceron

ReadyQueue

Produceron

ReadyQueue

Produceron

ReadyQueue

99

Traditional Concurrency Control

Flash Write

TemperatureSample

Humidity Sample

Photo ActiveSample

Total SolarSample

Concurrency can now be exploitedPotential innefficiencies

Producer

ThreadDriver

(Top Part)

I/O operation

Physical Device

I/O Request Interrupt

Time

Driver(Bottom Part)

Produceron

ReadyQueue

Scheduler

BlocksThread

Multi-thread:

Producer

Thread

Producer

Thread

Producer

ThreadFlashWrite

Driver(Top Part) Driver

(Top Part) Driver

(Top Part) Driver

(Top Part)

Physical DevicePhysical DevicePhysical DevicePhysical Device

Driver(Bottom Part)

Driver(Bottom Part)

Driver(Bottom Part)

Driver(Bottom Part)

Scheduler

BlocksThread

Scheduler

BlocksThread

Scheduler

BlocksThread

Scheduler

BlocksThread Producer

on ReadyQueue

Produceron

ReadyQueue

Produceron

ReadyQueue

Produceron

ReadyQueue

100

Traditional Concurrency Control

Producer

ThreadFlash Driver(Top Part)

write()‏

Flash Chip

I/O Request Interrupt

Time

Flash Driver(Bottom Part)

Produceron

ReadyQueue

Scheduler

BlocksProducer

Consumer

ThreadFlash Driver(Top Part)

read()‏

Flash Chip

I/O Request Interrupt

Flash Driver(Bottom Part)

Consumeron

ReadyQueue

Scheduler

BlocksConsum

er

Time

101

Sampling Power Trace

Overhead of ICEM over Hand-Tuned Implementation = ADC Timeout + Arbiter Overheads = (536uA * 17ms) + (1920uA * 0.45ms)‏ = 9976 uAms/sample ≈ 0.01mAs / sample

102

Virtualized Driver Example• Flash Storage

w Two storage abstractions – Log, Block

Arbiter

ImmediatePower ManagerSPI User

Log UserBlock User

Sector

Block Virtualizer Log Virtualizer

103

Traditional Concurrency Control

ThreadA

Device Driver(Top Part)

SchedulerBlocks A

read()‏

ThreadB

Device

I/O Request

Time

104

Traditional Concurrency Control

ThreadA

Device Driver(Top Part)

SchedulerBlocks A

read()‏

ThreadB

SchedulerRuns B

Device

I/O Request

Time

105

Traditional Concurrency Control

ThreadA

Interrupt

Device Driver(Top Part)

SchedulerBlocks A

read()‏

ThreadB

SchedulerRuns B

Device

I/O Request

Device Driver(Bottom Part)

A Ready

Time

106

Traditional Concurrency Control

ThreadA

Interrupt

Device Driver(Top Part)

SchedulerBlocks A

read()‏

ThreadB

SchedulerRuns B

Device

I/O Request

Device Driver(Bottom Part)

A Ready

B Finishes

Time

107

Traditional Concurrency Control

ThreadA

Interrupt

Device Driver(Top Part)

SchedulerBlocks A

read()‏

ThreadB

SchedulerRuns B

Device

I/O Request

Device Driver(Bottom Part)

A Ready

B Finishes

Scheduler Runs A

Time

108

Traditional Concurrency Control

ThreadA

Interrupt

Device Driver(Top Part)

SchedulerBlocks A

Time

read()‏

ThreadB

SchedulerRuns B

Device

I/O Request

Device Driver(Bottom Part)

A Ready

B Finishes

Scheduler Runs A

Problem

109

• Sensornet applications have the capability of exploiting high levels of concurrencyw Must do so without threads (i.e no concept

of an execution entity) ‏w Require extreme low-power operation

• Traditional systems not well suited w Blocking application level I/O callsw Thread scheduling in the device driver

Problem

110

5 min

10 sec

1 sec

50 sec

111

5 min

10 sec

1 sec

50 sec