CASHIER: A Cache Energy Saving Technique for QoS Systems

26th International Conference on VLSI, January 2013, Pune, India

Sparsh Mittal, Zhao Zhang and Yanan Cao

ECpE, Iowa State University, USA

This work is supported in part by the National Science Foundation under grants CNS-0834476 and CNS-1117604.


Purpose Of Our Work

• Saving cache energy in QoS systems

• An approach for green computing

• Highlight:

– Software-based approach with light hardware support.

– Uses dynamic cache reconfiguration.

– Offers large energy savings.

4/4/2014 © VLSI Design Conference 2013 2

Presentation Plan

• Motivation for the Research

• Limitations of Existing Approaches

• Cashier Approach: Main Idea

• Energy Saving Algorithm & Flow-diagram

• RCE Design

• Experiments and Results

• Conclusion


Motivation: Increasing Cache Sizes


• Power issue drives major design decisions.

• Last-level cache sizes are increasing (e.g. 32 MB in Intel’s Poulson processor).

• Caches consume huge chip area (>40%).

[Figure: Alpha 21364 microprocessor die photo [1]]

1. http://www.oracle.com/technology/products/rdb/pdf/2002_tech_forums/rdbtf_2002_opt_on_alpha_mdr.pdf

Motivation: Leakage Energy Increase


• Leakage current has been increasing dramatically across CMOS technology generations

• Leakage is a major source of power consumption in last level caches (LLC)

We need novel approaches for saving cache leakage energy!

Limitations of Existing Approaches


• Require offline analysis, difficult to scale [2].

• Provide coarse-grain allocation granularity [2].

• Cannot take components other than the cache into account: they may reduce cache energy but increase system energy [3].

• Cannot directly optimize energy [2,3].

2. S.H. Yang et al., “An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches,” in HPCA, 2001.

3. Kaxiras et al., “Cache decay: exploiting generational behavior to reduce cache leakage power,” in ISCA, 2001.

Example of Existing Approaches


• Hybrid (selective-sets and selective-ways) approach to turning off parts of the cache [4]

– Selective-sets: X, 4X/8, 2X/8 and 1X/8

– Selective-ways: 1, 2, 3, 4, 5, 6, 7, 8 (assuming 8-way)

4. S. Mittal et al., “EnCache: Improving cache energy efficiency using a software-controlled profiling cache,” in IEEE EIT, 2012

Example of Existing Approaches (ref [3])

[Figure: an 8-way cache (Way 1, Way 2, ..., Way 7, Way 8) drawn over its cache sets, with ON and OFF regions marked, shown in four configurations]

Example 1: half of the sets, 8 ways ON

Example 2: all sets, 4 ways ON

Example 3: quarter of the sets, 4 ways ON

Example 4: eighth of the sets, 2 ways ON
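The four example configurations above can be compared by the share of cache capacity each leaves powered on. The sketch below is illustrative only: it assumes the active capacity is simply the product of the set fraction and the way fraction, and the function name `active_fraction` is hypothetical, not taken from the slides.

```python
from fractions import Fraction

TOTAL_WAYS = 8  # the slides assume an 8-way cache


def active_fraction(set_fraction: Fraction, ways_on: int) -> Fraction:
    """Share of cache capacity left ON (assumed: set share times way share)."""
    if not 1 <= ways_on <= TOTAL_WAYS:
        raise ValueError("ways_on must be between 1 and TOTAL_WAYS")
    return set_fraction * Fraction(ways_on, TOTAL_WAYS)


# (set fraction, ways ON) for the four examples on the slides
examples = {
    "Example 1": (Fraction(1, 2), 8),
    "Example 2": (Fraction(1, 1), 4),
    "Example 3": (Fraction(1, 4), 4),
    "Example 4": (Fraction(1, 8), 2),
}

for name, (sets, ways) in examples.items():
    print(name, active_fraction(sets, ways))
```

Under this assumption, Examples 1 and 2 both keep half of the cache on, while Examples 3 and 4 keep 1/8 and 1/32 of it on, which shows how coarse the steps between hybrid configurations can be.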

Example of Existing Approaches

[Figure: an 8 KB cache built from four 2 KB banks, shown as the 4-way base cache, as 8 KB 2-way and as 8 KB direct-mapped (way concatenation); configurable line size on top of a 16-byte physical line size]

5. C. Zhang et al., “A highly configurable cache architecture for embedded systems,” ISCA, 2003
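Way concatenation keeps the total capacity fixed and trades associativity for sets. A minimal sketch of that arithmetic follows, using the 8 KB capacity and 16-byte physical line from the figure; the helper name `num_sets` is hypothetical and the calculation is the standard capacity / (ways × line size) relation, not code from [5].

```python
CACHE_BYTES = 8 * 1024  # 8 KB total capacity, from the figure
LINE_BYTES = 16         # 16-byte physical line size, from the figure


def num_sets(ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets when the same capacity is organized with `ways` ways."""
    return CACHE_BYTES // (ways * line_bytes)


for label, ways in [("4-way", 4), ("2-way", 2), ("direct-mapped", 1)]:
    print(label, num_sets(ways))
```

Halving the associativity doubles the number of sets, so the capacity stays at 8 KB in all three organizations.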


Our Approach

Cashier: A Cache Energy Saving Technique for Quality of Service Systems


Motivation for Saving Energy in QoS Systems

• Several real-world applications present soft real-time resource demands.

• If a task is completed by its deadline, the actual completion time does not matter from the user’s perspective.

• Saving cache energy while meeting deadlines is even more challenging.


Main Idea

• Cache requirements vary both between programs (inter-program) and within a single program over time (intra-program).

• By allocating only a suitable amount of cache to each program, the rest of the cache can be turned off.

• Leakage saving with minimum performance loss.

• Optimize system energy, not just cache energy.

• Use set-sampling to keep overhead low.


Cashier: Problem Definition

1. Slack specified as absolute time (Magnitude Slack Method or MSM). Figure: Baseline (T) followed by Slack = 100 ms.

2. Slack specified as percentage of baseline (Percentage Slack Method or PSM). Figure: Baseline (T) followed by Slack = 5% of T.
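The two slack definitions can be turned into deadlines as sketched below. This is one reading of the figure, assuming the deadline is the baseline execution time T plus the slack; the function names and the 2000 ms baseline are hypothetical values chosen for illustration.

```python
def msm_deadline(baseline_ms: float, slack_ms: float) -> float:
    """Magnitude Slack Method: slack given as an absolute time."""
    return baseline_ms + slack_ms


def psm_deadline(baseline_ms: float, slack_percent: float) -> float:
    """Percentage Slack Method: slack given as a percentage of the baseline."""
    return baseline_ms + baseline_ms * slack_percent / 100.0


baseline_ms = 2000.0  # hypothetical baseline T, not from the slides
print(msm_deadline(baseline_ms, 100.0))  # slack of 100 ms
print(psm_deadline(baseline_ms, 5.0))    # slack of 5% of T
```

With this particular baseline the two methods happen to give the same deadline; for a longer-running program the 5% slack would be larger than 100 ms, and for a shorter one it would be smaller.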


Cashier: Energy Saving Algorithm

• MSM and PSM work by utilizing a fraction of total slack in each interval, such that

– Deadline is not missed.

– Rest of the cache is turned off to save energy.

• Cache is allocated using cache coloring. Uniqueness:

– No change to virtual-to-physical mapping.

– Smaller reconfiguration overhead.

– Uses mapping table for flexible remapping.
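The per-interval decision described above can be sketched roughly as follows. The slides do not give the algorithm's details, so everything here is an assumption made for illustration: the `Estimate` record, the `pick_configuration` function, the rule of spending a fixed fraction of the remaining slack per interval, and the choice of the lowest estimated system energy among configurations that fit the time budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Estimate:
    colors: int       # number of cache colors kept ON
    time_ms: float    # estimated execution time of the next interval
    energy_mj: float  # estimated system energy of the next interval


def pick_configuration(estimates: List[Estimate], baseline_ms: float,
                       slack_left_ms: float, slack_fraction: float) -> Estimate:
    """Pick the lowest-energy configuration that fits this interval's budget."""
    # Assumed rule: each interval may spend only a fraction of the remaining slack.
    budget_ms = baseline_ms + slack_fraction * slack_left_ms
    feasible = [e for e in estimates if e.time_ms <= budget_ms]
    if not feasible:
        # Fall back to the largest cache so the deadline is not put at risk.
        return max(estimates, key=lambda e: e.colors)
    return min(feasible, key=lambda e: e.energy_mj)


# Made-up estimates for one interval (illustrative numbers only).
estimates = [
    Estimate(colors=64, time_ms=10.0, energy_mj=50.0),
    Estimate(colors=32, time_ms=10.2, energy_mj=42.0),
    Estimate(colors=16, time_ms=10.9, energy_mj=40.0),
]
choice = pick_configuration(estimates, baseline_ms=10.0,
                            slack_left_ms=2.0, slack_fraction=0.25)
print(choice.colors)
```

Here the 16-color configuration has the lowest estimated energy but exceeds the 10.5 ms budget, so the 32-color configuration is chosen and the remaining colors could be turned off.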


Cashier: Flow-diagram

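[Figure: flow diagram. The physical address of an L2 access is split into L2 Tag <40>, Region ID <6>, Set # Inside Color <6> and Offset <6>. An OS-controlled Mapping Table translates the Region ID <6> into a Cache color <6>; the cache color and the Set # Inside Color <6> select the set in L2 Cache Storage (Color 0, Color 1, ..., Color 63). The RCE (Tag, Set, Offset) feeds Counters to the Algorithm, which drives the Remap.]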
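The address path in the flow diagram can be sketched as below. The field widths (6-bit offset, 6-bit set-inside-color, 6-bit region ID, 64 colors) come from the diagram labels, but the bit ordering, the function names `split_address` and `l2_set_index`, and the example mapping table are assumptions for illustration.

```python
OFFSET_BITS = 6        # Offset <6>
SET_IN_COLOR_BITS = 6  # Set # Inside Color <6>
REGION_BITS = 6        # Region ID <6>
NUM_COLORS = 64        # Color 0 ... Color 63


def split_address(paddr: int):
    """Split a physical address into (tag, region, set_in_color, offset).

    Assumed layout, low to high bits: offset, set-inside-color, region ID, tag.
    """
    offset = paddr & ((1 << OFFSET_BITS) - 1)
    set_in_color = (paddr >> OFFSET_BITS) & ((1 << SET_IN_COLOR_BITS) - 1)
    region = (paddr >> (OFFSET_BITS + SET_IN_COLOR_BITS)) & ((1 << REGION_BITS) - 1)
    tag = paddr >> (OFFSET_BITS + SET_IN_COLOR_BITS + REGION_BITS)
    return tag, region, set_in_color, offset


def l2_set_index(paddr: int, mapping_table) -> int:
    """Look up the region's color in the mapping table and form the set index."""
    _, region, set_in_color, _ = split_address(paddr)
    color = mapping_table[region]
    return (color << SET_IN_COLOR_BITS) | set_in_color


# Hypothetical table: all 64 regions folded onto 16 active colors.
mapping_table = [region % 16 for region in range(NUM_COLORS)]

paddr = (0xABC << 18) | (37 << 12) | (5 << 6) | 9
print(split_address(paddr))
print(l2_set_index(paddr, mapping_table))
```

Because only the table entries change when the allocation changes, the virtual-to-physical mapping stays untouched, which matches the uniqueness points listed for the algorithm.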


Illustration of Cache Coloring

[Figure: allocation granularity of the two approaches. Selective-sets approach: Full, Half, Quarter, Eighth. Cache coloring approach: granularity labeled 1/128.]


Cashier: RCE design for profiling

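[Figure: RCE block diagram. Labels: L2 Access Address, Queue, Sampling Filter (RS), Address Decoders M1 to M6, Core-storage M1 to M6, Storage, MUX. The six modules correspond to cache sizes 16X/16, 12X/16, 8X/16, 4X/16, 2X/16 and 1X/16. A callout marks one element as "Important!".]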


Key Component : RCE

• Tag-only (no data)

• Uses set-sampling (sampling ratio = 32)

• Non-intrusive and parallel operation

• Not on critical path

• Small latency

• Energy overhead: < 0.5% of L2 cache energy
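A tag-only, set-sampled profiler along the lines listed above can be sketched as follows. The sampling ratio of 32 is from the slide; the class name `SampledTagProfiler`, the rule of sampling every 32nd set by modulo, the 8-way associativity and the LRU replacement are assumptions for illustration and are not details given in the slides.

```python
from collections import OrderedDict

SAMPLING_RATIO = 32  # from the slide
WAYS = 8             # assumed associativity


class SampledTagProfiler:
    """Tag-only profiler that tracks 1 out of every `ratio` sets."""

    def __init__(self, ways: int = WAYS, ratio: int = SAMPLING_RATIO):
        self.ways = ways
        self.ratio = ratio
        self.sets = {}  # sampled set index -> OrderedDict of tags in LRU order
        self.hits = 0
        self.misses = 0

    def access(self, set_index: int, tag: int) -> None:
        if set_index % self.ratio != 0:
            return  # set is not sampled; nothing is recorded
        lru = self.sets.setdefault(set_index, OrderedDict())
        if tag in lru:
            lru.move_to_end(tag)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            lru[tag] = None  # tags only, no data is stored
            if len(lru) > self.ways:
                lru.popitem(last=False)  # evict the least recently used tag


profiler = SampledTagProfiler()
for set_index, tag in [(0, 1), (0, 1), (5, 1), (32, 7), (32, 7)]:
    profiler.access(set_index, tag)
print(profiler.hits, profiler.misses)
```

Only accesses to sets 0 and 32 are recorded in this trace, so the profiler stores a small fraction of the tags a full cache would, which is the reason set sampling keeps the overhead low.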


Experiments

• Sniper simulator, SPEC2006 benchmarks

• 1B instructions

• 2MB L2

• Baseline: unmanaged L2 cache


Cashier: Results with MSM Algorithm

• Energy saving of 25.9%

• For two benchmarks, the deadline is missed.


Cashier: Results with PSM Algorithm

• Energy saving of 23.6%

• No benchmark misses the deadline.


Conclusion

Cache sizes and leakage energy dissipation are increasing.

We propose system-level techniques using dynamic cache reconfiguration.

We propose techniques for QoS, desktop and server systems.

Our techniques offer large energy savings and compare well to existing techniques.


Questions and comments are welcome!

Sparsh Mittal

References

1. http://www.oracle.com/technology/products/rdb/pdf/2002_tech_forums/rdbtf_2002_opt_on_alpha_mdr.pdf

2. S.H. Yang et al., “An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches,” in HPCA, 2001.

3. Kaxiras et al., “Cache decay: exploiting generational behavior to reduce cache leakage power,” in ISCA, 2001.

4. S. Mittal et al., “EnCache: Improving cache energy efficiency using a software-controlled profiling cache,” in IEEE EIT, 2012.

5. C. Zhang et al., “A highly configurable cache architecture for embedded systems,” in ISCA, 2003.

6. S. Mittal, “A survey of architectural techniques for DRAM power management,” International Journal of High Performance Systems Architecture, 2012.

7. S. Mittal, “A survey of architectural techniques for improving cache power efficiency,” Sustainable Computing: Informatics and Systems, 2013.

8. S. Mittal et al., “Palette: A cache leakage energy saving technique for green computing,” HPC: Transition Towards Exascale Processing, 2013.

9. S. Mittal, “Dynamic cache reconfiguration based techniques for improving cache energy efficiency,” PhD thesis, Iowa State University, 2013.