26th International Conference on VLSI
January 2013
Pune, India
Cashier: A Cache Energy Saving Technique for QoS Systems
Sparsh Mittal, Zhao Zhang and Yanan Cao
ECpE, Iowa State University, USA.
This work is supported in part by the National Science Foundation under grants CNS-0834476 and CNS-1117604.
Purpose Of Our Work
• Saving cache energy in QoS systems
• An approach for green computing
• Highlight:
– Software-based approach with light hardware support
– Uses dynamic cache reconfiguration
– Offers large energy savings
4/4/2014 © VLSI Design Conference 2013 2
Presentation Plan
• Motivation for the Research
• Limitations of Existing Approaches
• Cashier Approach: Main Idea
• Energy Saving Algorithm & Flow-diagram
• RCE Design
• Experiments and Results
• Conclusion
Motivation: Increasing Cache Sizes
• Power issue drives major design decisions.
• Last-level cache sizes are increasing (e.g., 32 MB in Intel’s Poulson processor).
• Caches consume huge chip area (>40%).
[Figure: Alpha 21364 microprocessor die photo [1]]
1. http://www.oracle.com/technology/products/rdb/pdf/2002_tech_forums/rdbtf_2002_opt_on_alpha_mdr.pdf
Motivation: Leakage Energy Increase
• Leakage current has been increasing dramatically across CMOS technology generations
• Leakage is a major source of power consumption in last level caches (LLC)
We need novel approaches for saving cache leakage energy!
Limitations of Existing Approaches
• Require offline analysis, difficult to scale [2].
• Provide coarse-grain allocation granularity [2].
• Cannot take components other than the cache into account: they may reduce cache energy but increase system energy [3].
• Cannot directly optimize energy [2,3].
2. S.H. Yang et al., “An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches,” in HPCA, 2001
3. Kaxiras et al., “Cache decay: exploiting generational behavior to reduce cache leakage power,” in ISCA, 2001
Example of Existing Approaches
• Hybrid (selective-sets and selective-ways) approach to turning off part of the cache [4]
– Selective-sets: X, 4X/8, 2X/8 and 1X/8
– Selective-ways: 1, 2, 3, 4, 5, 6, 7, 8 (assuming 8-way)
4. S. Mittal et al., “EnCache: Improving cache energy efficiency using a software-controlled profiling cache,” in IEEE EIT, 2012
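The hybrid configuration space listed above can be enumerated directly. The following is a minimal sketch, assuming an 8-way cache as on the slide; the names `SET_FRACTIONS`, `WAYS` and `hybrid_configs` are illustrative and do not come from [4].

```python
from fractions import Fraction
from itertools import product

# Selective-sets options from the slide: X, 4X/8, 2X/8 and 1X/8.
SET_FRACTIONS = [Fraction(1), Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
# Selective-ways options from the slide: 1 to 8 ways ON (8-way cache assumed).
TOTAL_WAYS = 8
WAYS = range(1, TOTAL_WAYS + 1)


def hybrid_configs():
    """Map each (set fraction, ways ON) pair to the fraction of the cache left ON."""
    return {(s, w): s * Fraction(w, TOTAL_WAYS) for s, w in product(SET_FRACTIONS, WAYS)}


configs = hybrid_configs()
distinct_sizes = sorted(set(configs.values()))

print(len(configs))         # number of (sets, ways) combinations
print(len(distinct_sizes))  # number of distinct active cache fractions
print(distinct_sizes[0])    # smallest active fraction
```

Several combinations yield the same active size (for example, half the sets with 8 ways and all sets with 4 ways), so the number of distinct sizes is smaller than the number of combinations.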
Example of Existing Approaches (ref [4])
Example 1: HALF sets, 8 ways ON
Example 2: FULL sets, 4 ways ON
Example 3: QUARTER sets, 4 ways ON
Example 4: EIGHTH sets, 2 ways ON
[Figure, one per example: grid of cache sets (rows) against Way 1 to Way 8 (columns), with the enabled region marked ON and the remainder marked OFF]
Example of Existing Approaches
• Way concatenation: four 2 KB banks form an 8 KB, 4-way base cache, an 8 KB, 2-way cache, or an 8 KB, direct-mapped cache [5]
• Configurable line size: 16 byte physical line size [5]
5. C. Zhang et al., “A highly configurable cache architecture for embedded systems,” ISCA, 2003
Our Approach
Cashier: A Cache Energy Saving Technique for Quality of Service Systems
Motivation for Saving Energy in QoS Systems
• Several real-world applications present soft real-time resource demands.
• If a task is completed by its deadline, the actual completion time does not matter from the user’s perspective.
• Saving cache energy while meeting deadlines is even more challenging.
Main Idea
• Cache requirements vary both between programs (inter-program) and within a single program over time (intra-program).
• By allocating only a suitable amount of cache to each program, the rest of the cache can be turned off.
• Leakage saving with minimum performance loss.
• Optimize system energy, not just cache energy.
• Use set-sampling to keep overhead low.
Cashier: Problem Definition
1. Slack specified as absolute time (Magnitude Slack Method or MSM). Example: Baseline (T), Slack = 100 ms
2. Slack specified as percentage of baseline (Percentage Slack Method or PSM). Example: Baseline (T), Slack = 5% of T
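The two slack definitions translate into deadlines as follows. This is a small sketch under the assumption that the deadline equals the baseline execution time T plus the allowed slack; the function names are illustrative and do not appear in the slides.

```python
def msm_deadline(baseline_ms: float, slack_ms: float) -> float:
    """Magnitude Slack Method: slack is an absolute amount of time."""
    return baseline_ms + slack_ms


def psm_deadline(baseline_ms: float, slack_percent: float) -> float:
    """Percentage Slack Method: slack is a percentage of the baseline time T."""
    return baseline_ms + baseline_ms * slack_percent / 100


# Hypothetical baseline of T = 2000 ms, with the slack values from the slide.
print(msm_deadline(2000, 100))  # slack = 100 ms
print(psm_deadline(2000, 5))    # slack = 5% of T
```

For this hypothetical baseline both methods give the same deadline; for any other T, the PSM deadline scales with T while the MSM slack stays fixed.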
Cashier: Energy Saving Algorithm
• MSM and PSM work by utilizing a fraction of total slack in each interval, such that
– Deadline is not missed.
– Rest of the cache is turned off to save energy.
• Cache is allocated using cache coloring. Uniqueness:
– No change to virtual-to-physical mapping.
– Smaller reconfiguration overhead.
– Uses mapping table for flexible remapping.
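The idea of spending only a fraction of the slack in each interval can be sketched as below. This is not the authors' algorithm: the per-configuration slowdown estimates, the `slack_fraction` value of 0.5, and the rule of picking the smallest feasible cache are all assumptions made for illustration (the slides state that system energy, not cache size alone, is optimized).

```python
def choose_colors(extra_time_ms, remaining_slack_ms, slack_fraction=0.5):
    """Pick the number of active cache colors for the next interval.

    extra_time_ms: hypothetical profiling estimate, mapping a candidate number
        of active colors to the extra execution time it would cause.
    remaining_slack_ms: slack not yet consumed by earlier intervals.
    slack_fraction: assumed share of the remaining slack usable per interval.
    """
    budget = remaining_slack_ms * slack_fraction
    feasible = [colors for colors, extra in extra_time_ms.items() if extra <= budget]
    # Fall back to the largest configuration if nothing fits the budget.
    return min(feasible) if feasible else max(extra_time_ms)


def run_intervals(estimates_per_interval, total_slack_ms, slack_fraction=0.5):
    """Apply choose_colors interval by interval, charging the slack actually used."""
    remaining = total_slack_ms
    decisions = []
    for estimates in estimates_per_interval:
        colors = choose_colors(estimates, remaining, slack_fraction)
        remaining -= estimates[colors]
        decisions.append(colors)
    return decisions, remaining


# Hypothetical estimates: fewer active colors cost more time.
estimates = {64: 0, 32: 2, 16: 10, 8: 40}
decisions, left = run_intervals([estimates, estimates], total_slack_ms=20)
print(decisions, left)
```

Because each interval may consume only part of what remains, the accumulated slowdown cannot exceed the total slack as long as the estimates hold.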
Cashier: Flow-diagram
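[Figure: Cashier flow diagram. The L2 access physical address consists of L2 Tag <40>, Region ID <6>, Set # Inside Color <6> and Offset <6>. An OS-controlled mapping table translates Region ID <6> into Cache color <6>; the cache color and Set # Inside Color <6> select the set in the remapped L2 cache storage (Color 0, Color 1, ..., Color 63). The RCE receives the tag, set and offset, and its counters feed the algorithm.]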
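The address remapping in the flow diagram can be sketched as follows. The field widths (6-bit offset, 6-bit set number inside a color, 6-bit region ID, 64 colors) come from the diagram; the bit ordering (tag, region ID, set inside color, offset from high to low) and the round-robin fill of the mapping table are assumptions made for illustration.

```python
OFFSET_BITS = 6        # Offset <6>
SET_IN_COLOR_BITS = 6  # Set # Inside Color <6>
REGION_BITS = 6        # Region ID <6>
NUM_COLORS = 64        # Color 0 .. Color 63


def split_address(addr: int):
    """Split a physical address into (tag, region_id, set_in_color, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_in_color = (addr >> OFFSET_BITS) & ((1 << SET_IN_COLOR_BITS) - 1)
    region_id = (addr >> (OFFSET_BITS + SET_IN_COLOR_BITS)) & ((1 << REGION_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_IN_COLOR_BITS + REGION_BITS)
    return tag, region_id, set_in_color, offset


def make_mapping_table(active_colors: int):
    """Hypothetical policy: spread the 64 regions round-robin over the active colors."""
    return [region % active_colors for region in range(1 << REGION_BITS)]


def l2_set_index(addr: int, mapping_table) -> int:
    """Replace the region ID with its mapped color to form the L2 set index."""
    _, region_id, set_in_color, _ = split_address(addr)
    color = mapping_table[region_id]
    return (color << SET_IN_COLOR_BITS) | set_in_color


addr = (5 << 18) | (37 << 12) | (9 << 6) | 3
table = make_mapping_table(active_colors=16)
print(split_address(addr))
print(l2_set_index(addr, table))
```

Only the table entries change when the allocation changes, which is consistent with the slide's point that the virtual-to-physical mapping is left untouched.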
Illustration of Cache Coloring
[Figure: cache sizes labelled Full, Half, Quarter, Eighth and 1/128, comparing the selective-sets approach with the cache coloring approach]
Cashier: RCE design for profiling
[Figure: RCE design. Labelled blocks: L2 access address, sampling filter (RS, marked "Important"), queue, address decoders M1 to M6 for cache sizes 16X/16, 12X/16, 8X/16, 4X/16, 2X/16 and 1X/16, MUX, and core storage.]
Key Component: RCE
• Tag-only (no data)
• Uses set-sampling (sampling ratio = 32)
• Non-intrusive and parallel operation
• Not on critical path
• Small latency
• Energy overhead: < 0.5% of L2 cache energy
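Tag-only profiling with set-sampling can be sketched as below. The sampling ratio of 32 is from the slide; selecting every 32nd set, using LRU replacement, and scaling sampled misses by the ratio are assumptions for illustration, not details of the actual RCE hardware.

```python
from collections import OrderedDict

SAMPLING_RATIO = 32  # from the slide: one in 32 sets is profiled


def is_sampled(set_index: int) -> bool:
    """Assumed sampling filter: keep every 32nd set."""
    return set_index % SAMPLING_RATIO == 0


class TagOnlySet:
    """One profiled set that stores tags only (no data), with assumed LRU replacement."""

    def __init__(self, ways: int):
        self.ways = ways
        self.tags = OrderedDict()

    def access(self, tag: int) -> bool:
        """Return True on a hit, False on a miss; update LRU order."""
        if tag in self.tags:
            self.tags.move_to_end(tag)
            return True
        if len(self.tags) >= self.ways:
            self.tags.popitem(last=False)  # evict the least recently used tag
        self.tags[tag] = None
        return False


def estimate_misses(accesses, ways: int) -> int:
    """Estimate total misses from sampled sets; accesses are (set_index, tag) pairs."""
    sets = {}
    sampled_misses = 0
    for set_index, tag in accesses:
        if not is_sampled(set_index):
            continue  # non-sampled sets are ignored by the profiler
        if set_index not in sets:
            sets[set_index] = TagOnlySet(ways)
        if not sets[set_index].access(tag):
            sampled_misses += 1
    return sampled_misses * SAMPLING_RATIO


trace = [(0, 1), (0, 2), (0, 1), (1, 7), (32, 1), (0, 3), (0, 2)]
print(estimate_misses(trace, ways=2))
print(estimate_misses(trace, ways=4))
```

Because only sampled sets are tracked and no data is stored, the profiler's storage is a small fraction of the cache it models, which is in line with the low energy overhead quoted on the slide.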
Experiments
• Sniper simulator, SPEC2006 benchmarks
• 1B instructions
• 2MB L2
• Baseline: unmanaged L2 cache
Cashier: Results with MSM Algorithm
• Energy saving of 25.9%
• For two benchmarks, the deadline is missed.
Cashier: Results with PSM Algorithm
• Energy saving of 23.6%
• No benchmark misses the deadline.
Conclusion
• Cache sizes and leakage energy dissipation are increasing.
• We propose system-level techniques using dynamic cache reconfiguration.
• We propose techniques for QoS, desktop, and server systems.
• Our techniques offer large energy savings and compare well to existing techniques.
Questions and comments are welcome!
Sparsh Mittal [email protected]
References
1. http://www.oracle.com/technology/products/rdb/pdf/2002_tech_forums/rdbtf_2002_opt_on_alpha_mdr.pdf
2. S.H. Yang et al., "An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches," in HPCA, 2001.
3. Kaxiras et al., "Cache decay: exploiting generational behavior to reduce cache leakage power," in ISCA, 2001.
4. S. Mittal et al., "EnCache: Improving cache energy efficiency using a software-controlled profiling cache," in IEEE EIT, 2012.
5. C. Zhang et al., "A highly configurable cache architecture for embedded systems," in ISCA, 2003.
6. S. Mittal, "A survey of architectural techniques for DRAM power management," International Journal of High Performance Systems Architecture, 2012.
7. S. Mittal, "A survey of architectural techniques for improving cache power efficiency," Sustainable Computing: Informatics and Systems, 2013.
8. S. Mittal et al., "Palette: A cache leakage energy saving technique for green computing," HPC: Transition Towards Exascale Processing, 2013.
9. S. Mittal, "Dynamic cache reconfiguration based techniques for improving cache energy efficiency," PhD thesis, Iowa State University, 2013.