Demystifying GCC: Under the Hood of the GNU Compiler Collection
Copyright (c) 2005-2008 Morgan Deters and Ron Cytron
PLDI 2008
June 2008
Copyright is held by the author/owner(s). PLDI’08, June 9, 2008, Tucson, Arizona, USA
Morgan Deters (mdeters@lsi.upc.edu)
Programming Logic Group, Technical University of Catalonia, Barcelona, Spain
Ron Cytron (cytron@cs.wustl.edu)
Distributed Object Computing Laboratory, Washington University, St. Louis, Missouri
Tutorial Objectives
• Introduce the internals of GCC 4.3.0 (March 2008)
  • Java and C++ front-ends
  • Optimizations
  • Back-end structure
• How to modify, or write your own
  • Front end: new languages, new features
  • Middle end: analysis, optimization
  • Back end: machine-specific targets
• How to debug/improve GCC
Introduction
What is GCC?
Why use GCC?
What does compilation with GCC look like?
What is GCC?
• A compiler for multiple languages…
  • C
  • C++
  • Java
  • Objective-C/C++
  • FORTRAN
  • Ada
What is GCC?
• …supporting multiple targets
  arc arm avr bfin c4x cris crx fr30 frv h8300 i386 ia64 iq2000 m32c m32r m68hc11 m68k mcore mips mmix mn10300 mt pa pdp11 rs6000 s390 sh sparc stormy16 v850 vax xtensa
• These are code generators; variants are also supported (e.g. powerpc is a “variant” of the rs6000 code generator)
What GCC is not
• GCC is not
  • an assembler (see GNU binutils)
  • a C library (see glibc)
  • a debugger (see gdb)
  • an IDE
Why use GCC as an R&D platform?
• Research is immediately usable by everyone
  • Large development community and user base
  • GCC is a modern, practical compiler
    • multiple architectures, full standard languages, optimizations
    • debugging support
• You can meet GCC halfway
  • modular: hack some parts, rely on the others
• Can incorporate bug fixes that come along
  • minor version upgrades (e.g. 3.3.x → 3.4.x): no big deal
  • major version upgrades (e.g. 3.x → 4.x): more of a pain
• Need not maintain code indefinitely (if incorporated)
The GCC project and the GPL
• Open-source
  • GNU General Public License (GPL)
• Changes made to GCC source code or associated libraries must also be GPLed
• However, compiler and libraries can be used/linked against in non-GPL development
Your improvements to GCC must be open-source, but your customers need not open-source their programs to use your stuff
Typical structure of GCC compilation
[Figure: the gcc/g++/gcj driver runs source program → compiler → assembly program → assembler → ELF object → linker]
Inside the compiler
[Figure: compiler (C, C++, Java) pipeline: parser / semantic checker → gimplifier → tree optimizations → expander → RTL passes → target arch instruction selection; the stages before the expander work on trees, the stages after it on RTL]
GCC Basics
How do you build GCC?
How do you navigate the source tree?
GCC Basics: Getting Started
• Requirements to build GCC
  • usual suite of UNIX tools (C compiler, assembler/linker, GNU Make, tar, awk, POSIX shell)
• For development
  • GNU m4 and GNU autotools (autoconf/automake/libtool)
  • gperf
  • bison, flex
  • autogen, guile, gettext, perl, Texinfo, diffutils, patch, …
• Obtaining GCC sources
  • gcc.gnu.org or local mirror (see gcc.gnu.org/mirrors.html)
  • get gcc-core package, then language add-ons
    • gcc-java requires gcc-g++
Source Metrics
• 492 Mbytes of material downloaded to build GCC
• 2.7 Gbytes after build
• As of 4.3.0
  • Need mpfr (multiple-precision floating-point arithmetic)
  • Need gmp (multiple-precision integer arithmetic)
  • Need Eclipse (for Java front-end)
Building GCC from sources
• Configure it in a separate build directory from sources
  • /path/to/source/directory/configure options…
  • --prefix=install-location
  • --enable-languages=comma-separated-language-list
    • To see the list of available languages: grep language= */config-lang.in
  • --enable-checking
    • turns on sanity checks (especially on intermediate representation)
• Build it!
  • Environment variables useful when debugging compiler/runtime
    • CFLAGS: stage 1 flags (using host C compiler)
    • BOOT_CFLAGS: stage 2 and stage 3 flags (using stage 1 GCC)
    • CFLAGS_FOR_TARGET: flags for new GCC building target binaries
    • CXXFLAGS_FOR_TARGET: flags for new GCC building libstdc++/others
    • GCJFLAGS: flags for new GCC building Java runtime
    • ‘-O0 -ggdb3’ is recommended when debugging
Building GCC from sources
• Build it! continued…
  • make bootstrap (to bootstrap) or make (to not)
    • bootstrap useful when compiling with non-GCC host compiler
    • during development, non-bootstrap is faster and also better at recompiling just those sources that have changed
  • use make’s -j option to speed things up on MP/dual core
  • make bootstrap-lean
    • cleans up between stages, uses less disk
  • make profiledbootstrap
    • faster compiler produced, but need GCC host
    • -j unsupported
• Install it!
  • make install
Building a cross-compiler
• Code generator can be built for any target
  • runtime libraries then are built using that code generator
• Since GCC outputs assembly, you actually need a full cross development toolchain
  • Dan Kegel’s crosstool automates a GNU/Linux cross chain for popular configurations:
    • Linux kernel headers
    • GNU binutils
    • glibc
    • gcc
    • see kegel.com/crosstool
GCC Basics: Getting Around
• Other tools recommended when hacking GCC
  GNU Screen   attach/reattach terminal sessions
  etags        navigation to source definitions (emacs)
  ctags        navigation to source definitions (vi)
  c++filt      demangle C++/Java mangled symbols
  readelf      decompose ELF files
  objdump      object file dumper/disassembler
  gdb          GNU debugger
GCC Drivers
• gcc, g++, gcj are drivers, not compilers
  • They will execute (as appropriate):
    • compiler (cc1, cc1plus, jc1)
    • Java program main entry point generation (jvgenmain)
    • assembler (as)
    • linker (collect2)
• Differences between drivers include active #defines, default libraries, other behavior
  • but can use any driver for any source language
Most useful driver options for debugging
  -E                  preprocess, don’t compile
  -S                  compile, don’t assemble
  -H                  verbose header inclusion
  -save-temps         save temporary files
  -print-search-dirs  print search paths
  -v                  verbose (see what the driver does)
  -g                  include debugging symbols
  --help              get command line help
  --version           show full version info
  -dumpversion        show minimal version info
For extra help
  man gcc      basic option assistance
  info gcc     using gcc in-depth; language extensions etc.
  info gccint  internals documentation
Top-level INSTALL directory in distribution provides help on configuring and building GCC
Tour of GCC source
  INSTALL      configuration/installation documentation
  boehm-gc     the Boehm garbage collector
  config       architecture-specific configure fragments
  contrib      contributed scripts
  gjar         a replacement for the jar tool
  fixincludes  source for a program to fix host header files when they aren't ANSI-compliant
  gcc          the main compiler source
  include      headers used by GCC (libiberty mostly)
  intl         support for languages other than English
Tour of GCC source, cont’d
  libcpp              source for C preprocessing library
  libffi              Foreign Function Interface library (allows function callers and receivers to have different calling conventions)
  libiberty           useful utility routines (symbol tables etc.) used by GCC and replacement functions for common things not provided by host
  libjava             source for standard Java library
  libmudflap          source for a pointer instrumentation library
  libstdc++-v3        source for standard C++ library
  maintainer-scripts  utility scripts for GCC maintainers
  zlib                compression library source
The GCC Front-End
[Figure: Front-end → Middle-end → Back-end]
Option processing
Controlling drivers and hooking up front-ends
The C, C++, and Java front-ends
The GENERIC high-level intermediate representation
The GCC Front-End
• gcc, g++, gcj driver entry point
  • main (gcc/gcc.c)
• cc1, cc1plus, jc1 share a common entry point
  • toplev_main (gcc/toplev.c)
    • actual main in gcc/main.c
      • just calls toplev_main()
      • can be overridden by front-end
Command-line option processing
• In gcc/ directory
  common.opt     option definitions
  opts.{c,h}     common_handle_option()
  c-opts.c       c_common_handle_option()
  c.opt          C compiler option definitions
  java/lang.opt  Java compiler option definitions
  java/lang.c    java_handle_option()
• These are cc1, cc1plus, jc1 option handling routines
  • drivers just pass on arguments as declared in spec files
common.opt
• Parsed by awk scripts at build time to generate options.c, options.h
• Simple format
  • Language specifications and option stanzas
• Each option stanza contains
  1. option name
  2. space-separated options list
  3. documentation string for --help output
Properties of command-line options
• Available properties for use in .opt option spec files are
  Common           option is available for all front-ends
  Target           option is target-specific
  Joined           argument is mandatory and may be joined
  Separate         argument is mandatory and may be separate
  JoinedOrMissing  optional argument, must be joined if present
  RejectNegative   there is not an associated “no-” option
  UInteger         argument expected is a nonnegative integer
  Undocumented     undocumented; do not include in --help output
  Report           -fverbose-asm should report the state of this option
Properties of options, cont’d
  Var(var-name)               set var-name to true (or argument) if present
  VarExists                   do not define variable in resulting options.c
  Init(value)                 static initializer for variable
  Mask(name)                  associated with a bit in target_flags bit vector; MASK_name is automatically #defined to the bitmask; TARGET_name is automatically #defined as an expression that is 1 when the option is used, 0 when not
  InverseMask(other, [this])  option is inverse of another option with Mask(other); if this is given, #define TARGET_this
  MaskExists                  don’t #define again; use for synonymous options
  Condition(cond)             option permitted iff preprocessor cond is true
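The Mask and InverseMask properties can be pictured with a small C sketch. The option names (FAST_MATH, SOFT_FLOAT, HARD_FLOAT) and the set_option helper are hypothetical and exist only for illustration; the macros are written by hand here to resemble what the build would generate, and are not copied from GCC.

```c
#include <assert.h>

/* Hypothetical options, for illustration only. */
#define MASK_FAST_MATH  (1 << 0)   /* bit for a made-up Mask(FAST_MATH) option  */
#define MASK_SOFT_FLOAT (1 << 1)   /* bit for a made-up Mask(SOFT_FLOAT) option */

/* One bit vector holds the state of every Mask() option. */
static int target_flags = 0;

/* TARGET_name is 1 when the option is in effect, 0 when not. */
#define TARGET_FAST_MATH  ((target_flags & MASK_FAST_MATH) != 0)
#define TARGET_SOFT_FLOAT ((target_flags & MASK_SOFT_FLOAT) != 0)

/* In the style of InverseMask(SOFT_FLOAT, HARD_FLOAT):
   true exactly when the other option's bit is clear. */
#define TARGET_HARD_FLOAT ((target_flags & MASK_SOFT_FLOAT) == 0)

/* Sketch of an option handler: the positive form sets the bit,
   the "no-" form clears it. */
static void set_option(int mask, int enabled)
{
    assert(mask != 0);
    if (enabled)
        target_flags |= mask;
    else
        target_flags &= ~mask;
}
```

With this layout, testing an option costs one AND against target_flags, and an inverse option needs no storage of its own.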
Language-specific options
• gcc/c.opt, gcc/java/lang.opt, gcc/cp/lang.opt
• Special processing in gcc/java/lang.c
• Specify valid language-names as an option
In Greater Depth: Adding Command-Line Options
Controlling the drivers: spec files
  gcc/gcc.c              specs for gcc driver
  gcc/cp/lang-specs.h    additional specs for g++ driver
  gcc/java/lang-specs.h  additional specs for gcj driver
• gcc/gcc.c contains documentation on spec language
• Use -dumpspecs to see specifications
Example spec (adapted from gcc/gcc.c):
  %{E|M|MM:%(trad_capable_cpp) %(cpp_options) %(cpp_debug_options)}
  %{!E:%{!M:%{!MM:
    %{traditional|ftraditional:
      %eGNU C no longer supports -traditional without -E}
    %{save-temps|traditional-cpp|no-integrated-cpp:%(trad_capable_cpp)
      %(cpp_options) -o %{save-temps:%b.i} %{!save-temps:%g.i} \n
      cc1 -fpreprocessed %{save-temps:%b.i} %{!save-temps:%g.i}
      %(cc1_options)}
    %{!save-temps:%{!traditional-cpp:%{!no-integrated-cpp:
      cc1 %(cpp_unique_options) %(cc1_options)}}}
    %{!fsyntax-only:%(invoke_as)}}}
The C front-end
• C front-end is in gcc/ directory
  • parse entry point c_common_parse_file (c-opts.c)
    • workhorse is c_parse_file (c-parser.c)
  c-common.def    IR codes for C compiler
  c-common.c      functions for C-like front-ends
  c-convert.c     type conversion
  c-cppbuiltin.c  built-in preprocessor #defines
  c-decl.c        declaration handling
  c-dump.c        IR-dumping
  c-errors.c      pedantic warning issuance
  c-format.c      format checking for printf-like functions
  c-gimplify.c    lowering of IR (and documentation)
The C front-end, cont’d
  c-incpath.c       include path generation for preprocessor
  c-lang.c          language infrastructure, front-end hookups
  c-lex.c           lexical analyzer (manually coded)
  c-objc-common.c   some functions for C and Objective-C
  c-opts.c          option processing, some init stuff
  c-parser.c        parser (based on an old bison parser)
  c-pch.c           precompiled header support
  c-ppoutput.c      preprocessing-only support (-E option)
  c-pragma.c        support for #pragma pack and #pragma weak
  c-pretty-print.c  used to pretty-print expressions in error messages
  c-semantics.c     statement list handling in IR
  c-typeck.c        functions to build IR, type checks
  gccspec.c         driver-specific tasks for gcc driver
The C++ front-end
• In subdirectory gcc/cp/
  • same parse entry point as C compiler
  call.c              function/method invocation lookup and handling
  class.c             building (the runtime artifacts of) classes etc.
  cp-gimplify.c       IR lowering
  cp-lang.c           language hooks for C++ front-end
  cp-objcp-common.c   common bits for C++ and Objective-C++
  cvt.c               type conversion
  cxx-pretty-print.c  C++ pretty-printer
  decl.c              declaration and variable handling
  decl2.c             additional declaration and variable handling
  dump.c              IR dumping
The C++ front-end, cont’d
  error.c        C++ error-reporting callbacks
  except.c       C++ exception-handling support
  expr.c         IR lowering for C++
  friend.c       C++ “friend” support
  init.c         data initializers and constructors
  lex.c          the C++ lexical analyzer
  mangle.c       C++ name mangling
  method.c       method handling; default constructor generation
  name-lookup.c  context-aware name (type, var, namespace) lookup
  optimize.c     constructor/destructor cloning
  parser.c       the C++ parser
  pt.c           parameterized type (template) support
  ptree.c        IR pretty-printing
  repo.c         C++ template repository support
The C++ front-end, cont’d
  rtti.c       support for run-time type information
  search.c     type search in the presence of multiple inheritance
  semantics.c  semantic checking
  tree.c       C++ front-end specific IR functionality
  typeck.c     functionality dealing with types, conversion
  typeck2.c    types, conversion, type errors
  g++spec.c    driver-specific tasks for g++ driver
The Java front-end
• In subdirectory gcc/java/
  • parse entry point java_parse_file (jcf-parse.c)
  boehm.c          per-type bitmask building for Boehm GC
  buffer.{c,h}     expandable buffer data type
  builtins.c       builtin/inline functions for Java (like Math.min())
  check-init.c     checks over IR for uninitialized variables
  class.c          IR building of classes, class-references, vtables, etc.
  constants.c      class file constant pool handling
  decl.c           Java declaration support (misc.)
  except.c         Java exception support
  expr.c           Java expressions (misc.)
  gjavah.c         source for gcjh program
  java-gimplify.c  IR lowering
The Java front-end, cont’d
  jcf-depend.c   class file dependency tracking
  jcf-dump.c     source for jcf-dump program
  jcf-io.c       class file I/O utility functions
  jcf-parse.c    entry point for compiling Java files
  jcf-path.c     CLASSPATH-sensitive search
  jcf-reader.c   generic, pluggable class file reader
  jcf-write.c    class file writer
  jv-scan.c      source for jv-scan program
  jvgenmain.c    source for jvgenmain program
  jvspec.c       Java option specs
  lang.c         language hooks, options processing
  mangle.c       symbol-mangling routines
  mangle_name.c  symbol-mangling routines
The Java front-end, cont’d
  parse-scan.y   minimal, fast parser for syntax checking (GONE)
  parse.y        Java (source-language) parser (GONE 4.3)
  resource.c     support for --resource option
  typeck.c       routines related to types and type conversion
  verify-glue.c  interface between verifier and compiler
  verify-impl.c  bytecode verifier
  win32-host.c   for Windows; case-sensitive filename matching
  zextract.c     read class files from zip/jar archives
  keyword.gperf  Java keyword specification
Multiple “front-ends” for Java (< 4.3.x)
• common entry point at java_parse_file
  • gcc/java/jcf-parse.c
• compile .java → .o
  • gcc/java/parse.y
• compile .class → .o (or .jar → .so)
  • gcc/java/expr.c (with gcc/java/jcf-reader.c)
  • expand_byte_code, process_jvm_instruction
• compile .java → .class (with -C option)
  • gcc/java/parse.y with flag_emit_class_files set
  • unusual back-end (as if syntax checking only)
• In 4.3.x, ecj1 (Eclipse) used for .java → .class
The “treelang” front end: Essential front-end components
• configure fragment (config-lang.in)
• language-specific options (lang.opt)
• filename handling for driver (lang-specs.h)
• treelang-specific tree codes (treelang-tree.def)
• front-end hookups to toplev.c (treetree.c)
  • see gcc/langhooks.h for documentation
• flex scanner (lex.l)
• bison parser (parse.y)
• structural functions (tree1.c)
In Greater Depth: Adding a new front-end to GCC
GENERIC trees
• Front-ends are written in C!
• We’d like to have…
  • tree node base class
    • subclasses for expressions etc.
• Instead we have
  • union tree_node (gcc/tree.h)
    • each component of the union is a struct
Structs vs. unions
[Figure: memory layout from low to high memory. In a struct, field 1 through field 4 occupy consecutive storage; in a union, field 1 through field 4 all start at the same address]
Fields of a union overlap in memory; you’re on your own for type safety!
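The layout difference can be checked directly in C. This is a minimal sketch with made-up type names (four_fields_struct, four_fields_union); it is not taken from the GCC sources.

```c
#include <assert.h>
#include <stddef.h>

/* Four fields laid out one after another. */
struct four_fields_struct { int f1; int f2; int f3; int f4; };

/* Four fields that all share the same storage. */
union four_fields_union { int f1; int f2; int f3; int f4; };

/* Returns 1 if a write to f1 is visible through f4 (true for the union). */
static int union_fields_alias(void)
{
    union four_fields_union u;
    assert(sizeof u == sizeof(int));   /* only room for one field */
    u.f1 = 42;
    return u.f4 == 42;
}

/* Returns 1 if a write to f1 is visible through f4 (false for the struct). */
static int struct_fields_alias(void)
{
    struct four_fields_struct s = { 0, 0, 0, 0 };
    s.f1 = 42;
    return s.f4 == 42;
}
```

Nothing in the union records which field was written last, which is why the tree_node union needs a code field that says how to interpret the rest of the node.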
The tree_node union
  typedef union tree_node *tree;
[Figure: union tree_node laid out from low to high memory: the variants int_cst, type, identifier, field_decl, exp, … overlap, and each begins with the same “common” part]
Everything is a tree!
The tree_node union
• The “common” part contains
  • code (kind of tree: declaration, expression, etc.)
  • chain (for linking trees together)
  • type (type of the represented item, also a tree)
  • flags
    • side effects
    • addressable
    • access flags (used for other things in non-declarations)
    • 7 language-specific flags
Macros for accessing tree parts
• In the common part
  • TREE_*
    • TREE_CODE(tree)
    • TREE_TYPE(tree)
    • TREE_SIDE_EFFECTS(tree) etc.
• For specific trees
  • type trees
    • TYPE_*
      • TYPE_FIELDS(tree) gets a list of fields in the type
      • TYPE_NAME(tree) gets the type’s associated decl
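The pattern of a union with a shared “common” prefix plus accessor macros can be sketched in a few lines of C. Every name below (toy_node, TOY_CODE, toy_check, and so on) is invented for this example; the real definitions live in gcc/tree.h and are considerably larger.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tree codes; stand-ins for the codes defined in tree.def. */
enum toy_code { TOY_INTEGER_CST, TOY_IDENTIFIER };

typedef union toy_node *toy_tree;

/* Fields shared by every kind of node. Each variant starts with this
   struct, so the common part can be read whatever the node's kind. */
struct toy_common { enum toy_code code; toy_tree chain; toy_tree type; };

struct toy_int_cst    { struct toy_common common; long value; };
struct toy_identifier { struct toy_common common; const char *name; };

union toy_node {
    struct toy_common     common;
    struct toy_int_cst    int_cst;
    struct toy_identifier identifier;
};

/* Accessors for the common part: valid on any node. */
#define TOY_CODE(t)  ((t)->common.code)
#define TOY_TYPE(t)  ((t)->common.type)
#define TOY_CHAIN(t) ((t)->common.chain)

/* Run-time check standing in for the type safety the union cannot give. */
static toy_tree toy_check(toy_tree t, enum toy_code expected)
{
    assert(t != NULL && TOY_CODE(t) == expected);
    return t;
}

/* Accessors for specific kinds of node: checked before use. */
#define TOY_INT_CST_VALUE(t)   (toy_check((t), TOY_INTEGER_CST)->int_cst.value)
#define TOY_IDENTIFIER_NAME(t) (toy_check((t), TOY_IDENTIFIER)->identifier.name)
```

Because the accessors expand to lvalues, the same macro serves for reading and for assignment, e.g. TOY_INT_CST_VALUE(node) = 9.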
Expression trees
• Lots of tree codes used for expressions
  • gcc/tree.def defines all standard tree codes
    LT_EXPR           less-than conditional
    TRUTH_ORIF_EXPR   short-circuiting OR conditional
    MODIFY_EXPR       assignment
    NOP_EXPR          type promotion (typically)
    SAVE_EXPR         store in temporary for multiple uses
    ADDR_EXPR         take address of
• Front-end extensions to GENERIC permitted
    gcc/c-common.def
    gcc/cp/cp-tree.def       e.g. DYNAMIC_CAST_EXPR
    gcc/java/java-tree.def   e.g. SYNCHRONIZED_EXPR
A few useful front-end functions
  build()       expression tree building: pass tree code, tree type, and (arbitrary number of) operands
  fold()        simple tree restructuring and optimization; mostly useful for constant folding
  gcc_assert()  assertion verification: if it fails it gives an “internal compiler error” report with source file and line number under compilation (as well as source file and line number in compiler code)
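To make the build-then-fold idea concrete, here is a toy expression tree in C with a constant folder. It is only an illustration of the technique: the mini_* names and the node layout are hypothetical and do not reflect the signatures of GCC's build() or fold().

```c
#include <assert.h>
#include <stdlib.h>

enum mini_code { MINI_INT_CST, MINI_VAR, MINI_PLUS_EXPR, MINI_MULT_EXPR };

struct mini_expr {
    enum mini_code code;
    long value;                    /* meaningful for MINI_INT_CST only */
    struct mini_expr *op0, *op1;   /* operands of binary expressions   */
};

/* Allocate and fill one node. */
static struct mini_expr *mini_new(enum mini_code code, long value,
                                  struct mini_expr *op0, struct mini_expr *op1)
{
    struct mini_expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->code = code;
    e->value = value;
    e->op0 = op0;
    e->op1 = op1;
    return e;
}

static struct mini_expr *mini_build_int(long v)
{
    return mini_new(MINI_INT_CST, v, NULL, NULL);
}

static struct mini_expr *mini_build_var(void)
{
    return mini_new(MINI_VAR, 0, NULL, NULL);
}

static struct mini_expr *mini_build2(enum mini_code code,
                                     struct mini_expr *a, struct mini_expr *b)
{
    return mini_new(code, 0, a, b);
}

/* Fold bottom-up: a binary node whose operands are both constants
   is rewritten in place into a single constant node. */
static struct mini_expr *mini_fold(struct mini_expr *e)
{
    if (e->code == MINI_INT_CST || e->code == MINI_VAR)
        return e;
    e->op0 = mini_fold(e->op0);
    e->op1 = mini_fold(e->op1);
    if (e->op0->code == MINI_INT_CST && e->op1->code == MINI_INT_CST) {
        long a = e->op0->value, b = e->op1->value;
        e->value = (e->code == MINI_PLUS_EXPR) ? a + b : a * b;
        free(e->op0);
        free(e->op1);
        e->op0 = e->op1 = NULL;
        e->code = MINI_INT_CST;
    }
    return e;
}
```

For instance, folding (2 + 3) * 4 collapses the whole tree to the constant 20, while x + (2 * 3) keeps its PLUS node and only the right operand becomes the constant 6.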
Code naming conventions
• Preprocessor macros ALL UPPERCASE
• Variables/functions all lowercase with underscores
• Predicates end in “_P” or “_p”
• Global flags start with “flag_”
• Global trees (vary somewhat with front-end)
  • null_node (or null_pointer_node)
  • integer_zero_node
  • void_type_node
  • integer_unsigned_type_node (or unsigned_int_type_node)
• Tree accessor macros FROM_TO (e.g. TYPE_DECL)
In Greater Depth: Modifying the front-end
Gimplification
• GENERIC + extensions → GIMPLE
  • GIMPLE is a subset of GENERIC
  • based on SIMPLE from McGill’s McCAT group
• GIMPLE is just like GENERIC but
  • no language extensions
    • front-end gimplify_expr callback
  • 3-address form (with temporary variables)
  • control structures lowered to goto
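The two lowering steps named above (3-address form and control structures reduced to goto) can be imitated by hand in C. The function below is a made-up example, and the second version is hand-written in the spirit of GIMPLE; it is not actual compiler output.

```c
#include <assert.h>

/* Source-level form: a nested expression inside a structured loop. */
static int sum_of_squares(int n)
{
    int s = 0;
    for (int i = 1; i <= n; i++)
        s = s + i * i;
    return s;
}

/* Hand-lowered equivalent: each statement has at most one operator,
   a temporary (t1) holds the intermediate result, and the loop is
   expressed with labels and goto. */
static int sum_of_squares_lowered(int n)
{
    int s, i, t1;
    s = 0;
    i = 1;
    goto test;
body:
    t1 = i * i;
    s = s + t1;
    i = i + 1;
test:
    if (i <= n)
        goto body;
    return s;
}

/* Both forms must agree for any input. */
static void check_lowering(int n)
{
    assert(sum_of_squares(n) == sum_of_squares_lowered(n));
}
```

Running the compiler with -fdump-tree-gimple on the first function shows the real lowered form, which will differ in naming and detail from this sketch.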
The GCC Middle-End
[Figure: Front-end → Middle-end → Back-end]
Optimization of trees
Static Single-Assignment form
The Register Transfer Language intermediate representation
The middle-end in context
[Figure: pipeline spanning front-end, middle-end, and back-end: gimplification → tree optimizations → expansion into RTL → RTL passes → register allocation → RTL passes]
Optimizations over the tree representation
• Managed by pass manager in gcc/passes.c
  • init_optimization_passes orders the passes
  • passes represented by a tree_opt_pass struct (tree-pass.h), even though it does RTL now too
    • “gate” function: whether or not to run optimization
    • “execute” function: implementation of pass
    • property bitmaps
      • properties required, destroyed, and created
    • “todo” bitmaps
      • run internal GC, dump the tree, verify SSA form, etc.
Passes and subpasses
• Passes can be used to group subpasses
• all_passes contains all_optimization_passes
  • all_optimization_passes has optimizations in order
• pass_tree_loop contains loop optimizations
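The gate/execute/subpass arrangement can be modelled with a small pass runner in C. This is a toy under stated assumptions: struct toy_pass, run_passes, and the three example passes are invented here, and the real tree_opt_pass struct also carries the property and “todo” bitmaps, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pass {
    const char *name;
    int  (*gate)(void);      /* NULL means "always run"                  */
    void (*execute)(void);   /* NULL for a pass that only groups others  */
    struct toy_pass *sub;    /* first subpass; run only if the gate holds */
    struct toy_pass *next;   /* next pass at the same level               */
};

static int toy_optimize = 1;   /* stand-in for an -O level */

/* Record of which passes ran, in order. */
static int executed[4];
static int executed_count = 0;

static void record(int id)
{
    assert(executed_count < 4);
    executed[executed_count++] = id;
}

static int  gate_if_optimizing(void) { return toy_optimize > 0; }
static void exec_lower(void)         { record(1); }
static void exec_loop_init(void)     { record(2); }
static void exec_dce(void)           { record(3); }

/* lower -> loop { loop_init } -> dce */
static struct toy_pass pass_loop_init = { "loop_init", NULL, exec_loop_init, NULL, NULL };
static struct toy_pass pass_dce       = { "dce", NULL, exec_dce, NULL, NULL };
static struct toy_pass pass_loop      = { "loop", gate_if_optimizing, NULL,
                                          &pass_loop_init, &pass_dce };
static struct toy_pass pass_lower     = { "lower", NULL, exec_lower, NULL, &pass_loop };

/* Walk the list in order; a closed gate skips the pass and its subpasses. */
static void run_passes(struct toy_pass *p)
{
    for (; p != NULL; p = p->next) {
        if (p->gate != NULL && !p->gate())
            continue;
        if (p->execute != NULL)
            p->execute();
        run_passes(p->sub);
    }
}
```

Calling run_passes(&pass_lower) with toy_optimize set runs all three executable passes in order; with toy_optimize cleared, the grouping pass is gated off and its subpass is skipped along with it.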
In Greater Depth: Adding a tree optimization pass
Debugging middle-end tree passes
Command-line options for dumping trees:
  -fdump-tree-X          output after pass X
  -fdump-tree-original   output initial tree (before all opts)
  -fdump-tree-optimized  output final GIMPLE (after all opts)
  -fdump-tree-gimple     dump before & after gimplification
  -fdump-tree-inlined    output after function inlining
  -fdump-tree-all        output after each pass
(Make sure you specify an -O level or you might not get anything.)
Passes available for dumping in GCC 4.1.1 (see info page):
  cfg, vcg, ch, ssa, salias, alias, ccp, storeccp, pre, fre, copyprop, store_copyprop, dce, mudflap, sra, sink, dom, dse, phiopt, forwprop, copyrename, nrv, vect, vrp
Debugging middle-end tree passes, cont’d
Can specify options for tree dumps:
• address  print address of each tree node
• slim     less output; don’t dump all scope bodies
• raw      raw tree output (rather than pretty-printed C-like trees)
• details  detailed output (not supported by all passes)
• stats    statistics (not supported by all passes)
• blocks   basic block boundaries
• vops     output virtual operands for each statement
• lineno   output line #s
• uid      output decl’s unique ID along with each variable
• all      all except raw, slim, and lineno
e.g.
  -fdump-tree-dse-details  detailed post-DSE output
  -fdump-tree-all-all      (almost) everything
Static Single-Assignment (SSA) form
• (Pure) functional languages have nice properties for optimization
  • single-assignment: one assignment to each variable
  • static single-assignment: next best thing
    • each variable assigned at one static location in the program
  • makes it clearer where data is produced
    • reduces complexity of many optimization algorithms
    • removes association of variable uses over its lifetime
Cytron et al. Efficiently computing static single assignment form and the control dependence graph. ACM TOPLAS, October 1991.
SSA renaming (1)
    y = 10;
    /* compute 2^y */
    x = 1;
    while (y > 0) {
      x = x * 2;
      y = y - 1;
    }
model control flow
[Control-flow graph: entry block "y = 10; x = 1" → loop test "y > 0 ?" → if true, body block "x = x * 2; y = y - 1", which branches back to the test; if false, EXIT]
SSA renaming (2)
    y1 = 10;
    /* compute 2^y */
    x1 = 1;
    while (y1 > 0) {
      x2 = x1 * 2;
      y2 = y1 - 1;
    }
version all variables
[Control-flow graph: entry block "y1 = 10; x1 = 1" → loop test "y1 > 0 ?" → if true, body block "x2 = x1 * 2; y2 = y1 - 1", which branches back to the test; if false, EXIT]
SSA renaming (3)
    y1 = 10;
    /* compute 2^y */
    x1 = 1;
    while (true) {
      x3 = φ(x1, x2);
      y3 = φ(y1, y2);
      if (y3 <= 0)
        break;
      x2 = x3 * 2;
      y2 = y3 - 1;
    }
insert “phi” nodes
[Control-flow graph: entry block "y1 = 10; x1 = 1" → loop header "x3 = φ(x1, x2); y3 = φ(y1, y2)" with test "y3 > 0 ?" → if true, body block "x2 = x3 * 2; y2 = y3 - 1", which branches back to the header; if false, EXIT]
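A φ-function can be emulated in ordinary code by remembering which control-flow edge was taken into the loop header. The following is a minimal sketch, not GCC code: the function name pow2_ssa and the from_entry flag are invented for illustration, and from_entry itself is bookkeeping rather than an SSA name.

```cpp
// Emulates the SSA-form loop: x1, y1, x2, y2, x3, y3 each receive their
// value at one program location, and each phi selects its argument
// according to the edge that was taken into the loop header.
int pow2_ssa(int n) {
    int y1 = n;               // "y1 = 10" on the slide
    int x1 = 1;
    int x2 = 0, y2 = 0;       // placeholders; never read before the body assigns them
    int x3, y3;
    bool from_entry = true;   // hypothetical flag: did we arrive from the entry block?
    while (true) {
        x3 = from_entry ? x1 : x2;   // x3 = phi(x1, x2)
        y3 = from_entry ? y1 : y2;   // y3 = phi(y1, y2)
        if (y3 <= 0)
            break;
        x2 = x3 * 2;
        y2 = y3 - 1;
        from_entry = false;   // the next header entry arrives via the back edge
    }
    return x3;                // 2^n for small non-negative n
}
```

With n = 10 the header is entered once from the entry block and ten times via the back edge, so the first φ argument is used exactly once.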
Into and out of SSA form in GCC
    pass_build_ssa       gcc/tree-into-ssa.c
      → SSA optimizations
    pass_del_ssa         gcc/tree-outof-ssa.c
Dealing with SSA form in GCC
• Given a tree node n with code = PHI_NODE
    PHI_RESULT(n)           get lhs of φ
    PHI_NUM_ARGS(n)         get rhs count
    PHI_ARG_DEF(n, i)       get ssa-name
    PHI_ARG_EDGE(n, i)      get edge
    PHI_ARG_ELT(n, i)       tuple (ssa-name, edge)
• Given a tree node n with code = SSA_NAME
    SSA_NAME_DEF_STMT(n)    get defining statement
    SSA_NAME_VERSION(n)     get SSA version #
A few useful functions in the middle-end
walk_use_def_chains(var, func, data)
    start at ssa-name var, calling func at each point up the chain; data is a generic pointer for use by func
    (see tree-ssa.c and internals docs: info gccint)
walk_dominator_tree(dom-walk-data, basic-block)
    start at basic-block and walk children in dominator relationship; dom-walk-data provides several callbacks
    (see domwalk.h and internals docs: info gccint)
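The callback style of a dominator-tree walk can be pictured with a toy walker. This is a sketch under stated assumptions, not the interface declared in domwalk.h: Block, WalkCallbacks, walk_dom_tree, and preorder_names are all hypothetical names, and GCC's real dom-walk-data structure has different fields.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy basic block: a name plus the blocks it immediately dominates.
struct Block {
    std::string name;
    std::vector<Block*> dom_children;
};

// Hypothetical callback bundle, standing in for "dom-walk-data".
struct WalkCallbacks {
    std::function<void(const Block&)> before_children;
    std::function<void(const Block&)> after_children;
};

// Depth-first walk over the dominator tree rooted at bb.
void walk_dom_tree(const WalkCallbacks& cb, const Block& bb) {
    if (cb.before_children) cb.before_children(bb);
    for (const Block* child : bb.dom_children)
        walk_dom_tree(cb, *child);
    if (cb.after_children) cb.after_children(bb);
}

// Example client: record the order in which blocks are entered.
std::vector<std::string> preorder_names(const Block& root) {
    std::vector<std::string> order;
    WalkCallbacks cb;
    cb.before_children = [&order](const Block& b) { order.push_back(b.name); };
    walk_dom_tree(cb, root);
    return order;
}
```

Because a block is visited only after its dominator, anything the before-children callback records about a block is already available when its dominated blocks are processed.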
In Greater Depth (OOPSLA 2006, revised; Portland, Oregon)
Implementing an optimization from start to finish
RTL expansion and optimization
• Expansion performed by pass_expand (gcc/cfgexpand.c)
Back-end has a say in this
• As of GCC 4.1.x, RTL passes are carried out by the same pass manager that works on trees
• pass_final (at end) outputs assembly
RTL expansion
• Machine description (.md) files for target CPU
    define_expand and define_insn match standard names and generate RTL; assist in expansion of GIMPLE
define_insn takes five args:
    1. name (or empty string)
    2. RTL template
    3. condition (C)
    4. output template (assembly, or C that generates assembly)
    5. attributes (optional)
define_expand takes four args:
    1. name
    2. RTL template
    3. condition (C)
    4. preparation statements (C)
e.g. in gcc/config/i386/i386.md
    (define_expand "movhi"
      [(set (match_operand:HI 0 "nonimmediate_operand" "")
            (match_operand:HI 1 "general_operand" ""))]
      ""
      "ix86_expand_move (HImode, operands); DONE;")
(define_expand name RTL condition prep-stmts)
    expands operation name to RTL (if applicable)
    runs prep-stmts
(match_operand[:mode] N predicate constraint)
    tries to match movhi operand N of mode if predicate
    general_operand is any imm/mem/reg valid for mode
(set dest src)
    RTL for assignment; here, the implementation for movhi
e.g. in gcc/config/i386/i386.md (2)
    (define_insn "jump"
      [(set (pc)
            (label_ref (match_operand 0 "" "")))]
      ""
      "jmp\t%l0"
      [..attributes...])
(define_insn name RTL condition output attributes)
    expands operation name to RTL (if applicable)
    condition can emit additional instructions
(match_operand[:mode] N predicate constraint)
    tries to match jump operand N of mode if predicate
(set (pc) label)
    RTL for an unconditional jump
Standard instruction names (M, N stand for machine modes)
    movM, reload_inM, movstrictM, movmisalignM, load_multiple, store_multiple,
    vec_setM, vec_extractM, vec_initM, pushM, addM, subM, mulM3, sminM, smaxM3,
    reduc_smin_M, reduc_smax_M, reduc_umin_M, reduc_umax_M, reduc_splus_M, reduc_uplus_M,
    vec_shl_M, vec_shr_M,
    mulhisi3, mulqihi3, umulqihi3, umulsidi3, smulM, umulM, divmodM, udivmodM,
    ashlM, ashrM, lshrM3, rotlM3, rotrM3, negM, absM, sqrtM, cosM, sinM, expM, logM,
    powM, atan2M, floorM,
    btruncM, roundM, ceilM, nearbyintM, rintM, copysignM, ffsM, clzM, ctzM, popcountM,
    parityM, one_cmplM, cmpM, tstM, movmemM, movstr, setmemM, cmpstrnM, cmpstrM,
    cmpmemM, strlenM, floatMN2, floatunsMN2,
    fixMN2, fixunsMN2, ftruncM, fix_truncMN2, fixuns_truncMN2, truncMN2, extendMN2,
    zero_extendMN2, extv, extzv, insv, movMODE, addMODE, sCOND, bCOND, cbranchMODE,
    jump, call, call_value, call_pop, untyped_call, return, untyped_return
Standard instruction names (more!)
    sync_compare_and_swapMODE, sync_compare_and_swap_ccMODE, sync_addMODE, sync_subMODE,
    sync_old_addMODE, sync_old_subMODE, sync_new_addMODE, sync_new_subMODE,
    sync_lock_test_and_setMODE, sync_lock_releaseMODE, stack_protect_set, stack_protect_test,
    decrement_and_branch_until_zero, canonicalize_funcptr_for_compare,
    nop, indirect_jump, casesi, tablejump, doloop_end, doloop_begin, save_stack_block,
    allocate_stack, check_stack, nonlocal_goto, nonlocal_goto_receiver, exception_receiver,
    builtin_setjmp_setup, builtin_setjmp_receiver, builtin_longjmp, eh_return,
    prologue, epilogue, sibcall_epilogue, trap, conditional_trap, prefetch, memory_barrier
Uses for the machine description
1. RTL expansion
    define_expand and define_insn with standard names
    ones with unknown names are ignored
    if a needed name isn’t given, GCC crashes
2. RTL adjustment
    define_split, define_peephole
3. Hard register allocation (reloading)
    register description, preferencing
4. RTL template matching
    define_insn (name ignored)
    assembly generated
[Diagram labels: "optimization passes" appears twice alongside the numbered steps]
RTL optimization passes (gcc/passes.c)
• RTL expansion
• CFG cleanup/“jump optimization”
• EH landing pad injection
• Local CSE
• Global CSE
• Loop CSE/unroll/peel/unswitch
• Jump bypassing
• Branch prob. instrumentation/analysis
• If conversion (conditional moves etc.)
• Tracer (tail dup for superblock formation)
• Web construction (split pseudoreg’s)
• Pseudoregister liveness analysis (DSE, auto-inc/dec addressing)
• Instruction combination
• Partition basic blocks (hot and cold)
• Register movement (avoid rt moves)
• Optimize mode switching
• Modulo scheduling (loop pipelining)
• Instruction scheduling
• Register allocation (reloading)
• Optimize stack operations
• Peephole optimizations (define_peephole)
• Basic block reordering (profile-driven)
• Variable tracking (debug support)
• Delayed branch scheduling
• Branch shortening
• Register-to-stack conversion
• Final
• Debugging information dump
The GCC Back-End
Front-end → Middle-end → Back-end
Register allocation
Instruction selection
Debugger support
Register allocation
• RTL pseudo-registers → hard registers
• Proceeds in several passes
    1. Register class scan (preference registers)
    2. Register allocation within basic blocks
    3. Register allocation for remaining registers
    4. Reload (renumbering, spilling)
Instruction selection
    (define_insn "jump"
      [(set (pc)
            (label_ref (match_operand 0 "" "")))]
      ""
      "jmp\t%l0"
      [..attributes...])
(define_insn name RTL condition output attributes)
    matches RTL (if applicable)
(match_operand[:mode] N predicate constraint)
    tries to match jump operand N of mode if predicate
jmp\t%l0
    Generates x86 jmp with %l0 (operand 0 as label)
In Greater Depth (OOPSLA 2006, revised; Portland, Oregon)
A machine description tour
Debugger support
• Specifying -g to the compiler inserts debugging symbols in the assembly output
• DWARF2 format
    embedded within ELF
    a tree of debug info entries (compilation unit at the root)
      • each with a linked list of attributes
    DWARF2 manual: ftp.freestandards.org/pub/dwarf/dwarf-2.0.0.pdf
• Once assembled, “readelf -w” interprets them
Runtime Issues
Object layout
Virtual method lookup
The Boehm garbage collector
crt stuff
Simple object layout (C++)
    class A {
    public:
      int x;
      virtual void myMethod();
      virtual void other();
    };
    class B : public A {
    public:
      int y;
      virtual void myMethod();
      virtual void third();
    };
[Diagram: instances of A hold { vtable pointer, x } and point to the vtable for A = { A::myMethod, A::other }; instances of B hold { vtable pointer, x, y } and point to the vtable for B = { B::myMethod, A::other, B::third }]
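Simple object layout (C++)
[Diagram: instances of A hold { vtable pointer, x } with vtable for A = { A::myMethod, A::other }; instances of B hold { vtable pointer, x, y } with vtable for B = { B::myMethod, A::other, B::third }. Annotations: the leading { vtable pointer, x } of a B instance is the "subobject A of B"; the leading { B::myMethod, A::other } of the vtable for B is the "sub-vtable A of B"]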
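The effect of this layout is observable from ordinary C++: code that only knows about A still reaches B's override, because the call goes through the slot in the sub-vtable A of B. The sketch below reuses the class shapes from the slide; the std::string return values and the helper functions call_my_method and call_other are assumptions added so the result of each dispatch can be checked.

```cpp
#include <string>

// Same shape as the slide's classes, with bodies added for illustration.
class A {
public:
    int x = 0;
    virtual std::string myMethod() { return "A::myMethod"; }
    virtual std::string other()    { return "A::other"; }
};

class B : public A {
public:
    int y = 0;
    std::string myMethod() override { return "B::myMethod"; }  // replaces A's slot
    virtual std::string third()     { return "B::third"; }     // new slot after A's
};

// These helpers see only an A&, yet the call lands on whichever
// implementation the object's own vtable points to.
std::string call_my_method(A& obj) { return obj.myMethod(); }
std::string call_other(A& obj)     { return obj.other(); }
```

Passing a B to call_my_method yields "B::myMethod", while call_other on the same object yields "A::other", matching the two entries shown in the sub-vtable A of B.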
Object layout (Java)
[Diagram: instances of B hold { vtable pointer, x, y }, where the leading part is the "subobject A of B". The vtable for B holds { class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third }, annotated with "sub-vtable Object of B" and "sub-vtable A of B"]
But more complicated for C++ !
First, classes might not have virtual functions !
    class A {
    public:
      int x;
      void myMethod();
      void other();
    };
    class B : public A {
    public:
      int y;
      void myMethod();
      void third();
    };
[Diagram: instances of A hold { x }; instances of B hold { x, y }, where { x } is the "subobject A of B"; no vtable pointers]
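Whether a class needs a vtable pointer at all can be checked at compile time. The sketch below repeats the non-virtual classes from the slide and adds a virtual variant for comparison; the name VA and the constants are hypothetical, and the size comparison reflects typical GCC targets rather than anything the language guarantees.

```cpp
#include <type_traits>

// Non-virtual classes as on the slide (member functions declared but never called here).
class A {
public:
    int x;
    void myMethod();
    void other();
};

class B : public A {
public:
    int y;
    void myMethod();
    void third();
};

// Virtual variant for comparison (hypothetical name).
class VA {
public:
    int x;
    virtual void myMethod() {}
    virtual void other() {}
};

// Without virtual functions a class is not polymorphic, so no vtable pointer is needed.
constexpr bool a_is_polymorphic  = std::is_polymorphic<A>::value;
constexpr bool b_is_polymorphic  = std::is_polymorphic<B>::value;
constexpr bool va_is_polymorphic = std::is_polymorphic<VA>::value;

// The hidden vtable pointer makes VA larger than A on common ABIs;
// exact sizes are ABI-specific, so only the comparison is recorded.
constexpr bool va_is_larger = sizeof(VA) > sizeof(A);
```

Calls to A::myMethod and B::myMethod on these classes are resolved from the static type alone, which is why the diagram shows no vtable at all.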
But more complicated for C++ !
Second, classes might have multiple bases !
    class A {
    public:
      int x;
      virtual void one();
    };
    class B {
    public:
      int y;
      virtual void two();
    };
    class C : public A, public B {
    public:
      int z;
      virtual void three();
    };
[Diagram: instances of A hold { vtable pointer, x } with vtable for A = { A::one }; instances of B hold { vtable pointer, y } with vtable for B = { B::two }; the vtable for C and the layout of instances of C are shown as "??"]
Object layout for multiple bases
[Diagram: instances of A hold { vtable pointer, x } with vtable for A = { A::one }; instances of B hold { vtable pointer, y } with vtable for B = { B::two }. Instances of C hold { vtable pointer, x, vtable pointer, y, z }: the first { vtable pointer, x } is the "subobject A of C" and the following { vtable pointer, y } is the "subobject B of C". Entries listed for the vtable for C: A::one, two empty slots, B::two, C::three]
Requires “this pointer-adjustment”
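The adjustment is visible in plain C++ whenever a C* is converted to a B*. The sketch below reuses the slide's classes with small bodies added; the int return values and the helper names b_offset_in_c and round_trips are assumptions for illustration, and the actual offset depends on the target ABI.

```cpp
#include <cstddef>

class A { public: int x = 0; virtual int one() { return 1; } };
class B { public: int y = 0; virtual int two() { return 2; } };
class C : public A, public B { public: int z = 0; virtual int three() { return 3; } };

// Byte distance from the start of a C object to its B subobject.
// The derived-to-base conversion is where the compiler adjusts the pointer.
std::ptrdiff_t b_offset_in_c(C& c) {
    B* as_b = &c;  // implicit conversion: now points at the B subobject
    return reinterpret_cast<char*>(as_b) - reinterpret_cast<char*>(&c);
}

// Converting back from B* to C* undoes the adjustment.
bool round_trips(C& c) {
    B* as_b = &c;
    return static_cast<C*>(as_b) == &c;
}

// A virtual call made through the adjusted pointer still reaches B::two.
int call_two_via_b(C& c) {
    B* as_b = &c;
    return as_b->two();
}
```

With the layout in the diagram, the B subobject cannot start where the A subobject does, so the offset reported for a C object is nonzero.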
Multiple bases, cont’d
[Diagram: instances of C hold { vtable pointer, x, vtable pointer, y, z }, with "subobject A of C" and "subobject B of C" marked. Entries listed for the vtable for C: A::one, [ offset = -4 ], one empty slot, B::two, C::three]
But what about dynamic_cast ?!
Multiple bases, cont’d
[Diagram: instances of C hold { vtable pointer, x, vtable pointer, y, z }, with "subobject A of C" and "subobject B of C" marked. Entries listed for the vtable for C: [ offset = 0 ], one empty slot, A::one, [ offset = -4 ], one empty slot, B::two, C::three]
Top-level offset
Multiple bases, finished*
But what about C++ type info ?!
[Diagram: instances of C hold { vtable pointer, x, vtable pointer, y, z }, with "subobject A of C" and "subobject B of C" marked. Entries listed for the vtable for C: [ offset = 0 ], ptr. typeinfo C, A::one, [ offset = -4 ], ptr. typeinfo C, B::two, C::three]
* there are further complications, but we’ll leave it here
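Both extra vtable entries correspond to things standard C++ can observe. The sketch below uses the slide's classes with small bodies added (the int return values and the three helper function names are assumptions); it relies only on language guarantees and does not inspect the vtable itself.

```cpp
#include <typeinfo>

class A { public: int x = 0; virtual int one() { return 1; } };
class B { public: int y = 0; virtual int two() { return 2; } };
class C : public A, public B { public: int z = 0; virtual int three() { return 3; } };

// dynamic_cast<void*> yields the address of the most-derived object,
// the kind of query a top-level offset in the vtable can answer.
bool finds_complete_object(C& c) {
    B* as_b = &c;  // points at the B subobject inside c
    return dynamic_cast<void*>(as_b) == static_cast<void*>(&c);
}

// typeid on a polymorphic object reports its dynamic type,
// the kind of query a type-info pointer in the vtable can answer.
bool reports_dynamic_type(C& c) {
    B& as_b = c;
    return typeid(as_b) == typeid(C);
}

// A cross-cast from one base to a sibling base needs both pieces of information.
bool cross_cast_works(C& c) {
    B* as_b = &c;
    A* as_a = dynamic_cast<A*>(as_b);
    return as_a == static_cast<A*>(&c);
}
```

All three helpers return true for any C object, even though each starts from a pointer or reference whose static type is only B.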
Java and C++ share object layout
[Diagram: vtable for (Java) B = { [ offset = 0 ], null typeinfo, class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third }]
Virtual method lookup (C++, Java)
Now, virtual method invocation is a snap !
    Compiler knows method offset within vtable
    So it generates an indirect access through instance pointer…
    …and invokes the method through the pointer found in vtable
[Diagram: instance of B { vtable pointer, x, y } pointing to the vtable for (Java) B = { [ offset = 0 ], null typeinfo, class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third }]
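The three steps can be written out by hand with a table of function pointers. This is a sketch of the idea only, not what GCC emits: Obj, VTable, the A_/B_ functions, and the invoke_ helpers are hypothetical names, and the table omits the offset, type-info, and Java-specific entries shown in the diagram.

```cpp
// Hand-rolled dispatch: a struct of function pointers plays the vtable,
// and every object starts with a pointer to its class's table.
struct Obj;

struct VTable {
    int (*myMethod)(Obj*);   // slot 0
    int (*other)(Obj*);      // slot 1
};

struct Obj {
    const VTable* vtable;    // first word of every instance
    int x;
};

int A_myMethod(Obj*)     { return 1; }
int A_other(Obj* self)   { return self->x; }
int B_myMethod(Obj*)     { return 2; }

const VTable vtable_for_A = { A_myMethod, A_other };
const VTable vtable_for_B = { B_myMethod, A_other };  // overrides slot 0, inherits slot 1

// What "obj->myMethod()" boils down to: load the vtable pointer from the
// instance, load the slot at a compile-time-known offset, call through it.
int invoke_myMethod(Obj* obj) { return obj->vtable->myMethod(obj); }
int invoke_other(Obj* obj)    { return obj->vtable->other(obj); }
```

The caller never tests the object's class; choosing between A_myMethod and B_myMethod is entirely a matter of which table the instance points to.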
The Boehm garbage collector
• Conservative mark & sweep garbage collector
    designed to operate in a hostile environment as a drop-in replacement for malloc
    “conservative” means it cannot distinguish between pointers and non-pointers
    Java is considerably less “hostile” than C/C++
      • can’t hide pointers from the compiler
Boehm, H., Space Efficient Conservative Garbage Collection. In ACM PLDI’93.
Java and Boehm GC
• Java front-end generates class pointer masks
    stows them in vtable
    computed in gcc/java/boehm.c
• Class too big for a pointer mask ?
    use a count of reference fields
    use a “mark procedure”
• Where to look
    boehm-gc/doc contains docs
    libjava/prims.cc contains GC-aware allocation routines
[Diagram: vtable for B = { [ offset = 0 ], null typeinfo, class B pointer, GC descriptor, finalize, … }]
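The idea of a pointer mask, and of falling back when a class is too big for one, can be sketched in a few lines. This is a toy model only: MaskResult, build_pointer_mask, and the 64-bit capacity are assumptions made for illustration, and this is not the descriptor encoding used by gcc/java/boehm.c or by the collector itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy pointer mask: bit i is set when word i of an object holds a reference.
struct MaskResult {
    bool fits;            // false when the object has more words than mask bits
    std::uint64_t mask;   // meaningful only when fits is true
};

MaskResult build_pointer_mask(const std::vector<bool>& word_is_reference) {
    const std::size_t kMaskBits = 64;  // assumed capacity of one descriptor word
    if (word_is_reference.size() > kMaskBits)
        return {false, 0};             // too big: caller must use another scheme
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < word_is_reference.size(); ++i)
        if (word_is_reference[i])
            mask |= (std::uint64_t{1} << i);
    return {true, mask};
}
```

A collector given such a mask only has to examine the marked words of an object, instead of treating every word as a possible pointer.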
crt stuff (“C runtime”)
• crt1.o, crti.o, crtn.o* provided by glibc
    crt1.o sets up libc before main() is even invoked
    crti.o prologue for .init and .fini
    crtn.o epilogue for .init and .fini
• crtbegin.o, crtend.o* provided by GCC
    crtbegin.o contributes frame_dummy() call to .init; calls static data destructors in .fini
    crtend.o calls static data constructors in .init
    code in gcc/crtstuff.c
* and some variations
Miscellany
Tips on modifying code
• Find something similar
    browse source code
• build off it
• use a debugger
    -fdump-tree-*
    -d*
    debug_tree/browse_tree
Wrap-up
Running GCC under GDB
Obtaining development versions of GCC
Reporting bugs in GCC
What’s next for GCC
Running GCC under GDB
• Inevitably, hacking a compiler will result in
    segfault
    assertion fault
    incorrect code generation
• Remember to attach debugger to the compiler, not the driver
• “gcc -v …,” then use GDB on the actual front-end
In Greater Depth (OOPSLA 2006, revised; Portland, Oregon)
Debugging GCC
Obtaining development versions of GCC
• All GCC development is in the open
    design discussions
    change logs
    bugs
• Subversion (SVN) repository
    public read access
    for details: gcc.gnu.org/svn.html
    clients available from subversion.tigris.org/
What to do if you find a bug in GCC
• Check to see if bug is present in SVN version
• Check to see if bug is in bug database
    http://gcc.gnu.org/bugzilla/
• Collect version information (gcc --version)
• Guidelines: http://gcc.gnu.org/bugs.html
• Report it: http://gcc.gnu.org/bugzilla/
Thanks!
Demystifying GCC
Distributed Object Computing Laboratory, Washington University, St. Louis, Missouri
Programming Logic Group, Technical University of Catalonia, Barcelona, Spain
Morgan Deters    mdeters@lsi.upc.edu
Ron Cytron       cytron@cs.wustl.edu