Demystifying GCC: Under the Hood of the GNU Compiler Collection

Morgan Deters and Ron Cytron

Distributed Object Computing Laboratory, Washington University, St. Louis, Missouri
Programming Logic Group, Technical University of Catalonia, Barcelona, Spain

PLDI 2008, June 9, 2008, Tucson, Arizona, USA

Copyright © 2005–2008 Morgan Deters and Ron Cytron
Copyright is held by the author/owner(s).

Tutorial Objectives

• Introduce the internals of GCC 4.3.0 (March 2008)
    • Java and C++ front-ends
    • Optimizations
    • Back-end structure
• How to modify, or write your own
    • Front end
        • New languages, new features
    • Middle end
        • Analysis, optimization
    • Back end
        • Machine-specific targets
• How to debug/improve GCC

Introduction

What is GCC?

Why use GCC?

What does compilation with GCC look like?

What is GCC?

• A compiler for multiple languages…
    • C
    • C++
    • Java
    • Objective-C/C++
    • FORTRAN
    • Ada

What is GCC?

• …supporting multiple targets

    arc        arm      avr     bfin
    c4x        cris     crx     fr30
    frv        h8300    i386    ia64
    iq2000     m32c     m32r    m68hc11
    m68k       mcore    mips    mmix
    mn10300    mt       pa      pdp11
    rs6000     s390     sh      sparc
    stormy16   v850     vax     xtensa

These are code generators; variants are also supported (e.g. powerpc is a “variant” of the rs6000 code generator).

What GCC is not

• GCC is not
    • an assembler (see GNU binutils)
    • a C library (see glibc)
    • a debugger (see gdb)
    • an IDE

Why use GCC as an R&D platform?

• Research is immediately usable by everyone
    • Large development community and user base
    • GCC is a modern, practical compiler
        • multiple architectures, full standard languages, optimizations
        • debugging support
• You can meet GCC halfway
    • modular: hack some parts, rely on the others
• Can incorporate bug fixes that come along
    • minor version upgrades (e.g. 3.3.x → 3.4.x) – no big deal
    • major version upgrades (e.g. 3.x → 4.x) – more of a pain
• Need not maintain code indefinitely (if incorporated)

The GCC project and the GPL

• Open-source
    • GNU General Public License (GPL)
• Changes made to GCC source code or associated libraries must also be GPLed
• However, compiler and libraries can be used/linked against in non-GPL development

Your improvements to GCC must be open-source, but your customers need not open-source their programs to use your stuff.

Typical structure of GCC compilation

[Figure: the gcc/g++/gcj driver runs three tools in sequence. Source program → compiler → assembly program → assembler → ELF object → linker.]

Inside the compiler

[Figure: stages inside the compiler (C, C++, Java): parser / semantic checker, gimplifier, tree optimizations, expander, RTL passes, target arch instruction selection. Trees are the representation up to the expander; RTL is the representation after it.]

GCC Basics

How do you build GCC?

How do you navigate the source tree?

GCC Basics: Getting Started

• Requirements to build GCC
    • usual suite of UNIX tools (C compiler, assembler/linker, GNU Make, tar, awk, POSIX shell)
• For development
    • GNU m4 and GNU autotools (autoconf/automake/libtool)
    • gperf
    • bison, flex
    • autogen, guile, gettext, perl, Texinfo, diffutils, patch, …
• Obtaining GCC sources
    • gcc.gnu.org or local mirror (see gcc.gnu.org/mirrors.html)
    • get gcc-core package, then language add-ons
        • gcc-java requires gcc-g++

Source Metrics

• 492 Mbytes of material downloaded to build GCC
• 2.7 Gbytes after build
• As of 4.3.0
    • Need mpfr – Multiple precision floating point arithmetic
    • Need gmp – Multiple precision integer arithmetic
    • Need Eclipse – for Java front-end

Building GCC from sources

• Configure it in a separate build directory from sources
    /path/to/source/directory/configure options…
        --prefix=install-location
        --enable-languages=comma-separated-language-list
            • To see the list of available languages: grep language= */config-lang.in
        --enable-checking
            • turns on sanity checks (especially on intermediate representation)
• Build it!
    • Environment variables useful when debugging compiler/runtime
        • CFLAGS                stage 1 flags (using host C compiler)
        • BOOT_CFLAGS           stage 2 and stage 3 flags (using stage 1 GCC)
        • CFLAGS_FOR_TARGET     flags for new GCC building target binaries
        • CXXFLAGS_FOR_TARGET   flags for new GCC building libstdc++/others
        • GCJFLAGS              flags for new GCC building Java runtime
        • ‘-O0 -ggdb3’ is recommended when debugging

Building GCC from sources, cont’d

• Build it! continued…
    • make bootstrap (to bootstrap) or make (to not)
        • bootstrap useful when compiling with non-GCC host compiler
        • during development, non-bootstrap is faster and also better at recompiling just those sources that have changed
    • use make’s -j option to speed things up on MP/dual core
    • make bootstrap-lean
        • cleans up between stages, uses less disk
    • make profiledbootstrap
        • faster compiler produced, but need GCC host
        • -j unsupported
• Install it!
    • make install

Building a cross-compiler

• Code generator can be built for any target
    • runtime libraries then are built using that code generator
• Since GCC outputs assembly, you actually need a full cross development toolchain
    • Dan Kegel’s crosstool automates a GNU/Linux cross chain for popular configurations:
        • Linux kernel headers
        • GNU binutils
        • glibc
        • gcc
        • see kegel.com/crosstool

GCC Basics: Getting Around

• Other tools recommended when hacking GCC

    GNU Screen   attach/reattach terminal sessions
    etags        navigation to source definitions (emacs)
    ctags        navigation to source definitions (vi)
    c++filt      demangle C++/Java mangled symbols
    readelf      decompose ELF files
    objdump      object file dumper/disassembler
    gdb          GNU debugger

GCC Drivers

• gcc, g++, gcj are drivers, not compilers
    • They will execute (as appropriate):
        • compiler (cc1, cc1plus, jc1)
        • Java program main entry point generation (jvgenmain)
        • assembler (as)
        • linker (collect2)
• Differences between drivers include active #defines, default libraries, other behavior
    • but can use any driver for any source language

Most useful driver options for debugging

    -E                   preprocess, don’t compile
    -S                   compile, don’t assemble
    -H                   verbose header inclusion
    -save-temps          save temporary files
    -print-search-dirs   print search paths
    -v                   verbose (see what the driver does)
    -g                   include debugging symbols

    --help               get command line help
    --version            show full version info
    -dumpversion         show minimal version info

For extra help

    man gcc       basic option assistance
    info gcc      using gcc in-depth; language extensions etc.
    info gccint   internals documentation

Top-level INSTALL directory in distribution provides help on configuring and building GCC

Tour of GCC source

    INSTALL       configuration/installation documentation
    boehm-gc      the Boehm garbage collector
    config        architecture-specific configure fragments
    contrib       contributed scripts
    gjar          a replacement for the jar tool
    fixincludes   source for a program to fix host header files when they aren't ANSI-compliant
    gcc           the main compiler source
    include       headers used by GCC (libiberty mostly)
    intl          support for languages other than English

Tour of GCC source, cont’d

    libcpp               source for C preprocessing library
    libffi               Foreign Function Interface library (allows function callers and receivers to have different calling conventions)
    libiberty            useful utility routines (symbol tables etc.) used by GCC and replacement functions for common things not provided by host
    libjava              source for standard Java library
    libmudflap           source for a pointer instrumentation library
    libstdc++-v3         source for standard C++ library
    maintainer-scripts   utility scripts for GCC maintainers
    zlib                 compression library source

The GCC Front-End

Option processing

Controlling drivers and hooking up front-ends

The C, C++, and Java front-ends

The GENERIC high-level intermediate representation

The GCC Front-End

• gcc, g++, gcj driver entry point
    • main (gcc/gcc.c)
• cc1, cc1plus, jc1 share a common entry point
    • toplev_main (gcc/toplev.c)
        • actual main in gcc/main.c
            – just calls toplev_main()
            – can be overridden by front-end

Command-line option processing

• In gcc/ directory

    common.opt      option definitions
    opts.{c,h}      common_handle_option()
    c-opts.c        c_common_handle_option()
    c.opt           C compiler option definitions
    java/lang.opt   Java compiler option definitions
    java/lang.c     java_handle_option()

• These are cc1, cc1plus, jc1 option handling routines
    • drivers just pass on arguments as declared in spec files

common.opt

• Parsed by awk scripts at build time to generate options.c, options.h
• Simple format
    • Language specifications and option stanzas
• Each option stanza contains
    1. option name
    2. space-separated options list
    3. documentation string for --help output

Properties of command-line options

• Available properties for use in .opt option spec files are

    Common            option is available for all front-ends
    Target            option is target-specific
    Joined            argument is mandatory and may be joined
    Separate          argument is mandatory and may be separate
    JoinedOrMissing   optional argument, must be joined if present
    RejectNegative    there is not an associated “no-” option
    UInteger          argument expected is a nonnegative integer
    Undocumented      undocumented; do not include in --help output
    Report            -fverbose-asm should report the state of this option

Properties of options, cont’d

    Var(var-name)     set var-name to true (or argument) if present
    VarExists         do not define variable in resulting options.c
    Init(value)       static initializer for variable
    Mask(name)        associated with a bit in target_flags bit vector; MASK_name is automatically #defined to the bitmask; TARGET_name is automatically #defined as an expression that is 1 when the option is used, 0 when not
    InverseMask(other, [this])
                      option is inverse of another option with Mask(other); if this is given, #define TARGET_this
    MaskExists        don’t #define again; use for synonymous options
    Condition(cond)   option permitted iff preprocessor cond is true

Language-specific options

• gcc/c.opt, gcc/java/lang.opt, gcc/cp/lang.opt
• Special processing in gcc/java/lang.c
• Specify valid language-names as an option

In Greater Depth (OOPSLA 2006, revised; Portland, Oregon)

Adding Command-Line Options

Controlling the drivers: spec files

    gcc/gcc.c               specs for gcc driver
    gcc/cp/lang-specs.h     additional specs for g++ driver
    gcc/java/lang-specs.h   additional specs for gcj driver

gcc/gcc.c contains documentation on spec language
Use -dumpspecs to see specifications

Example spec (adapted from gcc/gcc.c):

    %{E|M|MM:%(trad_capable_cpp) %(cpp_options) %(cpp_debug_options)}
    %{!E:%{!M:%{!MM:
      %{traditional|ftraditional:
        %eGNU C no longer supports -traditional without -E}
      %{save-temps|traditional-cpp|no-integrated-cpp:%(trad_capable_cpp)
        %(cpp_options) -o %{save-temps:%b.i} %{!save-temps:%g.i} \n
        cc1 -fpreprocessed %{save-temps:%b.i} %{!save-temps:%g.i}
        %(cc1_options)}
      %{!save-temps:%{!traditional-cpp:%{!no-integrated-cpp:
        cc1 %(cpp_unique_options) %(cc1_options)}}}
      %{!fsyntax-only:%(invoke_as)}}}}

The C front-end

• C front-end is in gcc/ directory
    • parse entry point c_common_parse_file (c-opts.c)
        • workhorse is c_parse_file (c-parser.c)

    c-common.def     IR codes for C compiler
    c-common.c       functions for C-like front-ends
    c-convert.c      type conversion
    c-cppbuiltin.c   built-in preprocessor #defines
    c-decl.c         declaration handling
    c-dump.c         IR-dumping
    c-errors.c       pedantic warning issuance
    c-format.c       format checking for printf-like functions
    c-gimplify.c     lowering of IR (and documentation)

The C front-end, cont’d

    c-incpath.c        include path generation for preprocessor
    c-lang.c           language infrastructure, front-end hookups
    c-lex.c            lexical analyzer (manually coded)
    c-objc-common.c    some functions for C and Objective-C
    c-opts.c           option processing, some init stuff
    c-parser.c         parser (based on an old bison parser)
    c-pch.c            precompiled header support
    c-ppoutput.c       preprocessing-only support (-E option)
    c-pragma.c         support for #pragma pack and #pragma weak
    c-pretty-print.c   used to pretty-print expressions in error messages
    c-semantics.c      statement list handling in IR
    c-typeck.c         functions to build IR, type checks
    gccspec.c          driver-specific tasks for gcc driver

The C++ front-end

• In subdirectory gcc/cp/
    • same parse entry point as C compiler

    call.c               function/method invocation lookup and handling
    class.c              building (the runtime artifacts of) classes etc.
    cp-gimplify.c        IR lowering
    cp-lang.c            language hooks for C++ front-end
    cp-objcp-common.c    common bits for C++ and Objective-C++
    cvt.c                type conversion
    cxx-pretty-print.c   C++ pretty-printer
    decl.c               declaration and variable handling
    decl2.c              additional declaration and variable handling
    dump.c               IR dumping

The C++ front-end, cont’d

    error.c         C++ error-reporting callbacks
    except.c        C++ exception-handling support
    expr.c          IR lowering for C++
    friend.c        C++ “friend” support
    init.c          data initializers and constructors
    lex.c           the C++ lexical analyzer
    mangle.c        C++ name mangling
    method.c        method handling; default constructor generation
    name-lookup.c   context-aware name (type, var, namespace) lookup
    optimize.c      constructor/destructor cloning
    parser.c        the C++ parser
    pt.c            parameterized type (template) support
    ptree.c         IR pretty-printing
    repo.c          C++ template repository support

The C++ front-end, cont’d

    rtti.c        support for run-time type information
    search.c      type search in the presence of multiple inheritance
    semantics.c   semantic checking
    tree.c        C++ front-end specific IR functionality
    typeck.c      functionality dealing with types, conversion
    typeck2.c     types, conversion, type errors
    g++spec.c     driver-specific tasks for g++ driver

The Java front-end

• In subdirectory gcc/java/
    • parse entry point java_parse_file (jcf-parse.c)

    boehm.c           per-type bitmask building for Boehm GC
    buffer.{c,h}      expandable buffer data type
    builtins.c        builtin/inline functions for Java (like Math.min())
    check-init.c      checks over IR for uninitialized variables
    class.c           IR building of classes, class-references, vtables, etc.
    constants.c       class file constant pool handling
    decl.c            Java declaration support (misc.)
    except.c          Java exception support
    expr.c            Java expressions (misc.)
    gjavah.c          source for gcjh program
    java-gimplify.c   IR lowering

The Java front-end, cont’d

    jcf-depend.c    class file dependency tracking
    jcf-dump.c      source for jcf-dump program
    jcf-io.c        class file I/O utility functions
    jcf-parse.c     entry point for compiling Java files
    jcf-path.c      CLASSPATH-sensitive search
    jcf-reader.c    generic, pluggable class file reader
    jcf-write.c     class file writer
    jv-scan.c       source for jv-scan program
    jvgenmain.c     source for jvgenmain program
    jvspec.c        Java option specs
    lang.c          language hooks, options processing
    mangle.c        symbol-mangling routines
    mangle_name.c   symbol-mangling routines

The Java front-end, cont’d

    parse-scan.y    minimal, fast parser for syntax checking (GONE)
    parse.y         Java (source-language) parser (GONE 4.3)
    resource.c      support for --resource option
    typeck.c        routines related to types and type conversion
    verify-glue.c   interface between verifier and compiler
    verify-impl.c   bytecode verifier
    win32-host.c    for Windows; case-sensitive filename matching
    zextract.c      read class files from zip/jar archives
    keyword.gperf   Java keyword specification

Multiple “front-ends” for Java (< 4.3.x)

• common entry point at java_parse_file
    • gcc/java/jcf-parse.c
• compile .java → .o
    • gcc/java/parse.y
• compile .class → .o (or .jar → .so)
    • gcc/java/expr.c (with gcc/java/jcf-reader.c)
    • expand_byte_code, process_jvm_instruction
• compile .java → .class (with -C option)
    • gcc/java/parse.y with flag_emit_class_files set
    • unusual back-end (as if syntax checking only)
• In 4.3.x, ecj1 (Eclipse) used for .java → .class

The “treelang” front end: Essential front-end components

• configure fragment (config-lang.in)
• language-specific options (lang.opt)
• filename handling for driver (lang-specs.h)
• treelang-specific tree codes (treelang-tree.def)
• front-end hookups to toplev.c (treetree.c)
    • see gcc/langhooks.h for documentation
• flex scanner (lex.l)
• bison parser (parse.y)
• structural functions (tree1.c)

In Greater Depth

Adding a new front-end to GCC

GENERIC trees

• Front-ends are written in C!
• We’d like to have…
    • tree node base class
        • subclasses for expressions etc.
• Instead we have
    • union tree_node (gcc/tree.h)
        • each component of the union is a struct

Structs vs. unions

[Figure: memory layout from low memory to high memory. In a struct, field 1 through field 4 occupy separate, consecutive storage. In a union, field 1 through field 4 all start at the same address.]

Fields of a union overlap in memory; you’re on your own for type safety!
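The layout difference can be checked directly in C. The sketch below uses two made-up types (four_fields_struct and four_fields_union are illustrative names, not GCC code) and reports where the last field sits in each.

```c
#include <stddef.h>

/* Hypothetical types for illustration only; not taken from GCC. */
struct four_fields_struct {
    int field1;
    int field2;
    int field3;
    int field4;
};

union four_fields_union {
    int field1;
    int field2;
    int field3;
    int field4;
};

/* In a struct, each field gets its own storage, so field4 sits after
   the three fields that precede it. */
size_t struct_field4_offset(void)
{
    return offsetof(struct four_fields_struct, field4);
}

/* In a union, every field starts at offset 0. */
size_t union_field4_offset(void)
{
    return offsetof(union four_fields_union, field4);
}

/* Writing one union member changes what the others read back,
   because they share the same bytes. */
int write_field1_read_field4(int value)
{
    union four_fields_union u;
    u.field1 = value;
    return u.field4;
}
```

Calling write_field1_read_field4(42) returns 42: nothing in the language records which member was written last, which is why the tree code needs its own tag to stay type-safe.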

The tree_node union

typedef union tree_node *tree;

[Figure: union tree_node laid out from low memory to high memory. The variants int_cst, type, identifier, field_decl, exp, … overlap, and each begins with the same “common” part.]

Everything is a tree!

The tree_node union

• The “common” part contains
    • code (kind of tree – declaration, expression, etc.)
    • chain (for linking trees together)
    • type (type of the represented item – also a tree)
    • flags
        • side effects
        • addressable
        • access flags (used for other things in non-declarations)
        • 7 language-specific flags

Macros for accessing tree parts

• In the common part
    • TREE_*
        • TREE_CODE(tree)
        • TREE_TYPE(tree)
        • TREE_SIDE_EFFECTS(tree) etc.
• For specific trees
    • type trees
        • TYPE_*
            – TYPE_FIELDS(tree) gets a list of fields in the type
            – TYPE_NAME(tree) gets the type’s associated decl
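The pattern of a shared "common" prefix plus accessor macros can be sketched in a few lines of C. Everything below is a simplified stand-in: the toy_ and TOY_ names are hypothetical, and the real definitions in gcc/tree.h differ in detail. The checked accessors assume, as an illustration, that a variant-specific field should only be read after confirming the node's code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real definitions live in gcc/tree.h. */
enum toy_code { TOY_INTEGER_CST, TOY_IDENTIFIER };

union toy_node;
typedef union toy_node *toy_tree;

struct toy_common {              /* shared prefix of every variant */
    enum toy_code code;          /* kind of node */
    toy_tree chain;              /* for linking nodes together */
    toy_tree type;               /* type of the item, also a node */
};

struct toy_int_cst    { struct toy_common common; long value; };
struct toy_identifier { struct toy_common common; const char *name; };

union toy_node {
    struct toy_common     common;
    struct toy_int_cst    int_cst;
    struct toy_identifier identifier;
};

/* Accessors for the common part work on any node. */
#define TOY_CODE(t)  ((t)->common.code)
#define TOY_TYPE(t)  ((t)->common.type)
#define TOY_CHAIN(t) ((t)->common.chain)

/* Variant accessors check the code first, since the union itself
   offers no type safety. */
static toy_tree toy_check(toy_tree t, enum toy_code expected)
{
    assert(TOY_CODE(t) == expected);
    return t;
}
#define TOY_INT_CST_VALUE(t)   (toy_check((t), TOY_INTEGER_CST)->int_cst.value)
#define TOY_IDENTIFIER_NAME(t) (toy_check((t), TOY_IDENTIFIER)->identifier.name)

/* Fill in a caller-provided node as an integer constant. */
void toy_init_int_cst(toy_tree t, long value)
{
    t->int_cst.common.code = TOY_INTEGER_CST;
    t->int_cst.common.chain = NULL;
    t->int_cst.common.type = NULL;
    t->int_cst.value = value;
}
```

With a node initialized by toy_init_int_cst, TOY_INT_CST_VALUE reads the value back, while TOY_IDENTIFIER_NAME on the same node would trip the assertion.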

Expression trees

• Lots of tree codes used for expressions
    • gcc/tree.def defines all standard tree codes

    LT_EXPR            less-than conditional
    TRUTH_ORIF_EXPR    short-circuiting OR conditional
    MODIFY_EXPR        assignment
    NOP_EXPR           type promotion (typically)
    SAVE_EXPR          store in temporary for multiple uses
    ADDR_EXPR          take address of

• Front-end extensions to GENERIC permitted

    gcc/c-common.def
    gcc/cp/cp-tree.def       e.g. DYNAMIC_CAST_EXPR
    gcc/java/java-tree.def   e.g. SYNCHRONIZED_EXPR

A few useful front-end functions

    build()        expression tree building – pass tree code, tree type, and (arbitrary number of) operands
    fold()         simple tree restructuring and optimization; mostly useful for constant folding
    gcc_assert()   assertion verification – if it fails it gives an “internal compiler error” report with source file and line number under compilation (as well as source file and line number in compiler code)
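To make the build-then-fold idea concrete, here is a minimal sketch of an expression tree with a constant folder. It is not GCC's build() or fold(): the ex_ names, the three-field node, and the in-place folding strategy are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature expression tree; not GCC's tree type. */
enum ex_code { EX_CONST, EX_VAR, EX_PLUS, EX_MULT };

struct ex_node {
    enum ex_code code;
    long value;                  /* meaningful for EX_CONST only */
    struct ex_node *op0, *op1;   /* operands of EX_PLUS / EX_MULT */
};

/* Build a leaf: a constant (value used) or a variable (value ignored). */
struct ex_node *ex_build_leaf(enum ex_code code, long value)
{
    struct ex_node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->code = code;
    n->value = value;
    n->op0 = n->op1 = NULL;
    return n;
}

/* Build a binary expression from a code and two operands. */
struct ex_node *ex_build2(enum ex_code code, struct ex_node *a, struct ex_node *b)
{
    struct ex_node *n = ex_build_leaf(code, 0);
    n->op0 = a;
    n->op1 = b;
    return n;
}

/* Fold bottom-up: a binary node whose operands are both constants
   is rewritten in place into a single constant. */
struct ex_node *ex_fold(struct ex_node *t)
{
    if (t->code == EX_CONST || t->code == EX_VAR)
        return t;
    t->op0 = ex_fold(t->op0);
    t->op1 = ex_fold(t->op1);
    if (t->op0->code == EX_CONST && t->op1->code == EX_CONST) {
        long a = t->op0->value, b = t->op1->value;
        free(t->op0);
        free(t->op1);
        t->value = (t->code == EX_PLUS) ? a + b : a * b;
        t->code = EX_CONST;
        t->op0 = t->op1 = NULL;
    }
    return t;
}
```

Folding (2 + 3) * x leaves a multiplication whose first operand is the constant 5, while folding 2 + 3 * 4 collapses the whole tree into the constant 14.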

Code naming conventions

• Preprocessor macros ALL UPPERCASE
• Variables/functions all lowercase with underscores
• Predicates end in “_P” or “_p”
• Global flags start with “flag_”
• Global trees (vary somewhat with front-end)
    • null_node (or null_pointer_node)
    • integer_zero_node
    • void_type_node
    • integer_unsigned_type_node (or unsigned_int_type_node)
• Tree accessor macros FROM_TO (e.g. TYPE_DECL)

In Greater Depth

Modifying the front-end

Gimplification

• GENERIC + extensions → GIMPLE
    • GIMPLE is a subset of GENERIC
        • based on SIMPLE from McGill’s McCAT group
• GIMPLE is just like GENERIC but
    • no language extensions
        • front-end gimplify_expr callback
    • 3-address form (with temporary variables)
    • control structures lowered to goto
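The last two properties can be illustrated by lowering a small function by hand. The sketch below is ordinary C written to mimic the shape of GIMPLE, not actual compiler output; the function names, temporaries (t1, t2, t3), and labels are invented for the example, and a real -fdump-tree-gimple dump will differ in naming and detail.

```c
/* Source-level form: nested expressions and a structured if/else. */
int original(int a, int b, int c)
{
    int x;
    if (a + b * c > 10)
        x = a * (b + c);
    else
        x = a - b;
    return x;
}

/* Hand-lowered form: each statement has at most one operator
   (3-address form), intermediate values go into temporaries,
   and the if/else becomes conditional and unconditional gotos. */
int lowered(int a, int b, int c)
{
    int x, t1, t2, t3;

    t1 = b * c;
    t2 = a + t1;
    if (t2 > 10) goto then_part; else goto else_part;

then_part:
    t3 = b + c;
    x = a * t3;
    goto done;

else_part:
    x = a - b;

done:
    return x;
}
```

Both functions compute the same result for every input; only the shape of the code changes, which is what makes the lowered form easier for later passes to analyze.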

The GCC Middle-End

Optimization of trees

Static Single-Assignment form

The Register Transfer Language intermediate representation

The middle-end in context

[Figure: the compilation pipeline divided into front-end, middle-end, and back-end. Stages shown: gimplification, tree optimizations, expansion into RTL, RTL passes, register allocation, RTL passes.]

Optimizations over the tree representation

• Managed by pass manager in gcc/passes.c
    • init_optimization_passes orders the passes
    • passes represented by a tree_opt_pass struct (tree-pass.h), even though it does RTL now too
        • “gate” function – whether or not to run optimization
        • “execute” function – implementation of pass
        • property bitmaps
            – properties required, destroyed, and created
        • “todo” bitmaps
            – run internal GC, dump the tree, verify SSA form, etc.

Passes and subpasses

• Passes can be used to group subpasses
• all_passes contains all_optimization_passes
    • all_optimization_passes has optimizations in order
• pass_tree_loop contains loop optimizations
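A pass manager built on gate and execute callbacks, property bitmaps, and nested subpasses can be sketched as follows. This is a toy model, not the tree_opt_pass struct from tree-pass.h: the struct layout, the PROP_ bits, the pass names, and the rule that a closed gate also skips the subpasses are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical property bits. */
#define PROP_CFG (1u << 0)
#define PROP_SSA (1u << 1)

/* Toy pass descriptor, loosely modeled on the fields listed above. */
struct toy_pass {
    const char *name;
    int (*gate)(void);          /* NULL means "always run" */
    void (*execute)(void);      /* NULL for pure grouping passes */
    unsigned required, provided, destroyed;
    struct toy_pass *sub;       /* subpasses grouped under this pass */
    struct toy_pass *next;      /* next pass at the same level */
};

static int optimize_level = 2;  /* stand-in for the -O level */

static int gate_optimizing(void) { return optimize_level >= 1; }
static int gate_never(void)      { return 0; }
static void build_ssa(void)      { /* would construct SSA form */ }
static void dead_code(void)      { /* would remove dead statements */ }
static void leave_ssa(void)      { /* would leave SSA form */ }

/* Run a list of passes in order; returns how many passes ran.
   A pass whose gate returns 0 is skipped along with its subpasses. */
int run_pass_list(struct toy_pass *pass, unsigned *props)
{
    int ran = 0;
    for (; pass != NULL; pass = pass->next) {
        if (pass->gate != NULL && !pass->gate())
            continue;
        /* every required property must already hold */
        assert((*props & pass->required) == pass->required);
        if (pass->execute != NULL)
            pass->execute();
        *props = (*props | pass->provided) & ~pass->destroyed;
        ran += 1 + run_pass_list(pass->sub, props);
    }
    return ran;
}

/* Example pipeline: ssa, then a group of optimizations, then delssa. */
int run_example_pipeline(unsigned *final_props)
{
    struct toy_pass del_ssa  = { "delssa", NULL, leave_ssa, PROP_SSA, 0, PROP_SSA, NULL, NULL };
    struct toy_pass skipped  = { "skipped", gate_never, dead_code, PROP_SSA, 0, 0, NULL, NULL };
    struct toy_pass dce      = { "dce", NULL, dead_code, PROP_SSA, 0, 0, NULL, &skipped };
    struct toy_pass all_opts = { "all_opts", gate_optimizing, NULL, PROP_SSA, 0, 0, &dce, &del_ssa };
    struct toy_pass ssa      = { "ssa", NULL, build_ssa, PROP_CFG, PROP_SSA, 0, NULL, &all_opts };

    *final_props = PROP_CFG;    /* assume the CFG was built earlier */
    return run_pass_list(&ssa, final_props);
}
```

In this pipeline four passes run (ssa, all_opts, dce, delssa), the gated-off pass is skipped, and the SSA property is gone again at the end because the last pass destroys it.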

In Greater Depth

Adding a tree optimization pass

Debugging middle-end tree passes

Command-line options for dumping trees:

    -fdump-tree-X           output after pass X
    -fdump-tree-original    output initial tree (before all opts)
    -fdump-tree-optimized   output final GIMPLE (after all opts)
    -fdump-tree-gimple      dump before & after gimplification
    -fdump-tree-inlined     output after function inlining
    -fdump-tree-all         output after each pass

(Make sure you specify an -O level or you might not get anything.)

Passes available for dumping in GCC 4.1.1 (see info page):
cfg, vcg, ch, ssa, salias, alias, ccp, storeccp, pre, fre, copyprop, store_copyprop, dce, mudflap, sra, sink, dom, dse, phiopt, forwprop, copyrename, nrv, vect, vrp

Debugging middle-end tree passes

Can specify options for tree dumps:
• address   print address of each tree node
• slim      less output; don’t dump all scope bodies
• raw       raw tree output (rather than pretty-printed C-like trees)
• details   detailed output (not supported by all passes)
• stats     statistics (not supported by all passes)
• blocks    basic block boundaries
• vops      output virtual operands for each statement
• lineno    output line #s
• uid       output decl’s unique ID along with each variable
• all       all except raw, slim, and lineno

e.g.
    -fdump-tree-dse-details   detailed post-DSE output
    -fdump-tree-all-all       (almost) everything
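
To try the dump options, a small input file is enough. The sketch below is one possible test input. The file name dump_demo.cpp and both functions are made up for illustration, and the command lines in the comment simply combine options listed on the slides.

```cpp
// Save as dump_demo.cpp (hypothetical file name) and compile with, for example:
//   g++ -O2 -c -fdump-tree-optimized dump_demo.cpp
//   g++ -O2 -c -fdump-tree-all-details dump_demo.cpp
// Each enabled dump goes to its own file whose name is derived from the
// source file name; the exact suffix depends on the GCC version.
#include <cassert>

// The first store to tmp is never read, so it is a candidate for removal.
int scaled(int n) {
    int tmp = n * 3;   // overwritten below before any use
    tmp = n * 2;
    return tmp + 1;
}

// A small loop whose shape is easy to recognize in the dumps.
int sum_to(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}
```

Comparing the "original" dump with the "optimized" dump for this file is a quick way to see what the middle-end did to each function.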

Static Single-Assignment (SSA) form

• (Pure) functional languages have nice properties for optimization
    single-assignment: one assignment to each variable
    static single-assignment: next best thing
    • each variable assigned at one static location in the program
    makes it clearer where data is produced
    • reduces complexity of many optimization algorithms
    • removes association of variable uses over its lifetime

Cytron et al. Efficiently computing static single assignment form and the control dependence graph. ACM TOPLAS, October 1991.

SSA renaming (1): model control flow

    y = 10;

    /* compute 2^y */
    x = 1;
    while (y > 0) {
      x = x * 2;
      y = y - 1;
    }

Control-flow graph:
    entry:  y = 10; x = 1            → test
    test:   y <= 0 ?                 true → EXIT, false → body
    body:   x = x * 2; y = y - 1     → test

SSA renaming (2): version all variables

    y1 = 10;

    /* compute 2^y */
    x1 = 1;
    while (y1 > 0) {
      x2 = x1 * 2;
      y2 = y1 - 1;
    }

Control-flow graph:
    entry:  y1 = 10; x1 = 1              → test
    test:   y1 <= 0 ?                    true → EXIT, false → body
    body:   x2 = x1 * 2; y2 = y1 - 1     → test

Each assignment now defines a fresh name, but the uses inside the loop still refer to x1 and y1; the merge at the loop header is resolved in the next step.

SSA renaming (3): insert “phi” nodes

    y1 = 10;

    /* compute 2^y */
    x1 = 1;
    while (true) {
      x3 = φ(x1, x2);
      y3 = φ(y1, y2);
      if (y3 <= 0)
        break;
      x2 = x3 * 2;
      y2 = y3 - 1;
    }

Control-flow graph:
    entry:  y1 = 10; x1 = 1                    → test
    test:   x3 = φ(x1, x2); y3 = φ(y1, y2)
            y3 <= 0 ?                          true → EXIT, false → body
    body:   x2 = x3 * 2; y2 = y3 - 1           → test
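
The renaming steps can be checked by executing them. The sketch below is an illustration only: it writes the loop once in its original form and once in SSA form, simulating each φ by remembering which control-flow edge reached the loop header. The function names and the from_back_edge flag are assumptions of this sketch, and the input is a parameter rather than the constant 10 used on the slides.

```cpp
#include <cassert>

// Original loop from the slides: computes 2^y for y >= 0.
int pow2_plain(int y) {
    int x = 1;
    while (y > 0) {
        x = x * 2;
        y = y - 1;
    }
    return x;
}

// The same loop with every assignment given its own name. Each phi picks
// its first argument when the header is entered from the top and its
// second argument when it is reached along the back edge.
int pow2_ssa(int y0) {
    assert(y0 >= 0);
    const int y1 = y0;            // the slides use the constant 10 here
    const int x1 = 1;
    int x2 = 0, y2 = 0;           // each has one assignment statement, in the loop body
    bool from_back_edge = false;  // which CFG edge reached the header
    for (;;) {
        const int x3 = from_back_edge ? x2 : x1;  // x3 = phi(x1, x2)
        const int y3 = from_back_edge ? y2 : y1;  // y3 = phi(y1, y2)
        if (!(y3 > 0)) return x3;                 // loop exit
        x2 = x3 * 2;
        y2 = y3 - 1;
        from_back_edge = true;
    }
}
```

Both functions agree for every non-negative input, which is the property that conversion into SSA form has to preserve.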

Into and out of SSA form in GCC

pass_build_ssa (gcc/tree-into-ssa.c) → SSA optimizations → pass_del_ssa (gcc/tree-outof-ssa.c)

Dealing with SSA form in GCC

• Given a tree node n with code = PHI_NODE
    PHI_RESULT(n)          get lhs of φ
    PHI_NUM_ARGS(n)        get rhs count
    PHI_ARG_DEF(n, i)      get ssa-name
    PHI_ARG_EDGE(n, i)     get edge
    PHI_ARG_ELT(n, i)      tuple (ssa-name, edge)

• Given a tree node n with code = SSA_NAME
    SSA_NAME_DEF_STMT(n)   get defining statement
    SSA_NAME_VERSION(n)    get SSA version #
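
To make the shape of these accessors concrete, here is a toy data model. None of the types below exist in GCC. ToySsaName, ToyPhi and the lower-case helper functions are invented for this sketch and only mirror what the macros above return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for GCC's tree nodes; none of these types exist in GCC.
struct ToySsaName {
    std::string var;       // underlying variable, e.g. "x"
    unsigned version;      // SSA version number, e.g. 3 for x3
    std::string def_stmt;  // text of the defining statement
};

struct ToyPhiArg {
    const ToySsaName* def;  // incoming SSA name
    int edge_src_block;     // source block of the incoming CFG edge
};

struct ToyPhi {
    const ToySsaName* result;     // left-hand side of the phi
    std::vector<ToyPhiArg> args;  // one (ssa-name, edge) pair per predecessor
};

// Helpers named after the macros on the slide, acting on the toy types.
inline const ToySsaName* phi_result(const ToyPhi& n) { return n.result; }
inline std::size_t phi_num_args(const ToyPhi& n) { return n.args.size(); }
inline const ToySsaName* phi_arg_def(const ToyPhi& n, std::size_t i) { return n.args.at(i).def; }
inline int phi_arg_edge(const ToyPhi& n, std::size_t i) { return n.args.at(i).edge_src_block; }

// Lists the SSA versions feeding a phi, in argument order.
inline std::vector<unsigned> incoming_versions(const ToyPhi& n) {
    std::vector<unsigned> out;
    for (std::size_t i = 0; i < phi_num_args(n); ++i) {
        assert(phi_arg_def(n, i) != nullptr);
        out.push_back(phi_arg_def(n, i)->version);
    }
    return out;
}
```

For the node x3 = φ(x1, x2) from the renaming slides, incoming_versions yields versions 1 and 2, and each argument carries the edge it arrives on.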

A few useful functions in the middle-end

walk_use_def_chains(var, func, data)
    start at ssa-name var, calling func at each point up the chain; data is a generic pointer for use by func
    (see tree-ssa.c and internals docs: info gccint)

walk_dominator_tree(dom-walk-data, basic-block)
    start at basic-block and walk children in dominator relationship; dom-walk-data provides several callbacks
    (see domwalk.h and internals docs: info gccint)
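
The idea behind a dominator-tree walk with callbacks can be sketched without any GCC headers. The types and function names below (ToyDomTree, ToyDomWalk, walk_dom_tree, demo_walk) are hypothetical, and the real callback set in domwalk.h is richer than the two hooks shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy dominator tree: children[b] lists the blocks immediately dominated by b.
struct ToyDomTree {
    std::vector<std::vector<int>> children;
};

// Hypothetical callback bundle, loosely modeled on the idea of dom-walk-data.
struct ToyDomWalk {
    std::function<void(int)> before_children;  // called on entering a block
    std::function<void(int)> after_children;   // called after its whole subtree
};

// Depth-first walk over the dominator tree starting at block bb.
inline void walk_dom_tree(const ToyDomTree& t, const ToyDomWalk& w, int bb) {
    assert(bb >= 0 && static_cast<std::size_t>(bb) < t.children.size());
    if (w.before_children) w.before_children(bb);
    for (int child : t.children[bb]) walk_dom_tree(t, w, child);
    if (w.after_children) w.after_children(bb);
}

// Records the visit order for the loop CFG from the SSA slides: block 0 (entry)
// dominates block 1 (test), which dominates block 2 (body) and block 3 (exit).
inline std::string demo_walk() {
    ToyDomTree t{{{1}, {2, 3}, {}, {}}};
    std::string trace;
    ToyDomWalk w{
        [&](int b) { trace += "<" + std::to_string(b); },
        [&](int b) { trace += std::to_string(b) + ">"; }};
    walk_dom_tree(t, w, 0);
    return trace;
}
```

The enter and leave hooks bracket each subtree, which is what lets a pass push state when it enters a block and pop it once every dominated block has been processed.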

In Greater Depth

Implementing an optimization from start to finish

RTL expansion and optimization

• Expansion performed by pass_expand (gcc/cfgexpand.c)
    Back-end has a say in this
• As of GCC 4.1.x, RTL passes are carried out by the same pass manager that works on trees
• pass_final (at end) outputs assembly

RTL expansion

• Machine description (.md) files for target CPU
    define_expand and define_insn
    match standard names and generate RTL; assist in expansion of GIMPLE

define_insn takes five args:
    1. name (or empty string)
    2. RTL template
    3. condition (C)
    4. output template (assembly, or C that generates assembly)
    5. attributes (optional)

define_expand takes four args:
    1. name
    2. RTL template
    3. condition (C)
    4. preparation statements (C)

e.g. in gcc/config/i386/i386.md

    (define_expand "movhi"
      [(set (match_operand:HI 0 "nonimmediate_operand" "")
            (match_operand:HI 1 "general_operand" ""))]
      ""
      "ix86_expand_move (HImode, operands); DONE;")

(define_expand name RTL condition prep-stmts)
    expands operation name to RTL (if applicable)
    runs prep-stmts

(match_operand[:mode] N predicate constraint)
    tries to match movhi operand N of mode if predicate
    general_operand is any imm/mem/reg valid for mode

(set dest src)
    RTL for assignment; here, the implementation for movhi

e.g. in gcc/config/i386/i386.md (2)

    (define_insn "jump"
      [(set (pc)
            (label_ref (match_operand 0 "" "")))]
      ""
      "jmp\t%l0"
      [..attributes...])

(define_insn name RTL condition output attributes)
    expands operation name to RTL (if applicable)
    condition can emit additional instructions

(match_operand[:mode] N predicate constraint)
    tries to match jump operand N of mode if predicate

(set (pc) label)
    RTL for an unconditional jump

Standard instruction names

movM, reload_inM, movstrictM, movmisalignM, load_multiple, store_multiple, vec_setM, vec_extractM, vec_initM, pushM, addM, subM, mulM3, sminM, smaxM3, reduc_smin_M, reduc_smax_M, reduc_umin_M, reduc_umax_M, reduc_splus_M, reduc_uplus_M, vec_shl_M, vec_shr_M,
mulhisi3, mulqihi3, umulqihi3, umulsidi3, smulM, umulM, divmodM, udivmodM, ashlM, ashrM, lshrM3, rotlM3, rotrM3, negM, absM, sqrtM, cosM, sinM, expM, logM, powM, atan2M, floorM,
btruncM, roundM, ceilM, nearbyintM, rintM, copysignM, ffsM, clzM, ctzM, popcountM, parityM, one_cmplM, cmpM, tstM, movmemM, movstr, setmemM, cmpstrnM, cmpstrM, cmpmemM, strlenM, floatMN2, floatunsMN2,
fixMN2, fixunsMN2, ftruncM, fix_truncMN2, fixuns_truncMN2, truncMN2, extendMN2, zero_extendMN2, extv, extzv, insv, movMODE, addMODE, sCOND, bCOND, cbranchMODE, jump, call, call_value, call_pop, untyped_call, return, untyped_return

Standard instruction names (more!)

sync_compare_and_swapMODE, sync_compare_and_swap_ccMODE, sync_addMODE, sync_subMODE, sync_old_addMODE, sync_old_subMODE, sync_new_addMODE, sync_new_subMODE, sync_lock_test_and_setMODE, sync_lock_releaseMODE, stack_protect_set, stack_protect_test, decrement_and_branch_until_zero, canonicalize_funcptr_for_compare,
nop, indirect_jump, casesi, tablejump, doloop_end, doloop_begin, save_stack_block, allocate_stack, check_stack, nonlocal_goto, nonlocal_goto_receiver, exception_receiver, builtin_setjmp_setup, builtin_setjmp_receiver, builtin_longjmp, eh_return, prologue, epilogue, sibcall_epilogue, trap, conditional_trap, prefetch, memory_barrier

Uses for the machine description

1. RTL expansion
    define_expand and define_insn with standard names
    ones with unknown names are ignored
    if a needed name isn’t given, GCC crashes
2. RTL adjustment
    define_split, define_peephole
3. Hard register allocation (reloading)
    register description, preferencing
4. RTL template matching
    define_insn (name ignored)
    assembly generated

[diagram label: optimization passes]

RTL optimization passes (gcc/passes.c)

• RTL expansion
• CFG cleanup/“jump optimization”
• EH landing pad injection
• Local CSE
• Global CSE
• Loop CSE/unroll/peel/unswitch
• Jump bypassing
• Branch prob. instrumentation/analysis
• If conversion (conditional moves etc.)
• Tracer (tail duplication for superblock formation)
• Web construction (split pseudoreg’s)
• Pseudoregister liveness analysis (DSE, auto-inc/dec addressing)
• Instruction combination
• Partition basic blocks (hot and cold)
• Register movement (avoid rt moves)
• Optimize mode switching
• Modulo scheduling (loop pipelining)
• Instruction scheduling
• Register allocation (reloading)
• Optimize stack operations
• Peephole optimizations (define_peephole)
• Basic block reordering (profile-driven)
• Variable tracking (debug support)
• Delayed branch scheduling
• Branch shortening
• Register-to-stack conversion
• Final
• Debugging information dump

The GCC Back-End

Front-end / Middle-end / Back-end

Register allocation
Instruction selection
Debugger support

Register allocation

• RTL pseudo-registers → hard registers
• Proceeds in several passes
    1. Register class scan (preference registers)
    2. Register allocation within basic blocks
    3. Register allocation for remaining registers
    4. Reload (renumbering, spilling)

Instruction selection

    (define_insn "jump"
      [(set (pc)
            (label_ref (match_operand 0 "" "")))]
      ""
      "jmp\t%l0"
      [..attributes...])

(define_insn name RTL condition output attributes)
    matches RTL (if applicable)

(match_operand[:mode] N predicate constraint)
    tries to match jump operand N of mode if predicate

jmp\t%l0
    generates x86 jmp with %l0 (operand 0 as label)

In Greater Depth

A machine description tour

Debugger support

• Specifying -g to the compiler inserts debugging symbols in the assembly output
• DWARF2 format
    embedded within ELF
    a tree of debug info entries (compilation unit at the root)
    • each with a linked list of attributes
    DWARF2 manual: ftp.freestandards.org/pub/dwarf/dwarf-2.0.0.pdf
• Once assembled, “readelf -w” interprets them

Runtime Issues

Object layout
Virtual method lookup
The Boehm garbage collector
crt stuff

Simple object layout (C++)

    class A {
    public:
      int x;
      virtual void myMethod();
      virtual void other();
    };

    class B : public A {
    public:
      int y;
      virtual void myMethod();
      virtual void third();
    };

instances of A: [ vtable | x ]          vtable for A: [ A::myMethod | A::other ]
instances of B: [ vtable | x | y ]      vtable for B: [ B::myMethod | A::other | B::third ]

Simple object layout (C++)

instances of A: [ vtable | x ]          vtable for A: [ A::myMethod | A::other ]
instances of B: [ vtable | x | y ]      vtable for B: [ B::myMethod | A::other | B::third ]

In an instance of B, the leading [ vtable | x ] is the subobject A of B.
In the vtable for B, the leading [ B::myMethod | A::other ] is the sub-vtable A of B.
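
The effect of this layout is visible from ordinary C++. The sketch below reuses the class shapes from the slides, with two changes made purely for illustration: the methods return a label so the chosen implementation can be observed, and a virtual destructor is added. The helper functions call_my_method and call_other are assumptions of this sketch.

```cpp
#include <cassert>
#include <string>

// Same shape as the classes on the slides; the methods return a label so the
// selected implementation is observable.
class A {
public:
    int x = 0;
    virtual ~A() = default;
    virtual std::string myMethod() const { return "A::myMethod"; }
    virtual std::string other() const { return "A::other"; }
};

class B : public A {
public:
    int y = 0;
    std::string myMethod() const override { return "B::myMethod"; }
    virtual std::string third() const { return "B::third"; }
};

// Calls go through the object's vtable, so the static type A does not decide
// which implementation runs.
inline std::string call_my_method(const A& obj) { return obj.myMethod(); }
inline std::string call_other(const A& obj) { return obj.other(); }

// B extends A's layout with y, so it cannot be smaller than A.
inline bool b_extends_a() {
    assert(sizeof(B) >= sizeof(A));
    return sizeof(B) >= sizeof(A);
}
```

Passing a B where an A is expected works because the A subobject sits at the front of every B and the sub-vtable for A keeps the same slot order; only the slot contents differ.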

Object layout (Java)

instances of B: [ vtable | x | y ]      (the leading [ vtable | x ] is the subobject A of B)

vtable for B:
    class B pointer
    GC descriptor
    finalize
    hashCode
    equals
    toString
    clone
    myMethod
    other
    third

Slide labels on the vtable: sub-vtable Object of B, sub-vtable A of B.

But more complicated for C++ !

First, classes might not have virtual functions !

    class A {
    public:
      int x;
      void myMethod();
      void other();
    };

    class B : public A {
    public:
      int y;
      void myMethod();
      void third();
    };

instances of A: [ x ]
instances of B: [ x | y ]      (the leading [ x ] is the subobject A of B)

But more complicated for C++ !

Second, classes might have multiple bases !

    class A {
    public:
      int x;
      virtual void one();
    };

    class B {
    public:
      int y;
      virtual void two();
    };

    class C : public A, public B {
    public:
      int z;
      virtual void three();
    };

instances of A: [ vtable | x ]      vtable for A: [ A::one ]
instances of B: [ vtable | y ]      vtable for B: [ B::two ]
instances of C: ??                  vtable for C: ??

Object layout for multiple bases

instances of A: [ vtable | x ]                      vtable for A: [ A::one ]
instances of B: [ vtable | y ]                      vtable for B: [ B::two ]
instances of C: [ vtable | x | vtable | y | z ]
    first  [ vtable | x ]: subobject A of C
    second [ vtable | y ]: subobject B of C

vtable for C:
    A::one
    (blank)
    (blank)
    B::two
    C::three

Requires “this pointer-adjustment”
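
The pointer adjustment can be observed directly by converting a pointer to C into pointers to its bases and measuring the byte distance. This is a sketch under assumptions: the classes mirror the slide but gain default member values and virtual destructors, the helper names are invented, and the concrete offsets depend on the target ABI, so the code only reports them.

```cpp
#include <cassert>
#include <cstddef>

struct A { int x = 1; virtual ~A() = default; virtual int one() const { return 1; } };
struct B { int y = 2; virtual ~B() = default; virtual int two() const { return 2; } };
struct C : A, B { int z = 3; virtual int three() const { return 3; } };

// Byte distance from the start of the C object to its A subobject.
inline std::ptrdiff_t a_offset_in(const C& c) {
    const A* as_a = &c;  // implicit upcast
    return reinterpret_cast<const char*>(as_a) - reinterpret_cast<const char*>(&c);
}

// Byte distance from the start of the C object to its B subobject.
inline std::ptrdiff_t b_offset_in(const C& c) {
    const B* as_b = &c;  // implicit upcast; the compiler adjusts the pointer
    return reinterpret_cast<const char*>(as_b) - reinterpret_cast<const char*>(&c);
}

// Calling a B method through a C object: "this" must point at the B subobject.
inline int call_two(const C& c) {
    const B& as_b = c;
    assert(as_b.y == 2);  // the adjusted pointer sees B's own field
    return as_b.two();
}
```

The two base subobjects cannot share an address, so at least one of the conversions has to add an offset. The negative offset stored in the vtable for C is what lets the runtime undo that adjustment.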

Multiple bases, cont’d

instances of C: [ vtable | x | vtable | y | z ]
    first  [ vtable | x ]: subobject A of C
    second [ vtable | y ]: subobject B of C

vtable for C:
    A::one
    [ offset = –4 ]
    (blank)
    B::two
    C::three

But what about dynamic_cast ?!

Multiple bases, cont’d

instances of C: [ vtable | x | vtable | y | z ]
    first  [ vtable | x ]: subobject A of C
    second [ vtable | y ]: subobject B of C

vtable for C:
    [ offset = 0 ]
    (blank)
    A::one
    [ offset = –4 ]
    (blank)
    B::two
    C::three

Top-level offset

Multiple bases, finished*

instances of C: [ vtable | x | vtable | y | z ]
    first  [ vtable | x ]: subobject A of C
    second [ vtable | y ]: subobject B of C

vtable for C:
    [ offset = 0 ]
    ptr. typeinfo C
    A::one
    [ offset = –4 ]
    ptr. typeinfo C
    B::two
    C::three

But what about C++ type info ?!

* there are further complications, but we’ll leave it here
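
The offset and typeinfo entries are what make dynamic_cast and typeid work when all the program holds is a pointer to a base subobject. The following sketch exercises both through standard C++ only. The classes repeat the slide's A, B and C with virtual destructors added, and the helper names are invented for illustration.

```cpp
#include <cassert>
#include <typeinfo>

struct A { int x = 1; virtual ~A() = default; virtual int one() const { return 1; } };
struct B { int y = 2; virtual ~B() = default; virtual int two() const { return 2; } };
struct C : A, B { int z = 3; virtual int three() const { return 3; } };

// dynamic_cast to void* yields the address of the most-derived object, which
// the runtime can recover from a B* using the offset stored in the vtable.
inline const void* most_derived(const B* p) {
    assert(p != nullptr);
    return dynamic_cast<const void*>(p);
}

// Cross-cast from one base to a sibling base; needs run-time type information.
inline const A* sibling_a(const B* p) { return dynamic_cast<const A*>(p); }

// typeid on a polymorphic object reports the dynamic type, not the static one.
inline bool is_really_c(const B& b) { return typeid(b) == typeid(C); }
```

With a C object viewed through a B pointer, most_derived returns the address of the whole C and sibling_a finds its A subobject. For a plain B object the cross-cast returns a null pointer.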

Java and C++ share object layout

vtable for (Java) B:
    [ offset = 0 ]
    null typeinfo
    class B pointer
    GC descriptor
    finalize
    hashCode
    equals
    toString
    clone
    myMethod
    other
    third

Virtual method lookup (C++, Java)

Now, virtual method invocation is a snap !
    Compiler knows method offset within vtable
    So it generates an indirect access through instance pointer…
    …and invokes the method through the pointer found in vtable

instance of B: [ vtable | x | y ]

vtable for (Java) B:
    [ offset = 0 ]
    null typeinfo
    class B pointer
    GC descriptor
    finalize
    hashCode
    equals
    toString
    clone
    myMethod
    other
    third
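
The three steps of the lookup can be written out by hand with function pointers. This is a hand-rolled model, not what GCC literally emits: Obj, VTable, the slot numbering and all function names are assumptions, and the header entries from the slide (offset, typeinfo, class pointer, GC descriptor) are left out.

```cpp
#include <cassert>

// Hand-written equivalent of a virtual call. All names are illustrative.
struct Obj;
using Method = int (*)(Obj*);

struct VTable { Method slots[3]; };  // slot 0: myMethod, 1: other, 2: third

struct Obj {
    const VTable* vtable;  // first word of every instance
    int x;
    int y;
};

inline int A_myMethod(Obj* self) { return self->x; }
inline int A_other(Obj* self)    { return -self->x; }
inline int B_myMethod(Obj* self) { return self->x + self->y; }
inline int B_third(Obj* self)    { return self->y; }

const VTable vtable_A = {{A_myMethod, A_other, nullptr}};
const VTable vtable_B = {{B_myMethod, A_other, B_third}};

constexpr int SLOT_MYMETHOD = 0;  // method offset, known at compile time

// obj->myMethod(): load the vtable pointer, index it, call indirectly.
inline int invoke_myMethod(Obj* obj) {
    assert(obj && obj->vtable);
    return obj->vtable->slots[SLOT_MYMETHOD](obj);
}
```

The caller never needs to know the dynamic class. The same slot index works for an A and for a B because B's table keeps A's slot order and only replaces the entry it overrides.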

The Boehm garbage collector

• Conservative mark & sweep garbage collector
    designed to operate in a hostile environment as a drop-in replacement for malloc
    “conservative” means it cannot distinguish between pointers and non-pointers
    Java is considerably less “hostile” than C/C++
    • can’t hide pointers from the compiler

Boehm, H., Space Efficient Conservative Garbage Collection. In ACM PLDI’93.

Java and Boehm GC

• Java front-end generates class pointer masks
    stows them in vtable
    computed in gcc/java/boehm.c
• Class too big for a pointer mask ?
    use a count of reference fields
    use a “mark procedure”
• Where to look
    boehm-gc/doc contains docs
    libjava/prims.cc contains GC-aware allocation routines

vtable (start):
    [ offset = 0 ]
    null typeinfo
    class B pointer
    GC descriptor
    finalize
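
The idea of a per-class pointer mask can be shown with a toy marker. This does not use or imitate the Boehm collector's real descriptor encoding. ToyObject, its fields and the mark function are all invented, and the point is only that the marker follows exactly the slots whose mask bit is set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Toy heap object: a per-class bitmask says which field slots hold references.
struct ToyObject {
    std::uint32_t pointer_mask;            // bit i set => fields[i] is a reference
    std::vector<const ToyObject*> fields;  // slot contents; unmasked slots are opaque data
};

// Marks everything reachable from root, following only the slots named by the
// mask. A slot whose bit is clear is never followed, even if its contents
// happen to look like an address.
inline void mark(const ToyObject* root, std::set<const ToyObject*>& marked) {
    if (!root || !marked.insert(root).second) return;  // null or already marked
    assert(root->fields.size() <= 32);                 // mask has 32 bits
    for (std::size_t i = 0; i < root->fields.size(); ++i)
        if (root->pointer_mask & (1u << i))
            mark(root->fields[i], marked);
}
```

A fully conservative scan would have to treat every slot as a possible pointer. With the mask, an object referenced only from a non-reference slot stays unmarked and can be reclaimed.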

crt stuff (“C runtime”)

• crt1.o, crti.o, crtn.o* provided by glibc
    crt1.o   sets up libc before main() is even invoked
    crti.o   prologue for .init and .fini
    crtn.o   epilogue for .init and .fini
• crtbegin.o, crtend.o* provided by GCC
    crtbegin.o   contributes frame_dummy() call to .init; calls static data destructors in .fini
    crtend.o     calls static data constructors in .init
    code in gcc/crtstuff.c

* and some variations
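
The visible consequence of this startup code is that static constructors have already run by the time main() starts. The sketch below demonstrates only that ordering. The names events, Announcer and constructed_before_main are invented, and which mechanism actually runs the constructor (.init, .ctors or .init_array) depends on the platform and GCC version.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Events recorded in the order they happen. A function-local static avoids
// depending on initialization order between globals.
inline std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// A static object: its constructor runs during startup, before main(), and
// its destructor runs after main() returns.
struct Announcer {
    Announcer() { events().push_back("static constructor"); }
    ~Announcer() {}  // would run on the shutdown side, after main() returns
};
static Announcer announcer;

// True once startup code has run the static constructor.
inline bool constructed_before_main() {
    assert(events().size() <= 1);  // only the one static object logs here
    return !events().empty() && events().front() == "static constructor";
}
```

Calling constructed_before_main() as the first statement of main() returns true, even though main() itself never constructed anything.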

Miscellany

Tips on modifying code

• Find something similar
    browse source code
• build off it
• use a debugger
    -fdump-tree-*
    -d*
    debug_tree/browse_tree

Wrap-up

Running GCC under GDB
Obtaining development versions of GCC
Reporting bugs in GCC
What’s next for GCC

Running GCC under GDB

• Inevitably, hacking a compiler will result in
    segfault
    assertion fault
    incorrect code generation
• Remember to attach debugger to the compiler, not the driver
• “gcc -v …,” then use GDB on the actual front-end

In Greater Depth

Debugging GCC

Obtaining development versions of GCC

• All GCC development is in the open
    design discussions
    change logs
    bugs
• Subversion (SVN) repository
    public read access
    for details: gcc.gnu.org/svn.html
    clients available from subversion.tigris.org/

What to do if you find a bug in GCC

• Check to see if bug is present in SVN version
• Check to see if bug is in bug database
    http://gcc.gnu.org/bugzilla/
• Collect version information (gcc --version)
• Guidelines: http://gcc.gnu.org/bugs.html
• Report it: http://gcc.gnu.org/bugzilla/

Thanks!

Demystifying GCC

Distributed Object Computing Laboratory
Washington University
St. Louis, Missouri

Programming Logic Group
Technical University of Catalonia
Barcelona, Spain

Morgan Deters
Ron Cytron