PROTEOMICS OF DIATOMS: DISCOVERY OF POLYAMINE MODIFICATIONS IN BIOSILICA-ASSOCIATED PROTEINS

DISSERTATION

for the attainment of the academic degree

Doctor of Philosophy (Ph. D.)

submitted to the Division of Mathematics and Natural Sciences (Bereich Mathematik und Naturwissenschaften) of the Technische Universität Dresden

by

M. Sc. Alexander Milentyev

born 12 February 1988 in Leninsk, Kazakhstan

Submitted on 1 July 2018

The dissertation was prepared between 6 January 2014 and 6 January 2018 at the Max-Planck-Institut für molekulare Zellbiologie und Genetik.


מנא מנא תקל ופרסין

("Mene, mene, tekel, upharsin": numbered, numbered, weighed, divided)

Daniel 5:26-28

SUMMARY

Diatoms are eukaryotic unicellular algae that employ highly specialized proteins called silaffins to make nanopatterned silica-based cell walls. These proteins share little or no homology across diatom species and are extensively post-translationally modified. Apart from conventional modifications (e. g., phosphorylation and glycosylation), lysine residues of silaffins bear polyamine chains with a highly heterogeneous molecular structure. These polyamine modifications appear to be specific to silicifying organisms and are therefore hypothesized to play a key role in biosilica synthesis. However, the polyamine modifications of lysines, the modified proteins, and the modification sites remain poorly characterized. To address this gap, we developed a method to quantify polyamines and to identify sites of polyamine modification in proteins from three phylogenetically closely related, yet morphologically distinct diatoms: Thalassiosira pseudonana, T. oceanica, and Cyclotella cryptica. We demonstrated that the overall pattern of polyamines followed the phylogenetic proximity of these diatom species and showed that polyamine modifications occurred at consensus sites even in proteins showing no sequence similarity.
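As an illustration of how an MS/MS spectrum can localize a modification to a particular lysine, the following minimal Python sketch computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses. The peptide DVTEMSMAKAGK serves only as an example, the mass shift passed in `mods` is a hypothetical placeholder rather than the mass of any specific polyamine modification, and `fragment_ladder` is an illustrative helper, not the software used in this work.

```python
# Monoisotopic residue masses in Da (C is unmodified cysteine).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da


def fragment_ladder(peptide, mods=None):
    """Return lists of singly charged b- and y-ion m/z values.

    `mods` maps a 0-based residue index to a mass shift in Da; here it is
    a hypothetical stand-in for a lysine modification.
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(peptide)]
    b_ions, y_ions = [], []
    for n in range(1, len(peptide)):
        # b_n: first n residues plus a proton
        b_ions.append(sum(masses[:n]) + PROTON)
        # y_n: last n residues plus water (C-terminal OH and N-terminal H) and a proton
        y_ions.append(sum(masses[-n:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    peptide = "DVTEMSMAKAGK"
    b, y = fragment_ladder(peptide)
    # Hypothetical +100 Da shift on K9 (index 8), for illustration only.
    b_mod, y_mod = fragment_ladder(peptide, mods={8: 100.0})
    for n in range(1, len(peptide)):
        print(f"b{n}: {b[n-1]:.4f} -> {b_mod[n-1]:.4f}   "
              f"y{n}: {y[n-1]:.4f} -> {y_mod[n-1]:.4f}")
```

Comparing the two ladders shows the principle behind site assignment: b ions remain unchanged until the series reaches the modified residue (b9 here), and y ions remain unchanged until they include it (y4 here), so the first shifted ion in each series brackets the modification site.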

[Graphical abstract. Panels: bar chart (0% to 50%) of modified peptides, modified proteins, and consensus sites in T. oceanica, C. cryptica, and T. pseudonana; profile of QAC-derivatized lysine derivatives with categories labeled 161 (QAC) to 427 (QAC); annotated MS/MS spectrum (relative abundance vs. m/z) of the modified peptide D-V-T-E-M-S-M-A-K-A-G-K with b- and y-ion series and a fragment at m/z 143.1542; sequence logo (bits) of the local contexts of modified lysines; diatom biosilica.]

ZUSAMMENFASSUNG

Diatoms (Kieselalgen) are eukaryotic unicellular algae that produce highly specialized proteins (so-called silaffins) to build nanopatterned silica cell walls. These proteins show little or no homology within the diatoms and are extensively post-translationally modified. In addition to conventional modifications (e. g., phosphorylation and glycosylation), lysine residues of silaffins carry polyamine chains with highly heterogeneous molecular structures. These modifications are specific to diatoms and are therefore hypothesized to play a role in biosilica synthesis. However, lysine polyamine modifications, the modified proteins, and the modified sites are barely characterized. To answer these questions, we developed a method to quantify polyamines and to identify the positions of polyamine modifications in proteins of the closely related, yet morphologically distinct diatoms Thalassiosira pseudonana, T. oceanica, and Cyclotella cryptica. We showed that the overall pattern of polyamines follows the phylogenetic proximity of these diatom species and that these polyamine modifications occurred at consensus sites even in proteins that showed no sequence similarity.


CONTENTS

Summary

Zusammenfassung

List of Figures

List of Tables

Abbreviations

1 Introduction

1.1 Diatoms

1.2 Diatom biosilica

1.2.1 Biosilicification in nature

1.2.2 Diatom biosilica structure and cell cycle

1.2.3 The cell biology of biosilica morphogenesis

1.3 The role of polyamine PTMs in diatom biosilicification

1.3.1 Identifying biomolecules associated with diatom biosilica

1.3.2 PTM complexity of biosilica-associated proteins

1.3.3 Lysine ε-polyamine PTMs in biosilica-associated proteins

1.4 Mass spectrometry in PTM discovery

1.4.1 Modification-specific proteomics

1.4.2 Analysis of polyamine-modified lysines by MS

1.4.3 Fractionation of proteins and peptides prior to MS

1.4.4 MS/MS analysis in modification-specific proteomics

1.4.5 Bioinformatics tools for modification-specific proteomics

1.5 Rationale of the thesis

2 Aim of the thesis

3 Results and discussion

3.1 A method for analysis of ε-polyamine PTMs

3.1.1 Establishing a method to analyse ε-polyamines

3.1.2 Method applicability for lysine PTM profiling

3.1.3 Profiling of lysine PTMs in silaffin-3

3.2 Profiling lysine PTMs in biosilica extracts

3.2.1 Lysine PTM profile and characteristic fragments

3.2.2 Elucidation of phosphopolyamine structures

3.2.3 Lysine PTM profiles of AFSM extracts

3.2.4 Comparison of AFIM and AFSM profiles in T. pseudonana

3.2.5 Phylogenetic relationship across three diatom species

3.3 PTM localization and discovery of consensus motifs

3.3.1 Multiple protease strategy for mapping lysine PTMs

3.3.2 Selection of deprotection technique

3.3.3 Mapping lysine PTMs on tpSil3 using iterative search strategy

3.3.4 Deconvolution of raw MS/MS spectra

3.3.5 PTM mapping by polyamine-specific fragments

3.3.6 Identification of consensus motifs harboring lysine PTMs

4 Conclusions and outlook

5 Materials and methods

5.1 Synthesis of polyamine standards

5.2 Isolation of biosilica-associated proteins

5.3 Expression of tpSil3 from synthetic gene

5.4 HCl hydrolysis

5.5 AQC-derivatization of amino acids and polyamines

5.6 LC-MS/MS analysis of QAC-derivatives

5.7 Amino acid measurement using UV-detection

5.8 Direct infusion MS/MS analysis

5.9 Acetylation of phosphopolyamines

5.10 ³¹P NMR measurements

5.11 Deglycosylation with TFMS

5.12 Treatment with HF·pyridine soluble complex

5.13 Anhydrous HF-treatment

5.14 Protein analysis by GeLC-MS/MS

5.15 Proteomics data processing

A Appendix

A.1 Analytical data for synthetic standards

A.2 XICs of QAC-derivatives

B Bibliography

Acknowledgments

Publications

Declaration / Erklärung

LIST OF FIGURES

Figure 1.1 Images of the cell walls of five diatoms

Figure 1.2 Structure of the diatom frustule and cell cycle

Figure 1.3 Hypothetical mechanism for catalysis of silicic acid condensation

Figure 1.4 PTM complexity of silaffins

Figure 1.5 Chemical structures of modified lysine residues

Figure 1.6 Proteomics approaches

Figure 1.7 AQC derivatization chemistry

Figure 1.8 Schematic view of the LTQ Orbitrap Velos

Figure 1.9 Principles of DDA and peptide fragmentation

Figure 1.9 Phylogenetic tree and SEM images of three diatoms

Figure 3.1 Generic structure of lysine PTMs

Figure 3.2 Calibration curves

Figure 3.3 MS spectrum of the acidic hydrolysate of tpSil3

Figure 3.4 AA content and lysine PTM profile of tpSil3

Figure 3.5 Schematic diagram of ε-polyamine fragmentation

Figure 3.6 HCD MS/MS of synthetic standards

Figure 3.7 Fragment spectra of isomeric lysine derivatives PTM 303

Figure 3.8 Phosphopolyamine MS/MS spectra

Figure 3.9 ³¹P-NMR spectrum of T. pseudonana biosilica hydrolysate

Figure 3.10 Full lysine PTM profiles of AFSM biosilica extracts

Figure 3.11 Venn diagram and phylogenetic tree

Figure 3.12 Hypothetical routes for lysine modifications

Figure 3.13 Coverage for: (a) native tpSil3; (b) tpSil3 expressed in E. coli

Figure 3.14 Peptide coverage obtained for tpSil3 treated with TFMS, HF·pyridine complex, and anhydrous HF

Figure 3.15 Gel images

Figure 3.16 Silaffin mapping

Figure 3.17 Deconvolution of MS/MS spectra

Figure 3.18 MS/MS spectra of modified peptides with characteristic ions

Figure 3.19 MS/MS spectra of modified peptides with characteristic ions

Figure 3.20 Silaffin mapping

Figure 3.21 Graphical representations of the local protein contexts of modified lysines

Figure 3.22 Sequence logos of local protein contexts of PTM sites, separately for each diatom species

Figure 3.23 Local protein contexts of modified lysines in KXXK motifs

Figure 4.1 Mapped PTMs

Figure 5.1 Chemical structures and synthesis of internal standards

Figure 5.2 Sequence design of tpSil3 expressed from a synthetic gene

Figure A.1 Reactions of AQC that might occur in buffered aqueous solutions and/or during storage

Figure A.2 Calibration curves for amino acids

Figure A.3 Number of amino acid residues. Experimental and theoretical amino acid content of tpSil3

Figure A.4 Sequences of biosilica-associated proteins

Figure A.5 XICs of phosphopolyamines

Figure A.6 Full lysine PTM profiles of AFSM biosilica extracts

Figure A.7 XICs of QAC-derivatized lysine derivatives

Figure A.8 Fragment spectra of ornithine derivative PTM 275-orn (internal standard)

Figure A.9 Fragment spectra of lysine derivative m/z 161

Figure A.10 Fragment spectra of lysine derivative m/z 175

Figure A.11 Fragment spectra of lysine derivative m/z 189

Figure A.12 Fragment spectra of lysine derivative m/z 232

Figure A.13 Fragment spectra of lysine derivative m/z 275

Figure A.14 Fragment spectra of lysine derivative m/z 289

Figure A.15 Fragment spectra of lysine derivative m/z 317a

Figure A.16 Fragment spectra of lysine derivative m/z 317b

Figure A.17 Fragment spectra of lysine derivative m/z 331a

Figure A.18 Fragment spectra of lysine derivative m/z 331b

Figure A.19 Fragment spectra of lysine derivative m/z 205

Figure A.20 Fragment spectra of lysine derivative m/z 319

Figure A.21 Fragment spectra of lysine derivative m/z 333

Figure A.22 Fragment spectra of lysine derivative m/z 347

Figure A.23 Fragment spectra of lysine derivative m/z 399

Figure A.24 Fragment spectra of lysine derivative m/z 413

Figure A.25 Fragment spectra of lysine derivative m/z 427

LIST OF TABLES

Table 1.1 Overview of silaffin PTMs

Table 3.1 Calculated m/z values for ε-polyaminated lysines

Table 3.2 Catalogue of lysine polyamine modifications and their characteristic fragments

Table 3.3 Tabular representation of the data from Fig. 3.10

Table 5.1 (a) Chemicals and reagents; (b) materials; (c) instrumentation

Table 5.2 HPLC gradient used for the analysis of QAC-derivatives

Table 5.3 Cleavage specificity of the proteases used in the thesis

Table 5.4 HPLC gradient used for the analysis of peptides

Table 5.5 Mascot search parameters

Table A.1 Calculated N×QAC-derivatization groups for ε-polyamines

Table A.2 Sequences of identified post-translationally modified proteins

Table A.3 Contingency tables for Fisher's exact test

ABBREVIATIONS

AAA amino acid analysis

ACN acetonitrile

AGC automatic gain control

AFIM ammonium fluoride insoluble material

AFSM ammonium fluoride soluble material

AIF all-ion fragmentation

AMQ 6-aminoquinoline

AQC 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate

BAP biosilica-associated protein

BLAST Basic Local Alignment Search Tool

BSA bovine serum albumin

CID collision-induced dissociation

CIAP calf intestinal alkaline phosphatase

CTC chlorotrityl chloride

DAD diode array detector

DBU 1,8-diazabicyclo[5.4.0]undec-7-ene

DCM dichloromethane

DIAD diisopropyl azodicarboxylate

DIPEA N,N-diisopropylethylamine

DMF dimethylformamide

DTT dithiothreitol

DDA data-dependent acquisition


DNA deoxyribonucleic acid

cDNA complementary DNA

DMSO dimethyl sulfoxide

EDTA ethylenediaminetetraacetate

ER endoplasmic reticulum

ESI electrospray ionization

ESAW enriched artificial seawater

ETD electron-transfer dissociation

ECD electron-capture dissociation

FA formic acid

FDR false discovery rate

FT MS Fourier transform mass spectrometry

FWHM full width at half maximum

GeLC-MS/MS gel electrophoresis liquid chromatography tandem mass spectrometry

GFP green fluorescent protein

GO gene ontology

HCD higher-energy collisional dissociation

HILIC hydrophilic interaction chromatography

HPLC high-performance liquid chromatography

HRMS high-resolution mass spectrometry

HSQC N-hydroxysuccinimidyl 6-quinolinyl carbamate

HSQC heteronuclear single quantum coherence spectroscopy

IM immonium ion

IPTG isopropyl β-D-1-thiogalactopyranoside

IT ion trap

IAA iodoacetamide

LCPA long-chain polyamine


LC liquid chromatography

LC-MS/MS liquid chromatography coupled with tandem mass spectrometry

LTQ Linear Trap Quadrupole

MRM multiple reaction monitoring

MS mass spectrometry

MS1 full scan

MS2 MS/MS scan

MS/MS tandem mass spectrometry

MW molecular weight

MWCO molecular weight cut-off

NHS N-hydroxysuccinimide

NMR nuclear magnetic resonance

nCE normalized collision energy

PBS phosphate-buffered saline

PCR polymerase chain reaction

PMSF phenylmethylsulfonyl fluoride

PSM peptide-spectrum match

PST peptide sequence tag

PTM post-translational modification

QAC 6-quinolinylaminocarbonyl

RPLC reversed-phase liquid chromatography

RT room temperature

SDS sodium dodecyl sulfate

SDS-PAGE sodium dodecyl sulfate polyacrylamide gel electrophoresis

SDV silica deposition vesicle

SEM scanning electron microscope

SFLP silaffin-like protein


SIT silicic acid transporter protein

natSil1A silaffin-1A from C. fusiformis

natSil2 silaffin-2 from C. fusiformis

tpSil1/2 silaffin-1/2 from T. pseudonana

tpSil3 silaffin-3 from T. pseudonana

tpSil4 silaffin-4 from T. pseudonana

TBAI tetrabutylammonium iodide

TEA triethylamine

THF tetrahydrofuran

TFMS trifluoromethanesulfonic acid

TFA trifluoroacetic acid

TIC total ion chromatogram

TOCSY total correlation spectroscopy

UPLC ultra-performance liquid chromatography

UV ultraviolet

XIC extracted-ion chromatogram

1 INTRODUCTION


1.1 Diatoms

Diatoms are unicellular, eukaryotic, photosynthetic algae that produce micro- and nanoscale silicified cell walls [1]. Diatoms occur in almost every aquatic and moist environment on Earth, inhabiting not only oceans, seas, lakes, and streams, but also soils and wetlands. These organisms have enormous biogeochemical and ecological importance, since they are responsible for around one-fifth of the world's net primary production [2–4]. Their ocean-wide dominance is reflected in extensive siliceous sea-floor sediments, including diatomaceous earth and most cherts, and diatoms have also contributed a considerable fraction of current fossil fuel reserves [5–8]. According to the fossil record, diatoms emerged relatively recently in geological time (about 180 million years ago) [8–10]. Since then, their diversity has expanded to ~250 living genera and an estimated more than 200 000 extant species, although only half of them have been described and classified by their unique morphologies [11] (see Fig. 1.1).

The ecological and evolutionary importance of diatoms motivated researchers to sequence their genomes [12–15]. These sequencing projects shed light on the unusual evolutionary history of diatoms. Whole-genome comparison revealed a remarkably rapid and wide evolutionary divergence between Thalassiosira pseudonana and Phaeodactylum tricornutum, comparable to that between fishes and mammals [16]. More recent sequencing studies revealed that diatom genomes are highly chimeric and contain multiple genes acquired through horizontal gene transfer [16]. Diatoms thus provide an intriguing example of combining genes from different sources, which contributes to many unusual physiological features believed to underlie their evolutionary and ecological success. For instance, diatoms possess an ornithine-urea cycle, which is similar to that of animals but absent in plants [17, 18]. This metabolic coupling seems to be fundamental for diatom physiology, because it affects the supply of precursors for long-chain polyamines (LCPAs), which are thought to be directly involved in the formation of the ornately patterned biosilica cell walls, the most conspicuous and spectacular feature of these organisms.

It has been argued that the evolutionary success and remarkable variety of diatoms are largely due to their ability to build silicified cell walls, which may serve as armour against phytoplankton predators [20–22] and are more energy-efficient to produce than equivalent organic structures [23]. How these evolutionary trade-offs relate to the evolutionary success and morphological diversity of diatoms is currently debated [24]. It is clear, however, that the advanced mechanical properties and robust structure of biosilica are relevant to the adaptability of diatoms, which is consistent with their evolutionary success. Indeed, when designing buildings and aircraft, architects and engineers have applied the same structural principles that diatoms use to build their minute shells. Today, biosilica structures attract increasing attention from a broad range of researchers, from fundamental biologists to applied materials scientists [25, 26]. The present study focuses on the fundamental mechanistic basis of biosilicification.

Figure 1.1 Images of the cell walls of five different diatom species: (a) the circular shape of the radial centric diatom Actinoptychus senarius; (b) the rhomboid shape of the polar centric diatom Biddulphia antediluviana; (c) the rhomboid shape of the pennate diatom Pleurosigma angulatum; (d) the ovoid shape of the pennate diatom Surirella fastuosa; (e) the triangular shape of the polar centric diatom Triceratium favum (images taken from [19]).

1.2 Diatom biosilica

1.2.1 Biosilicification in nature

During evolution, many organisms (e. g., diatoms, sponges, radiolaria) have acquired the ability to build specifically structured, silica-based exo- or endoskeletons using silicon (Si), the second most abundant element in the Earth's crust [27, 28]. These intricately shaped biomineral structures are produced by biosilicification, the process by which inorganic silicon is incorporated into living organisms as biosilica (i. e., 'biogenic silica'). Remarkably, diatoms produce these structures under benign ambient and physiological conditions (4 to 40 °C, atmospheric pressure), and silica formation in diatoms is around 10⁶ times faster than the corresponding abiotic process [29]. In contrast, industrial syntheses of silica in vitro typically require extreme temperature, pressure, and pH. Amorphous silica is a widespread biologically produced inorganic material, and, owing to its abundance and physical properties, silica is also widely used as a basic raw material for semiconductors, glass, plastics, ceramics, optical fibers, insulators, detergents, cosmetics, and chromatographic materials such as resins. It is therefore not surprising that diatom biosilica has been regarded as a paradigm for future silica nanotechnology [30–35], mainly because of the unique structural features of the biosilica cell wall, which are discussed in the next section.

1.2.2 Diatom biosilica structure and cell cycle

The silica-based diatom cell wall, or frustule, ranges in size from 2 to 2000 µm and shows three-dimensional morphologies on the micro- and nanoscale that are precisely reproduced over generations. These hierarchical porous structures exhibit levels of symmetry and complexity far beyond the capabilities of the best technologies available to date. Frustules display an enormous variety of shapes and forms across diatom species [36] and have attracted scientists with their extraordinary beauty ever since the earliest microscopic observations [37].

Based on the shape and symmetry of their frustules, diatoms are traditionally divided into two main groups: the centrics and the pennates (see Fig. 1.1, [1]). Centric diatoms can be further divided into two subgroups based on their type of symmetry: radial centrics are radially symmetric around the center of the valve, whereas polar centrics have bi- or multipolar valves with an elongated or distorted center of symmetry. In contrast to centric diatoms, pennate species are bilaterally symmetric, and their shells are typically elongated parallel to the longitudinal axis of symmetry.

Typically, the diatom frustule consists of two almost identically structured, overlapping halves (thecae), hence the taxon name¹. The slightly larger upper half (epitheca) overlaps the lower one (hypotheca), so that the two fit together much like a Petri dish and its lid (see Fig. 1.2a). Each theca consists of a valve and several girdle bands that run laterally along the outline of the cell. The terminal girdle bands in the overlapping region of the two thecae are termed pleural bands. Valves usually display lace-like patterns of nanometre-scale pores, whereas girdle bands are far less ornamented.

Diatoms primarily reproduce asexually through binary fission, in which each daughter cell receives either the epitheca or the hypotheca of the parent cell (see Fig. 1.2b). Because each daughter cell builds its new hypotheca inside the inherited theca, this mode of division reduces the size of the daughter cell that receives the smaller parental theca (the hypotheca), and the average cell size of a diatom population therefore decreases over time. To avoid excessive size reduction, diatoms are capable of sexual reproduction, in which meiotic cell divisions and gamete fusion result in the formation of an auxospore with an enlarged cell volume [38].

1 the word ‘diatom’ originates from Greek diá-tom-os (= dichó-tom-os), meaning ‘cut in half’

[Figure 1.2: (a) Diatom cell wall structure, with epitheca, hypotheca, valves, girdle bands, pleural band, protoplast and plasma membrane labelled; (b) Diatom cell cycle, stages (1)–(8), with valve SDV, girdle band SDV, new hypovalves and new girdle bands labelled.]

Figure 1.2 Structure of the diatom frustule and cell cycle. (a) Diatom cell wall structure. The cell wall is made up of two half shells, named the epitheca and hypotheca, which together fully enclose the protoplast. Each theca consists of a valve and one or more girdle bands that run laterally along the outline of the cell. The terminal girdle bands of each theca constitute the overlap region of the cell wall, in which the slightly larger epitheca overlaps the hypotheca. (b) Diatom cell cycle: (1) cytokinesis and formation of a valve silica deposition vesicle (SDV) in each daughter protoplast; (2) and (3) expansion of the SDV and formation of a new hypovalve within each SDV; (4) exocytosis of SDV contents; (5) separation of daughter cells; (6) formation of the first girdle band SDV; (7) consecutive formation and secretion of girdle bands; (8) DNA replication. Figure adapted from Kröger and Poulsen [33].
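The size-reduction argument above (often referred to as the MacDonald-Pfitzer rule) can be made concrete with a toy model. The Python sketch below assumes that every cell divides once per generation and that the daughter inheriting the parental hypotheca is smaller by a fixed, hypothetical decrement `shrink`; biological variability and auxospore formation are not modelled, and the founder size of 50 µm is an arbitrary illustrative value.

```python
from collections import Counter

def divide(population: Counter, shrink: float) -> Counter:
    """One round of binary fission for every cell in the population.

    A parent of size s gives one daughter that keeps the parental epitheca
    (size s) and one that inherits the parental hypotheca and is smaller by
    `shrink` (a hypothetical, constant decrement).
    """
    daughters = Counter()
    for size, count in population.items():
        daughters[size] += count           # daughter inheriting the epitheca
        daughters[size - shrink] += count  # daughter inheriting the hypotheca
    return daughters

def mean_size(population: Counter) -> float:
    """Average cell size over all cells in the population."""
    total = sum(population.values())
    return sum(size * count for size, count in population.items()) / total

population = Counter({50.0: 1})  # hypothetical founder cell of 50 µm
for generation in range(1, 6):
    population = divide(population, shrink=1.0)
    print(generation, mean_size(population), min(population))
```

Under this model the population mean drops by `shrink / 2` per generation, while the lineage that always keeps the epitheca retains the original size, so the size distribution broadens as it shifts downwards.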

1.2.3 The cell biology of biosilica morphogenesis

Silicon is an essential nutrient for biosilica formation, and silicon limitation in diatom cultures induces cell cycle arrest. Diatom biosilica is formed in specialized intracellular compartments termed silica deposition vesicles (SDVs). The SDV is considered to be a cellular ‘reaction vessel’ in which all the chemical steps of silica formation and patterning take place. Although the immediate monomeric precursor for silica polycondensation inside the SDV is unknown, orthosilicic acid (Si(OH)4), which occurs in natural habitats at concentrations between 3 and 70 µM [39], is the original source for cell wall biogenesis [33]. Si(OH)4 is transported into the diatom cell by specific Na+-dependent transporter proteins, termed silicic acid transporter proteins (SITs) [40–44].

As indicated in Fig. 1.2b, valves are formed only during cell division, whereas girdle bands are produced during interphase. Biogenesis of the diatom cell wall therefore requires two different types of SDVs, which are present at different stages of the cell cycle. Immediately after cytokinesis, each daughter cell produces a valve SDV in which a new hypovalve is synthesized; the SDV gradually grows as more and more silica is deposited. When valve formation is complete, the SDV fuses with the plasma membrane, depositing the newly formed biosilica structure on the cell surface. As the cell volume increases during interphase, girdle bands are formed stepwise. When the cell has reached the required size, a new round of the cell division cycle begins (the different stages of the diatom cell cycle are shown in Fig. 1.2b). Studies on other silicifying organisms have demonstrated that SDVs are not a speciality of diatoms but rather appear to be general organelles for silica biogenesis [45].

However, despite being very common in nature, biosilicification remains a poorly understood phenomenon. Species-specific diatom biosilica structures are precisely reproduced over generations, presumably indicating that biosilica morphogenesis is under precise morphogenetic control, which in turn implies the existence of specialized proteins that guide the integration of silica precursors into protein-based organic templates. It is currently hypothesized that SDVs contain protein-based organic matrices that control silica formation, resulting in species-specifically nanopatterned biosilica, an organic-inorganic composite material. A better understanding of the molecular mechanisms of biosilica morphogenesis should therefore come from identifying and characterizing proteins that are intimately associated with the cell wall and directly involved in silica polycondensation. In recent decades, significant insight into the molecular mechanism of silica biomineralization has been gained from structural and functional analyses of biomolecules involved in diatom biomineralization [46–49]. Furthermore, genome sequencing has provided an important resource for investigating the biosilica-forming machinery [12–16]. However, the search for biomolecules that determine biosilica patterns has turned out to be extremely challenging.

1.3 The role of polyamine post-translational modifications in diatom biosilicification

The diatom frustule is an inorganic-organic hybrid material composed mainly of nanopatterned inorganic silica together with various specific organic macromolecules (proteins and/or peptides, polysaccharides, long-chain polyamines; for review see [33, 46, 49, 50]). Protein-based organic templates are assumed to be directly involved in the biosilicification process, and these organic components are therefore embedded within biosilica structures [48–50]. Because biosilica is usually very robust and resistant to most chemical and physical treatments, it is very challenging to extract molecules (especially proteins) from these composites without degrading or chemically modifying them. In pioneering work by Nakajima and Volcani around 1970, the uncommon amino acids 3,4-dihydroxyproline [51] and ε-N,N,N-trimethyl-δ-hydroxylysine [52] were isolated from acid hydrolysates of purified cell walls. Since this first biochemical evidence for post-translationally modified proteins in diatom biosilica, more components have been discovered as extraction methods have become more exhaustive and less chemically aggressive.


1.3.1 Identifying biomolecules associated with diatom biosilica

The general biochemical approach to identifying biosilica constituents is to separate the intracellular organic material from the biosilica and then extract the biosilica-embedded components from purified cell walls. This approach led to the identification of several protein families (e. g., frustulins2 [53, 54] and pleuralins3 [55]) that are tightly associated with the biosilica but were later shown to be incorporated after SDV exocytosis; none of these proteins is therefore actively involved in silica biogenesis [56].

Long-chain polyamines (LCPAs), another class of biosilica-associated organic molecules, were discovered upon complete dissolution of diatom biosilica in liquid HF [57], a treatment known to cleave O-glycosidic and phosphate ester bonds while leaving peptide bonds intact [58, 59]. After strong acid hydrolysis, the molecular masses of LCPAs remained unchanged, which excludes the presence of peptide bonds in their structures [57]. ESI-MS analysis further indicated that LCPAs are linear chains of up to 20 repeated propyleneimine units [57], the longest polyamine chains found in nature. It was later shown that each diatom species displays a wide variety of LCPA structures, differing in overall chain length, degree of N-methylation and, unexpectedly, site-specific incorporation of quaternary amines [31, 60, 61]. It has been hypothesized that various biosilica patterns can be generated by polyamines of different chain lengths and structures (for review see [62]). Mass spectrometric and NMR spectroscopic analyses revealed the presence of LCPAs in other silicifying organisms such as sponges [6], further corroborating their involvement in biosilicification. All LCPAs identified to date have either a propylenediamine, putrescine, or spermidine base molecule to which linear oligo-propyleneimine chains are attached. Furthermore, in vitro experiments have shown that polyamines of different chain lengths induce rapid silica precipitation from a silicic acid solution [63], which is enhanced or made species-specific by synergistic action with highly specialized peptides and proteins, as discussed below.

Upon full dissolution of diatom silica with anhydrous HF, novel peptides termed silaffins4 were extracted from the diatom C. fusiformis and characterized [64]. The first discovered peptides, silaffin-1A and silaffin-1B from C. fusiformis, were thoroughly characterized in a follow-up study [65], which showed that they are highly post-translationally modified. To avoid the harsh treatment with anhydrous hydrogen fluoride, the diatom biosilica was dissolved with an acidified ammonium fluoride solution (NH4F/HCl, pH ~5.0) [66]. This method allows silaffins to be extracted in their native state, keeping O-linked modifications intact. Poulsen and Kröger employed the same approach to characterize silica-associated organic material from T. pseudonana and identified by SDS-PAGE three bands corresponding to higher-molecular-weight silaffin polypeptides and a single band corresponding to LCPAs [67].

2 from ‘frustule’, the silicified diatom cell wall

3 from ‘pleural band’, the overlap region of hypotheca and epitheca

4 from ‘silica affinity’

The first identified silaffins were subjected to N-terminal Edman sequencing, and database searches with the resulting sequences allowed the identification of silaffin-encoding genes [64, 67]. Altogether, one silaffin from C. fusiformis and four from T. pseudonana have been described and characterized [64–69]. Analysis of their protein sequences revealed the same organization, namely a 22-amino acid signal peptide for co-translational import into the endoplasmic reticulum (ER) [70], followed by an N-terminal RXL spacer (sequences and UniProt entries are listed in Fig. A.4a–A.4e). This similarity suggests that analogous processing pathways operate for this family of silica-associated proteins. However, silaffins were also found to share no significant sequence similarity, which prevents the use of homology-based tools to identify related proteins in diatom genome databases.

The lack of sequence conservation prompted genome-based bioinformatic mining for other putative biosilica-associated proteins. Scheffel et al. developed an amino acid composition-based bioinformatics approach, which enabled the identification of 86 silaffin-like proteins (SFLPs) in the genome of the diatom T. pseudonana [71]. A group of six W- or Y-rich proteins (listed in Fig. A.4f–A.4k) with highly repetitive sequences containing silaffin-like KXXK motifs was shown by GFP tagging to be directly associated with the girdle band region of the biosilica. These proteins, hence called cingulins5, could not be purified from T. pseudonana cell walls using established biosilica extraction approaches (see Section 1.3.1). Each cingulin contains one RXL-containing domain, which starts (Fig. A.4i–A.4k) or ends (Fig. A.4f–A.4h) with the tripeptide sequence RXL. This motif is also present in the precursors of other biosilica-associated diatom proteins, where it serves as the recognition site for proteolytic cleavage at the C-terminus of the leucine residue. Nevertheless, no other biosilica-associated proteins were identified in these functional genomic studies.

5 from ‘cingulum’, the girdle band region of a frustule
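The two processing steps mentioned in this section, removal of the N-terminal signal peptide and cleavage after the leucine of an RXL tripeptide, can be illustrated at the sequence level. The Python sketch below is a simplification under stated assumptions: it removes a fixed 22-residue signal peptide (the length reported above for known silaffins), assumes that every non-overlapping RXL motif in the remaining sequence is cleaved, and ignores the identity of the protease, any context preferences, and further trimming. The function names and the test sequence are hypothetical.

```python
import re

SIGNAL_PEPTIDE_LENGTH = 22  # length reported for known silaffin precursors

# R, any residue, L; cleavage is assumed to occur after the leucine
RXL = re.compile(r"R.L")

def rxl_cleavage_sites(seq: str) -> list[int]:
    """0-based cut positions, each directly after the L of an RXL motif."""
    return [match.end() for match in RXL.finditer(seq)]

def process_precursor(precursor: str) -> list[str]:
    """Remove the signal peptide, then cut after every RXL motif."""
    mature = precursor[SIGNAL_PEPTIDE_LENGTH:]  # co-translational signal peptide removal
    bounds = [0] + rxl_cleavage_sites(mature) + [len(mature)]
    fragments = [mature[start:end] for start, end in zip(bounds, bounds[1:])]
    return [fragment for fragment in fragments if fragment]  # drop empty pieces

# Hypothetical precursor: 22-residue signal peptide followed by an invented sequence
precursor = "M" + "A" * 21 + "DDRSLKKSKKSRALSSKAAK"
print(process_precursor(precursor))
```

Each fragment ends with the leucine of its RXL motif, except the C-terminal one, which mirrors the statement that cleavage occurs at the C-terminus of the leucine residue.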

Another protein component tightly associated with diatom biosilica, polypeptides called silacidins6, was co-purified with silaffins after mild dissolution of T. pseudonana biosilica [72]. These polypeptides are enriched in phosphorylated serine and acidic amino acids (hence the name), and it has been hypothesized that these highly acidic low-molecular-weight peptides assist silaffins and LCPAs in silica precipitation [73]. Later, several homologues of the gene encoding silacidin in T. pseudonana were found in different centric diatom species [74], which may suggest their involvement in the biosilicification process. However, such sequence conservation appears to be the exception rather than the rule among diatom biosilica-embedded proteins. Despite the presence of multiple repetitive motifs (highlighted in Fig. A.4), silaffins appear to completely lack α-helices and β-sheets and to adopt a largely random coil structure, similar to natively unfolded proteins [75].

The past decades of diatom research have provided significant insight into the molecular components of the biosilica-forming machinery, particularly proteins and peptides that may act both as structural templates and as catalysts for the silica polycondensation reaction (for review see [46]). Several models have been proposed to explain how silaffins and LCPAs drive silica morphogenesis [76]. It has been argued that only the polyamine moieties, but not the phosphate groups, are directly involved in catalysing silicic acid polycondensation [34]. The amino and ammonium groups of the oligo-propylamine chains of silaffins and LCPAs are believed to act as acid-base catalysts for the condensation of silicic acid [77]. The mechanisms of silica formation by silaffin-1A from C. fusiformis (natSil1A) and by LCPA/phosphate systems appear to be very similar. In both cases, electrostatic interactions between polyamine chains and phosphate groups lead to the formation of supramolecular aggregates [66, 78], which appear to be responsible for accelerating the condensation of oligosilicic acid molecules [78]. Fig. 1.3 shows the proposed mechanism for the catalysis of silicic acid condensation by oligo-propyleneimine-containing molecules. Together, these findings suggest that conservation of post-translational modification patterns, rather than conservation of the amino acid sequence, may be essential for silaffin function.

6 from ‘silica’ and the ‘acidic’ nature of these peptides

[Figure 1.3: reaction scheme of the catalytic cycle, steps (1)–(3).]

Figure 1.3 Hypothetical mechanism for the silica polycondensation reaction catalyzed by polyamines present in silaffins (adapted from Kröger and Sandhage [34]). Two propyleneimine units (hereafter denoted as propylamines) within a polyamine chain contain an amino group and an ammonium group (R = H or CH3), which bind silicic acid molecules by hydrogen bonding: (1) protonation of the amino group and deprotonation of the ammonium group by silanol groups (–Si–OH) result in the formation of a reactive silanolate ion (–Si–O⁻) and an oxonium ion (–Si–O⁺H2); (2) the silanolate group reacts with the neighbouring silicon atom, forming a siloxane bond (–Si–O–Si–) with elimination of water (H2O); (3) the newly formed silica is replaced by two other silicic acid molecules, and the catalytic cycle begins again.
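Step (2) is a condensation reaction. For reference, its stoichiometry and the overall balance for complete condensation of silicic acid to silica are given below. This is general silicate chemistry rather than anything specific to the polyamine-catalysed mechanism, and real biosilica is a hydrated amorphous silica that retains silanol groups rather than fully condensed SiO2.

```latex
\begin{aligned}
\mathrm{Si(OH)_4} + \mathrm{Si(OH)_4} &\longrightarrow \mathrm{(HO)_3Si{-}O{-}Si(OH)_3} + \mathrm{H_2O} \\
n\,\mathrm{Si(OH)_4} &\longrightarrow n\,\mathrm{SiO_2} + 2n\,\mathrm{H_2O}
\end{aligned}
```

Each siloxane bond formed releases one water molecule; since every silicon atom in fully condensed silica shares four siloxane bonds (two per silicon atom), complete condensation releases two water molecules per silicon atom.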

1.3.2 PTM complexity of biosilica-associated proteins

The set of proteins expressed in a diatom cell and embedded in the diatom biosilica, here termed the biosilicome, consists of extensively post-translationally processed proteins and/or peptides. Post-translational modification (PTM) complexity is the most remarkable feature of biosilica-associated proteins. During intracellular maturation, silaffin precursors undergo extensive post-translational processing, including proteolytic cleavage of the N-terminal signal peptide [70] and covalent attachment of different chemical moieties to multiple amino acid residues. The latter results in extremely complex protein structures bearing numerous PTMs. Silaffin PTMs range from widespread modifications such as phosphorylation, which is found in all eukaryotes, to unique modifications such as polyamines attached to the ε-amino groups of lysines. Additionally, complex glycosylation and sulfation have been reported for several silaffins from T. pseudonana and C. fusiformis. The identification and chemical characterization of multiple PTMs on the same polypeptide remain challenging [79–82]. Although protein sequences can be deduced from nucleotide sequences, post-translational modifications, in general, cannot. As presented below, current knowledge of silaffin PTMs is limited to a few proteins, while the PTMs of many more remain unknown.

[Figure 1.4: schematic of a biosilica-associated protein showing the signal peptide (SP), K-(X/S)-X-K and KXXK repeat motifs, and modified residues; legend: polyamine PTMs, sulfation, phosphorylation, glycosylation, unknown PTMs.]

Figure 1.4 PTM complexity of biosilica-associated proteins. Biosilica-associated proteins can be modified by a wide array of PTMs. An overview of PTMs identified in different diatom species is provided in Table 1.1. The site specificity of most PTMs remains unknown.

Despite progress in research on PTMs of biosilica-associated proteins, only three glycoproteins have been identified to date, from the diatoms T. pseudonana and C. fusiformis [67, 83]. Silaffin-2 from C. fusiformis (natSil2) was shown to be a highly glycosylated and sulfated protein [83]. Deglycosylation with trifluoromethanesulfonic acid [84, 85] completely removes both the glycosylation and the sulfation, so it remains unclear whether the sulfate groups are linked directly to the polypeptide backbone of natSil2 or to protein-bound glycans. The carbohydrate composition of the protein-bound oligosaccharides appeared to be rather complex, comprising galactose, rhamnose, glucuronic acid, fucose, glucosamine, and a monomethylated deoxyhexose. Presumably owing to its abundant glucuronic acid, natSil2 is the only component of the C. fusiformis ammonium fluoride extract that is stained by the polycationic carbocyanine dye ‘Stains-all’, which indicates a highly negative net charge [86, 87]. HF treatment converts natSil2 into a strongly positively charged protein, indicating that the high negative charge density of natSil2 results solely from its HF-sensitive PTMs. As mentioned previously, the sequence of natSil2 remains unknown.

The T. pseudonana silaffins are highly glycosylated and sulfated acidic proteins and thus resemble natSil2 from C. fusiformis. Silaffin-1/2 from T. pseudonana (tpSil1/2), which occurs as high (tpSil1/2H) and low (tpSil1/2L) molecular weight isoforms, and silaffin-3 from T. pseudonana (tpSil3) have rather different carbohydrate compositions: tpSil3 contains a substantial amount of glucuronic acid, whereas tpSil1/2 lacks it entirely. Additionally, both tpSil1/2 and tpSil3 contain some unidentified monosaccharides [67]. HF treatment [58, 59] of tpSil3 resulted in a single band on SDS-PAGE with an apparent molecular weight of 35 kDa, considerably higher than the predicted molecular weight of the mature polypeptide (21.2 kDa), owing to the presence of HF-insensitive modifications [67, 68]. Similarly, after HF treatment both isoforms of tpSil1/2 gave two bands on SDS-PAGE with apparent molecular weights much lower than those of the untreated samples, which again indicates the presence of PTMs resistant to HF treatment. The exceptionally high negative charge imparted by the carbohydrate and sulfate moieties makes these regulatory silaffins incapable of precipitating silica on their own. However, deglycosylated natSil2 was found to possess intrinsic silica precipitation activity in vitro [83], suggesting that glycosylation and sulfation may autoinhibit silica formation by modulating silaffin function. Poulsen et al. speculated that regulatory silaffins may influence silica morphogenesis through their interaction with silica-forming molecules, although the mechanism remains unclear [67, 83].

All silaffins are presumed to be phosphorylated to a significant extent. The first identified and characterized silaffin turned out to be an extensively phosphorylated protein. Its phosphate groups markedly affect its migration in SDS-PAGE, increasing the apparent molecular weight from ∼3 to 6.5 kDa. The attachment sites of the phosphate groups within natSil1A were analysed by 31P-NMR spectroscopy, because confirmation by tandem mass spectrometry (MS/MS) was difficult. It was shown that eight phosphate groups are linked to silaffin-1A from C. fusiformis, seven of which are bound to serines and one to an ε-N,N,N-trimethyl-δ-hydroxylysine [66]. Total phosphate analysis of the silaffins natSil2 from C. fusiformis and tpSil3 and tpSil1/2 from T. pseudonana demonstrated that these proteins are also substantially phosphorylated. However, none of the phosphate groups has been mapped directly onto the polypeptide sequences of these silaffins [67, 83]. The phosphorylation of modified lysines is discussed in Section 1.3.3.

Another phosphorylated peptide from T. pseudonana is silacidin, a highly acidic low-molecular-weight peptide that consists mainly of serine residues, more than 60 % of which are phosphorylated [72, 73]. As in the case of natSil1A, these phosphate groups were identified by 31P-NMR, without direct mapping by mass spectrometry. Nevertheless, it is clear that phosphorylation affects numerous serine residues and plays an essential role in biosilica formation. The phosphate groups enable natSil1A to precipitate silica in the absence of phosphate buffer, whereas dephosphorylated natSil1A completely lacks silica precipitation activity. If phosphate groups are not present on the protein, phosphate has to be supplied in the buffer and is consumed stoichiometrically in the process. These results strongly support the hypothesis that phosphates in biosilica-associated proteins serve as the polyanions required in vivo for silica formation directed by LCPAs and the polyamine PTMs of silaffins. Although phosphate moieties on biomineralization proteins play an important role in mineral formation, the kinases that phosphorylate these proteins are poorly characterized. Recently, a membrane-associated serine/threonine kinase was identified in T. pseudonana on the basis of its expression pattern, which is similar to that of tpSil3 [88]. However, this kinase phosphorylates only a fraction of all silaffins and accounts for only ~25 % of total silaffin kinase activity, indicating that other kinases are also active.

1.3.3 Lysine ε-polyamine modifications in biosilica-associated proteins

Lysine PTMs in silaffins exhibit more complex and elaborate modification patterns than O-linked modifications. Silaffins contain multiple lysine residues that can be modified by covalent attachment of polyamine chains. These polyamines consist of multiple linearly linked propyleneimine units and vary in chain length and degree of N-methylation. Thus, even within one type of PTM, multiple subtypes exist, greatly expanding the scope of silaffin lysine modification. Because they are positively charged at physiological pH, lysine polyamine modifications significantly increase the cationic net charge of silaffins, which is essential for these proteins to exert their silica precipitation activity under the acidic pH of the SDV lumen [64]. Such modifications allow silaffins to bind tightly to the surfaces of silica particles through a combination of electrostatic and hydrogen-bonding interactions. Although any amine possesses some inherent silica precipitation activity, diatoms may employ complex patterns of variable lysine PTMs to fine-tune the silica precipitation process.
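The statement that polyamine modifications are positively charged, and more so at the acidic pH of the SDV lumen, follows from the acid-base equilibria of the amine nitrogens. The Python sketch below uses the Henderson-Hasselbalch equation to estimate the average positive charge of one short polyamine chain. The pKa values are hypothetical placeholders, and the model treats each nitrogen as an independent site, whereas in real polyamines neighbouring charges shift the pKa values; quaternary amines, which carry a permanent charge, are not covered. The two pH values are illustrative choices, not measured SDV values.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of an amine in its cationic (protonated) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

def mean_positive_charge(ph: float, pkas: list[float]) -> float:
    """Average positive charge, treating every nitrogen as an independent site."""
    return sum(fraction_protonated(ph, pka) for pka in pkas)

# Hypothetical pKa values for the four nitrogens of a short lysine-bound polyamine
CHAIN_PKAS = [10.5, 9.5, 8.5, 7.5]

for ph in (7.4, 5.5):  # illustrative 'physiological' and 'acidic compartment' values
    print(f"pH {ph}: mean charge {mean_positive_charge(ph, CHAIN_PKAS):.2f}")
```

Lowering the pH pushes every site towards full protonation, so the same chain carries more positive charge in an acidic compartment than at neutral pH.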

However, only scarce information is available on the structure of ε-polyamine-modified lysines in silaffins. All lysine PTMs known to date were reviewed by Kröger and Poulsen [33] and are depicted in Fig. 1.5. Interestingly, beyond histone proteins, the proteome-wide extent of lysine modifications is largely not covered by the most recent reviews in the PTM research field [89], presumably because of the limited set of currently available chemical structures.

[Figure 1.5: (a) poly-N-methylpropylamine attached to the ε-amine of lysine; (b) ε-N,N,N-trimethyl-δ-hydroxylysine.]

Figure 1.5 Chemical structures of modified lysine residues with basic structural units of post-translational modifications present in silaffins (reviewed by Kröger and Poulsen [33]). The basis of each structure is the lysine moiety (highlighted in black). Phosphorylation of the hydroxyl group of hydroxylysine (highlighted in red) has also been described [52, 66]. Polyamine modifications of lysine residues are oligo-propyleneimine chains attached to the ε-amino group of lysine (highlighted in blue) [64, 65]. Some of the propyleneimine units are N-methylated (highlighted in green).

Indeed, apart from the modifications of natSil1A and tpSil3, very little information is available regarding PTMs in silaffins. Analysis of the first discovered silaffin, natSil1A from C. fusiformis, revealed polyamine modifications that are resistant to HCl hydrolysis [57, 64]. This 3.5 kDa silaffin peptide contains three types of modified lysine residues: ε-N,N-dimethyllysine, ε-N,N,N-trimethyl-δ-hydroxylysine, and ε-polyamine-modified lysine. The latter modification is composed of 4–9 linearly linked propyleneimine units, in which each N-atom except the first is methylated (Fig. 1.5a). Additionally, 31P-NMR analysis of natSil1A has shown that the side-chain hydroxyl group of ε-N,N,N-trimethyl-δ-hydroxylysine is phosphorylated [66] (Fig. 1.5b). This modification and its non-phosphorylated counterpart were first discovered in 1970 by Nakajima and Volcani in cell walls of the diatom N. pelliculosa [52]. Later, a similar modification was also found in hydrolysates of T. pseudonana biosilica [90, and current work], suggesting that δ-hydroxylysine phosphorylation may be important for silaffin function.

The lysine ε-amino groups of another protein, the 24 kDa silaffin-3 from T. pseudonana (tpSil3), in which 30 of the 33 lysines are embedded in a KXXK motif, are modified by ε-N,N-dimethylation and polyamine chains [67, 68]. Based on the PTM mapping results for tpSil3, Sumper et al. formulated empirical rules, referred to as the ‘polyamine code’ (by analogy with the ‘histone code’) [68]. According to one of these rules, in each K(A/S/Q)XK motif the N-terminal lysine carries two aminopropyl units, while the C-terminal lysine is ε-N,N-dimethylated. Such regularity suggests a sophisticated multi-step enzymatic machinery for silaffin post-translational modification. However, it would be premature to postulate general rules for enzymatic modification based on a single mapped protein. Given the lack of sequence conservation among known silaffins, the presumption that silaffin function does not depend on a specific polypeptide fold, but rather requires a particular arrangement of conserved post-translational modifications, leads to a logical and testable structure-function hypothesis. The lack of sequence conservation between silaffins may also reflect the large phylogenetic distance between the diatom species from which they originate (T. pseudonana and C. fusiformis). At present, however, no specific conclusion on silaffin similarity can be drawn because of the limited set of available silaffin sequences. Investigating this intriguing question would require a larger set of biosilica-associated proteins with mapped lysine PTMs from different diatom species.
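Because the rule quoted above is purely a sequence pattern, it can be written as a regular expression. The Python sketch below assumes one-letter amino acid codes with X standing for any residue, and applies only this single rule; the other rules of the ‘polyamine code’ are not modelled. A lookahead is used so that overlapping motifs (e.g. KAAKSAK, where the central lysine is the C-terminal lysine of one motif and the N-terminal lysine of the next) are all found; such lysines collect both labels, because the text does not say how conflicts are resolved. The function name, label strings and test peptide are hypothetical, and the output is a hypothesis to be checked by PTM mapping, not a prediction of the actual modification state.

```python
import re

# Hypothetical labels for the two outcomes of the K(A/S/Q)XK rule
N_TERMINAL_LABEL = "two aminopropyl units"
C_TERMINAL_LABEL = "epsilon-N,N-dimethyl"

# Zero-width lookahead: matches at every K that starts a K(A/S/Q)XK motif,
# so overlapping motifs are not skipped
MOTIF = re.compile(r"(?=K[ASQ].K)")

def apply_kasqxk_rule(seq: str) -> dict[int, set[str]]:
    """Map 0-based lysine positions to the modification(s) the rule assigns."""
    predicted: dict[int, set[str]] = {}
    for match in MOTIF.finditer(seq):
        first_k = match.start()   # N-terminal lysine of the motif
        last_k = first_k + 3      # C-terminal lysine of the motif
        predicted.setdefault(first_k, set()).add(N_TERMINAL_LABEL)
        predicted.setdefault(last_k, set()).add(C_TERMINAL_LABEL)
    return predicted

# Hypothetical peptide, not a real silaffin sequence
for pos, labels in sorted(apply_kasqxk_rule("GKAAKSAKQTKG").items()):
    print(pos + 1, sorted(labels))  # 1-based residue numbering
```

Lysines reported with both labels mark exactly the places where the single rule is ambiguous, which is where additional mapping data would be needed.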

To the best of our knowledge, polyamine-modified proteins appear to be unique to biomineralizing organisms7, and the pathway of their modification remains enigmatic. In particular, it is unclear which enzymes catalyse the individual steps of silaffin processing: the sequential transfer of propyleneimine units to the ε-amino groups of lysine, and the methylation of primary and secondary amines [97, 98]. In T. pseudonana, polyamines exist both in a lysine-bound form and as free long-chain polyamines [57], which, like the silaffins, are intimately associated with the biosilica (as discussed above in Section 1.3.1). Each LCPA is typically a ∼0.6 to 1.5 kDa molecule based on putrescine or spermidine and comprising several propyleneimine units, which are usually N-methylated. The chemical structures of polyamine-modified lysine residues in silaffins are thus very similar to the oligo-N-methylpropyleneimine units of LCPAs, implying a common basis for LCPA biosynthesis and the post-translational modification of silaffins. Analysis of the T. pseudonana genome revealed a group of N-aminopropyltransferases [99], some of which are fused to eukaryotic Tudor domains, protein domains that bind histones at N-methylated lysines [100]. These putative multi-domain enzymes may therefore be involved in the targeted, site-specific post-translational modification of silaffins, but the details are only beginning to be elucidated.

7 Protein polyamine modifications also occur in sponges and silicifying haptophytes [44, 91, 92]. Additionally, the unusual amino acid hypusine (composed of a hydroxyputrescine moiety linked to lysine) has been found in all eukaryotes and in some archaea [93, 94]. The only known hypusine-containing protein is eukaryotic translation initiation factor 5A (eIF5A), together with a similar protein found in archaea [95, 96].
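The LCPA mass range quoted above can be related to chain length with simple arithmetic. The Python sketch below computes neutral monoisotopic masses of idealized linear LCPAs under explicit assumptions: a putrescine base (C4H12N2), a net gain of C3H7N for each added propyleneimine unit, and one N-methyl group (+CH2) per unit. Terminal dimethylation, quaternary amines and spermidine- or propylenediamine-based chains are not modelled, although they could be handled by changing the formulas. The [M+H]+ value, the singly protonated ion one would look for in an ESI mass spectrum, is printed alongside.

```python
# Monoisotopic masses of the most abundant isotopes (Da)
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048}
PROTON_MASS = 1.00727646688

# Elemental formulas as {element: count}
PUTRESCINE = {"C": 4, "H": 12, "N": 2}           # H2N-(CH2)4-NH2, assumed base molecule
PROPYLENEIMINE_UNIT = {"C": 3, "H": 7, "N": 1}   # net gain per added -(CH2)3-NH- unit
N_METHYL = {"C": 1, "H": 2}                      # replacing N-H by N-CH3 adds CH2

def formula_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of an elemental formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())

def lcpa_mass(units: int, methyl_groups: int,
              base: dict[str, int] = PUTRESCINE) -> float:
    """Neutral monoisotopic mass of an idealized linear LCPA."""
    return (formula_mass(base)
            + units * formula_mass(PROPYLENEIMINE_UNIT)
            + methyl_groups * formula_mass(N_METHYL))

for n in (7, 12, 20):
    neutral = lcpa_mass(units=n, methyl_groups=n)
    print(f"{n:2d} units: M = {neutral:8.2f} Da, [M+H]+ = {neutral + PROTON_MASS:8.2f}")
```

Under these assumptions each N-methylated unit adds about 71.07 Da, so chains of roughly 7 to 20 methylated units on a putrescine base span approximately the ∼0.6 to 1.5 kDa range.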

All silaffin PTMs known to date are summarized in Table 1.1. Lysine polyamine modifications, unlike other PTMs, appear to be present in all characterized silaffins. Specific ε-polyamine modifications of lysines may therefore be a characteristic feature of proteins involved in silicon biomineralization, and in the absence of sequence conservation, a molecular definition of silaffins as a protein class may be based on the presence of lysine polyamine chains of varying length and degree of methylation. The site-specific location and spacing of positively charged lysine modifications in silaffins may be crucial for their silica-binding function. At the same time, the potential variety of lysine PTMs may be associated with the phylogenetic relationships among diatom species and their remarkable morphological diversity. Elucidating the chemical modifications of silicifying proteins should therefore provide a more complete mechanistic understanding of biomineralization processes in diatoms.

Over the last decades, the chemical understanding of lysine modifications in silaffins has advanced substantially; however, there is still a significant gap between the currently known modifications and the full complexity of endogenous silaffin PTMs. To put this work in the context of previous studies, we point out the methodological limitations of most proteomic analyses of silaffin proteins. Until now, the discovery of biosilica-embedded proteins has relied either on laborious biochemical analyses of purified silaffins [65–68] or on indirect methods such as whole-genome expression profiling under different stress conditions [49, 103–106]. Although these studies identified many protein candidates potentially involved in silica formation, none of the approaches used is suitable for characterizing silaffin PTMs. Despite great biological interest in lysine PTMs, our knowledge of modification sites is limited to a few proteins from a couple of evolutionarily distant species (including T. pseudonana and C. fusiformis). This substantially hampers, or even precludes, the comparison of protein-bound polyamine structures in the context of evolutionary and functional conservation. Addressing these questions is challenging at both the biological and the methodological level, and calls for the development of new analytical strategies and chemical methods for PTM characterization. High-throughput approaches for the identification of PTMs are now being developed: advances in MS instrumentation, together with the development of analytical methods over the past several years, now allow us to investigate the biosilicome on a global scale.

Table 1.1 Overview of silaffin PTMs identified in different diatom species. See [33, 46, 101] for review.

| Diatom | Silaffin | PTMs at lysine (ε-amino group) | PTMs at lysine (δ-position) | PTMs at proline | PTMs at hydroxyl amino acids | References |
|---|---|---|---|---|---|---|
| C. fusiformis | natSil1A, natSil1B | Methylation, ε-polyamine chains | Hydroxylation, phosphorylation | Not present | Phosphorylation | [57, 64, 66] |
| C. fusiformis | natSil2 | Methylation, ε-polyamine chains | Unknown | Hydroxyproline | Phosphorylation, glycosylation, sulfation | [83] |
| T. pseudonana | tpSil1/2 | Methylation, ε-polyamine chains | Unknown | Hydroxyproline, dihydroxyproline | | [67] |
| T. pseudonana | tpSil3 | Methylation, ε-polyamine chains | Hydroxylation | Not present | | [67, 68] |
| T. pseudonana | tpSil4 | Unknown | Unknown | Unknown | Unknown | [69] |
| N. pelliculosa | Unknown | Methylation | Hydroxylation, phosphorylation | Hydroxyproline, dihydroxyproline | Unknown | [51, 52] |
| E. zodiacus | Unknown | Methylation, ε-polyamine chains | Unknown | Unknown | Unknown | [102] |


1.4 Mass spectrometry in PTM discovery

The characterization of protein post-translational modifications (PTMs) remains one of the major challenges of MS-based proteomics. Historically, one of the first applications of mass spectrometry in protein research was the mapping of PTMs on a single protein [107]. Although mass spectrometers have evolved substantially since then, their basic operating principles remain conceptually the same. This section discusses recent advances in analytical approaches, instrumentation and bioinformatics analysis, as well as their implications for the characterization of silaffin PTMs.

1.4.1 Modification-specific proteomics

In general, mass spectrometric detection of PTMs can be achieved via three strategies: top-down, middle-down and bottom-up approaches (Fig. 1.6a–1.6c) [108]. The bottom-up approach (Fig. 1.6c) is by far the most commonly applied strategy for the chemical characterization of protein modifications [109]. It relies on the analysis of modified peptides released from the protein by enzymatic cleavage, usually by digestion with one or more site-specific proteases, typically trypsin. Proteins can be digested in solution, or pre-fractionated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) followed by in-gel digestion [110, 111]. The latter method removes low-molecular-weight contaminants already at the electrophoresis step and increases the resolution of the subsequent analytical separations. The resulting digest is separated by reversed-phase liquid chromatography (RPLC), and the eluting peptides are fragmented by tandem mass spectrometry (MS/MS). In most cases, the mass shift observed for a peptide indicates a certain PTM type. By searching for the corresponding mass shift, modified peptides can be identified and the PTM sites mapped back to the protein sequence.
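The mass-shift logic of the bottom-up approach can be illustrated with a short Python sketch. It is a minimal example, not the search-engine implementation used in this work: it computes the m/z of a peptide ion from standard monoisotopic residue masses (cysteine unmodified) and adds arbitrary PTM mass shifts. The function name `peptide_mz` and the example peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
# (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O completing the peptide termini
PROTON = 1.007276   # mass of a proton
METHYL = 14.01565   # mass shift of one methylation (+CH2)


def peptide_mz(sequence, mod_deltas=(), charge=1):
    """m/z of the [M + zH]z+ ion of a peptide carrying the given PTM mass shifts."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + sum(mod_deltas)
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    # Hypothetical peptide: the dimethylated form is shifted by 2 x 14.01565 / z,
    # which is the kind of shift a database search looks for.
    plain = peptide_mz("SGSKSGSK", charge=2)
    dimethyl = peptide_mz("SGSKSGSK", mod_deltas=[METHYL, METHYL], charge=2)
    print(f"{plain:.4f} {dimethyl:.4f} shift={dimethyl - plain:.4f}")
```

Because each modification only adds a fixed mass, a search engine can test a candidate PTM by adding its mass shift to the unmodified peptide mass and comparing the result with the observed precursor m/z.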

Figure 1.6 Proteomics approaches. (a) Top-down proteomics: separation of intact proteins (≤ 50 kDa) from a protein mixture, LC-MS of intact protein masses and MS/MS yielding protein sequences. (b) Middle-down proteomics: digestion (Asp-N, Glu-C, etc.), separation of peptides (~2–20 kDa), LC-MS of intact peptide masses and MS/MS yielding peptide sequences. (c) Bottom-up proteomics: digestion with trypsin, separation of peptides (~0.5–3 kDa), LC-MS of intact peptide masses and MS/MS yielding peptide sequences.

In contrast to the bottom-up approach, a ‘top-down’ analysis (Fig. 1.6a) can provide a global view of the PTMs present on intact proteins. PTM characterization by a top-down approach can be achieved with nonergodic fragmentation techniques such as electron-transfer dissociation (ETD) and electron-capture dissociation (ECD). However, the top-down approach is less sensitive than the bottom-up approach, and data interpretation may be non-trivial owing to the higher complexity of both MS1 and MS2 spectra of multiply charged precursor ions [112]. A middle-down strategy (Fig. 1.6b), in which proteins are digested into peptides commonly in the 3 to 9 kDa range, might represent an appropriate compromise between sensitivity and a global overview of silaffin PTM complexity. However, as in the top-down approach, the longer peptides (>3 kDa) generated in a middle-down fashion have much wider charge-state distributions than bottom-up peptides, which reduces the overall signal sensitivity. Therefore, the bottom-up approach, which usually involves high-performance liquid chromatography (HPLC) separation of in-gel-digested proteins, offers the best sensitivity for mapping silaffin PTMs. However, using a conventional trypsin-based bottom-up approach for lysine PTM mapping appears to be premature, because it presupposes that both the ‘intact’ protein sequence and the PTMs are exactly known.
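The effect of a wider charge-state distribution can be made concrete with a small Python sketch that computes the m/z of each charge state of a peptide. The peptide masses, the charge ranges and the even split of signal across charge states are illustrative assumptions, not measured values; real charge-state distributions are uneven.

```python
PROTON = 1.007276  # Da


def charge_state_mz(neutral_mass, charges):
    """m/z of [M + zH]z+ for each charge state z."""
    return {z: (neutral_mass + z * PROTON) / z for z in charges}


def per_state_fraction(charges):
    """Fraction of the total signal per charge state, ASSUMING an even split."""
    return 1.0 / len(charges)


if __name__ == "__main__":
    # Hypothetical examples: a ~1.5 kDa tryptic peptide observed at z = 2-3
    # and a ~6 kDa middle-down peptide spread over z = 4-10.
    for mass, charges in [(1500.0, range(2, 4)), (6000.0, range(4, 11))]:
        states = charge_state_mz(mass, charges)
        print(mass, {z: round(mz, 3) for z, mz in states.items()},
              f"fraction per state ~ {per_state_fraction(charges):.2f}")
```

Under the even-split assumption, the 6 kDa peptide distributes its signal over seven precursor ions instead of two, so each individual ion is weaker.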

We therefore shifted the ‘classical’ bottom-up paradigm towards a preceding analysis of modified lysines. Biosilica-embedded proteins can be broken down to amino acids by hydrolysis, while methylated and polyamine-modified lysines are stable to both acid and alkali treatment. Cleavage with hydrochloric acid (HCl) is the most common hydrolysis method [113]; it was first applied in diatom research by Nakajima and Volcani [51, 52] and further used by the groups of Sumper and Kröger to identify polyamine-modified lysines in isolated silaffins [64–68, 83]. Surprisingly, full qualitative and quantitative profiling of lysine PTMs has never been performed on total diatom biosilica extracts, although it is a prerequisite for any further PTM mapping studies. The main advantage of this straightforward approach lies in its simplicity and its ability to characterize lysine modifications in total biosilica hydrolysates without any additional treatment. However, the analysis of ε-polyamine-modified lysines poses multiple analytical challenges, as discussed below.
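Because polyamine chains vary in length and degree of methylation, the candidate masses of modified lysines in a hydrolysate can be enumerated in advance. The Python sketch below assumes that each chain unit adds one aminopropyl group (C3H7N, 57.0578 Da) and each methylation adds CH2 (14.0157 Da); these building blocks and the enumeration caps are illustrative assumptions, not a complete description of the structures found in this work.

```python
from itertools import product

LYSINE = 146.105528       # free lysine, C6H14N2O2 (monoisotopic, Da)
AMINOPROPYL = 57.057849   # one aminopropyl unit, C3H7N (assumed repeat unit)
METHYL = 14.015650        # one N-methylation, +CH2


def modified_lysine_mass(n_units, n_methyl):
    """Neutral monoisotopic mass of lysine with n aminopropyl units and m methyls."""
    return LYSINE + n_units * AMINOPROPYL + n_methyl * METHYL


def mass_table(max_units, max_methyl):
    """Enumerate candidate masses; the caps are illustrative, not chemical limits."""
    return {(n, m): modified_lysine_mass(n, m)
            for n, m in product(range(max_units + 1), range(max_methyl + 1))}


if __name__ == "__main__":
    for (n, m), mass in sorted(mass_table(3, 4).items(), key=lambda kv: kv[1]):
        print(f"units={n} methyl={m} M={mass:.4f}")
```

With these building blocks, one aminopropyl unit (57.058 Da) and four methyl groups (56.063 Da) differ by about 1 Da, so such combinations are readily distinguished at high resolution.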

1.4.2 Analysis of polyamine-modified lysines by mass spectrometry

Identification of amino acids in acid hydrolysates is a classical protein analysis method referred to as amino acid analysis (AAA) [114]. The identification of lysines along with their PTMs therefore represents a variant of AAA that is specifically focused on profiling post-translationally modified lysines. Hydrolysis with hydrochloric acid (HCl) is currently the universal standard for AAA, because HCl cleaves peptide bonds independently of the amino acid sequence and PTMs [113]. Hence, AAA is applicable to highly modified proteins such as silaffins, which are difficult to analyze by enzymatic proteolysis. Most amino acids are recovered quantitatively from protein hydrolysates by HCl, which can easily be removed by evaporation afterwards. HCl hydrolysis therefore provides a generic and straightforward method that combines simplicity, accuracy and wide applicability.

After HCl cleavage of biosilica-associated proteins, ε-polyamine-modified lysines have to be analyzed in total biosilica hydrolysates. RPLC has traditionally been used to separate, identify and quantify components of such complex mixtures. However, conventional RPLC separation of underivatized polyamines, or (in our case) ε-polyamine-modified and methylated lysines, is challenging because of the low retention of these hydrophilic, highly charged compounds and their tendency towards severe peak tailing [115]. Hence, liquid chromatography (LC) of polyamines requires either ion-pairing techniques, which are generally poorly compatible with MS, or hydrophilic interaction chromatography (HILIC), which suffers from lower separation efficiency than RPLC.

For these reasons, polyamines are commonly analyzed after reaction with derivatizing agents [116] that attach bulky hydrophobic groups. The increased hydrophobicity of the derivatives allows their separation on a reversed-phase column and results in higher sensitivity than for the underivatized molecules. Most of these derivatization reagents, however, have certain disadvantages and limitations, including derivative instability, inconsistent derivative formation, inability to derivatize secondary amino groups, the need to remove excess reagent to avoid rapid deterioration of the RPLC column, and poor compatibility with ESI-MS [117–124].

AQC + 1° or 2° amine → QAC-derivative + NHS (pH ~9)

Figure 1.7 AQC derivatization chemistry. Primary and secondary amines react with 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC), yielding 6-quinolinylaminocarbonyl-amines (QAC-derivatives) and N-hydroxysuccinimide (NHS).

In contrast to other reagents, derivatization with 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC)8 has been demonstrated to be a simple, fast and reproducible pre-column derivatization [126–134]. AQC increases chromatographic retention and improves electrospray ionization (ESI) of hydrophilic molecules, making them directly amenable to RPLC. The AQC reagent uses an N-hydroxysuccinimide (NHS)-activated heterocyclic carbamate to quantitatively derivatize primary and secondary amino groups, converting them to stable, hydrophobic 6-quinolinylaminocarbonyl (QAC)-derivatives (the reaction chemistry is depicted in Fig. 1.7). Excess AQC reagent reacts with water to form 6-aminoquinoline (AMQ), which can easily be separated from the QAC-derivatives (see Fig. A.1 for potential degradation pathways). Therefore, this fluorescent derivatizing reagent, originally designed for fluorescence and UV-absorbance detection [127], can also be applied for relative quantification by ESI-MS with higher sensitivity and specificity. Consequently, the ease of AQC derivatization and the stability of QAC-derivatives make this approach ideally suited for accurate profiling of lysine polyamines in crude biosilica extracts.

8 According to the IUPAC recommendations for the nomenclature of organic compounds [125], it is also called N-hydroxysuccinimidyl 6-quinolinyl carbamate (HSQC), CAS registry number 148757-94-2 [126]. For simplicity, and in line with common usage in the literature, it is referred to as AQC throughout this thesis.
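Since AQC labels only primary and secondary amines, the expected m/z of a QAC-derivative follows from counting the derivatizable amino groups. The Python sketch below assumes a tag mass of 170.0480 Da (C10H6N2O) per labeled amine; the helper names are hypothetical, and the example uses free lysine with its two primary amines.

```python
QAC_TAG = 170.048013  # C10H6N2O added per derivatized amine (monoisotopic, Da)
PROTON = 1.007276


def n_qac_tags(n_primary, n_secondary, n_tertiary=0):
    """AQC labels primary and secondary amines; tertiary amines stay unlabeled,
    so n_tertiary does not contribute to the count."""
    return n_primary + n_secondary


def qac_derivative_mz(base_mass, n_tags, charge=1):
    """m/z of [M + zH]z+ for a compound carrying n_tags QAC groups."""
    return (base_mass + n_tags * QAC_TAG + charge * PROTON) / charge


if __name__ == "__main__":
    lysine = 146.105528  # free lysine: alpha- and epsilon-NH2 (two primary amines)
    tags = n_qac_tags(n_primary=2, n_secondary=0)
    for z in (1, 2):
        print(z, round(qac_derivative_mz(lysine, tags, charge=z), 4))
```

The same counting shows why fully N-methylated (tertiary) nitrogens in a polyamine chain do not receive a tag, which affects both the expected mass and the hydrophobicity of the derivative.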

Once all the QAC-derivatives of polyamine-modified lysines have been catalogued and quantified in total biosilica hydrolysates by LC-MS/MS, lysine PTMs can be mapped to specific sites in biosilica-associated proteins by the bottom-up proteomic strategy. The latter usually involves pre-fractionation of intact proteins and of enzymatically generated peptides before mass spectrometric analysis [109], as covered in the following section.

1.4.3 Fractionation of proteins and peptides prior to mass spectrometry analysis

Pre-fractionation of proteins by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) followed by in-gel digestion is, to date, the most commonly used sample preparation technique in proteomics [110, 111]. This gel-based approach reliably eliminates common contaminants (such as detergents and salts), meaning that essentially any chemicals can be used for sample preparation prior to electrophoresis. Following in-gel digestion or acid hydrolysis of silaffin proteins, the resulting complex peptide or amino acid mixture needs to be (at least partly) fractionated before introduction into the mass spectrometer. For this purpose, reversed-phase liquid chromatography (RPLC), which separates individual compounds mainly by their hydrophobicity, is by far the preferred method for separating peptides released by enzymatic digestion. Typically, octadecyl (C18)-bonded silica is used as the column packing (‘stationary phase’), whereas acidified water and acetonitrile (‘mobile phase’) are mixed to create a concentration gradient, i. e., the proportion of one solvent relative to the other is increased over the course of the run. Non-volatile peptides are usually ionized by soft ionization techniques such as electrospray ionization (ESI) and nano-ESI (nESI), in which a high voltage creates an electrostatically charged spray that drives the desolvation of peptides from solvent droplets into the gas phase [135]. Owing to this direct compatibility, RPLC can be coupled online to the ESI ion source of a mass spectrometer. RPLC separation efficiency depends on many factors, such as column parameters (dimensions, adsorbent surface chemistry, particle size and packing, etc.), separation conditions (column temperature, solvent flow rate, etc.), eluent type and composition, and the chemical nature of the analytes. Therefore, optimal selectivity and sensitivity can be achieved for each compound of interest by proper adjustment of these parameters. Finally, the extensively pre-fractionated protein digest is subjected to tandem mass spectrometry (MS/MS) analysis.
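An RPLC gradient is usually specified as a table of time points and solvent-B percentages, with linear ramps in between. The Python sketch below interpolates such a programme; the programme itself (2% to 35% B over 120 min followed by a wash) is a hypothetical example, not the gradient used in this thesis.

```python
def percent_b(t, program):
    """%B at time t (min) for a piecewise-linear gradient program.

    program: list of (time_min, percent_b) nodes with strictly increasing times.
    Before the first node and after the last node the end values are held.
    """
    if t <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return program[-1][1]


# Hypothetical program: hold at 2% B, linear 2-35% B over 120 min, then wash.
PROGRAM = [(0, 2), (5, 2), (125, 35), (135, 80), (145, 80)]

if __name__ == "__main__":
    for t in (0, 30, 65, 125, 140):
        print(t, round(percent_b(t, PROGRAM), 2))
```

Hydrophilic analytes elute early at low %B, whereas more hydrophobic ones (for example QAC-derivatives) require higher %B, which is why the slope of the main ramp is one of the tunable parameters mentioned above.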

1.4.4 Tandem mass spectrometry analysis in modification-specific proteomics

Mass spectrometry (MS) is widely recognized as a superior analytical technique in proteomics and has gained an important role in the analysis of PTMs. To date, a wide variety of mass spectrometer configurations, differing in their performance capabilities (e. g., ionization method, scan speed, resolution, sensitivity and mass range), have been developed for proteomics. A major breakthrough was the introduction of Orbitrap™ (Thermo Fisher Scientific) mass spectrometers, which are currently regarded as a gold standard for MS-based proteomics. Most of the experiments reported in this thesis were carried out on a ‘hybrid’ tandem mass spectrometer, the Linear Trap Quadrupole (LTQ) Orbitrap Velos, which combines a linear ion trap with the Orbitrap analyzer [136]. This instrument routinely achieves high resolution and mass accuracy, enabling confident identification of PTMs with high sensitivity and throughput. The principal scheme of this instrument is shown in Fig. 1.8.

Figure 1.8 Schematic view of the LTQ Orbitrap Velos mass spectrometer. The hybrid configuration couples a linear ion trap (IT) to a high-resolution mass analyzer (Orbitrap). The instrument can perform both collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) for peptide fragmentation. (a) electrospray ion source; (b) stacked ring ion guide (S-lens); (c) high-pressure IT (6.7 × 10^−3 mbar); (d) low-pressure IT (4.7 × 10^−4 mbar) for CID; (e) curved ion trap (C-trap); (f) HCD collision cell; (g) Orbitrap. Figure adapted from Olsen et al. [136].

In general, a mass spectrometer measures the mass-to-charge ratio (m/z) of gas-phase ions within a certain resolution and m/z range. In the LTQ Orbitrap Velos, ions generated in the ESI source (a) pass through the ion optics (b) into the linear ion trap (IT), i. e., the Linear Trap Quadrupole (LTQ). Trapped ions of a defined m/z range can be isolated and analyzed by tandem mass spectrometry (MS/MS) in the high-pressure IT (c), where fragmentation is achieved through collisions with neutral gas molecules, hence the term collision-induced dissociation (CID). The resulting fragments are then scanned out of the low-pressure IT (d) to the detectors. Alternatively, the precursor ions can be transported all the way from the IT to the C-trap (e) and further into the external collision cell (f), where higher-energy collisional dissociation (HCD) takes place. The ions are then returned to the C-trap, from which they are injected into the Orbitrap mass analyzer (g) by a high-voltage pulse. Inside the Orbitrap, the ions rotate around the central spindle-shaped electrode while oscillating along its axis with frequencies that are inversely proportional to the square root of their m/z.

However, Orbitrap-HCD spectra are acquired at about half the rate of IT-CID spectra. Despite this limitation, the higher resolution and mass accuracy of MS/MS spectra provided by the Orbitrap mass analyzer9 outweigh those of CID spectra acquired in the IT. Moreover, IT-CID does not allow trapping of fragments below the ‘low-mass cut-off’, the so-called ‘1/3 rule’ (∼28% of the precursor m/z). In contrast to IT-based CID, HCD with Orbitrap detection is much less affected by the low-mass cut-off and enables detection in the lower m/z region10. This is highly advantageous for the fragmentation of modified peptides, where a covalent modification may give rise to characteristic low-molecular-weight fragments, also called ‘marker’, ‘diagnostic’ or ‘reporter’ ions. These characteristic fragments are analytically very useful, since their occurrence is a reliable indicator of the presence of the corresponding modified amino acid residue (for review, refer to [138]). Alternatively, in some cases the peptide modification may give rise to a dominant neutral loss, in which most of the ion current is concentrated in one fragment ion, suppressing the relative abundance of other fragmentation events. Here, HCD demonstrates another clear advantage over CID, allowing multiple fragmentation events and resulting in richer MS/MS spectra of modified peptides, where the neutral loss alone would be unproductive.

9 An m/z range of 100 to 2000 can be measured in 1.3 s at a target resolution of 130 000 at m/z 400 (R at m/z 400); the resolution declines in proportion to 1/√(m/z).

10 Compared with the quadrupole IT, the C-trap is better suited for the transmission and detection of low-molecular-weight fragments [137]: it is limited only by the rf amplitude of the C-trap, which cuts off m/z values below ~1/20 of the precursor m/z, and therefore does not compromise the detection of low-molecular-weight fragments.
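The scaling relations above can be checked numerically. The Python sketch below assumes a reference resolution of 130 000 at m/z 400 (footnote 9), a 1/√(m/z) dependence of resolution and axial frequency, and cut-off fractions of 0.28 for IT-CID and 1/20 for the C-trap (footnote 10). The function names and the 900 m/z precursor are hypothetical illustrations.

```python
import math


def orbitrap_resolution(mz, r_ref=130_000, mz_ref=400.0):
    """Resolution at m/z for a fixed transient, scaled as 1/sqrt(m/z) from a reference."""
    return r_ref * math.sqrt(mz_ref / mz)


def axial_frequency_ratio(mz_a, mz_b):
    """f(mz_a) / f(mz_b); the axial frequency is proportional to 1/sqrt(m/z)."""
    return math.sqrt(mz_b / mz_a)


def low_mass_cutoff(precursor_mz, fraction):
    """Lowest fragment m/z retained: ~0.28 for IT-CID, ~1/20 for the C-trap (HCD)."""
    return precursor_mz * fraction


if __name__ == "__main__":
    for mz in (200, 400, 800, 1600):
        print(f"m/z {mz}: R ~ {orbitrap_resolution(mz):,.0f}")
    precursor = 900.0  # hypothetical precursor m/z
    print("IT-CID cut-off:", low_mass_cutoff(precursor, 0.28))
    print("C-trap cut-off:", low_mass_cutoff(precursor, 1 / 20))
```

For a precursor at m/z 900, IT-CID would discard fragments below m/z ~252, whereas the C-trap retains fragments down to m/z ~45, which covers the region where immonium and other diagnostic ions appear.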

As described above, the hybrid mass spectrometer combines the complementary fragmentation methods IT-CID and Orbitrap-HCD, offering the speed of the former and the accurate measurements over a wide m/z range of the latter. A common operational mode for controlling MS/MS acquisition is data-dependent acquisition (DDA), in which the most abundant precursor ions are selected for MS/MS analysis. This process is depicted in Fig. 1.9b. In this strategy, the mass spectrometer is programmed to cyclically select ions with predefined features (e. g., charge state) in the full scan (MS1) for fragmentation, so that the N most intense precursors (TopN) are subsequently subjected to either a CID or an HCD MS/MS scan (MS2). The m/z values of fragmented precursors are placed on a dynamic ‘exclusion’ list for a certain period of time, so that the next most abundant precursors can be selected in the following rounds. Typically, each MS1 scan is followed by several MS2 scans, alternating back-to-back CID and HCD fragmentation of the same precursor ion, after which a new DDA cycle commences. In this way, DDA attempts to optimize MS/MS productivity by minimizing redundant precursor selection and maximizing the number of peptide identifications.
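The TopN selection with dynamic exclusion can be sketched in Python as follows. This is a simplified model under stated assumptions: the exclusion window, m/z tolerance and allowed charge states are hypothetical parameters, intensities in the example are invented, and the CID/HCD alternation on each selected precursor is not modeled.

```python
def topn_cycle(peaks, now, exclusion, n=3, window_s=30.0, mz_tol=0.01,
               allowed_charges=(2, 3, 4)):
    """Select up to n precursors from one MS1 survey scan.

    peaks: list of (mz, intensity, charge) tuples from the MS1 scan.
    exclusion: dict {mz: expiry_time}; updated in place (dynamic exclusion).
    Returns the selected precursor m/z values, most intense first.
    """
    # Forget exclusions whose time window has elapsed.
    for mz in [m for m, expiry in exclusion.items() if expiry <= now]:
        del exclusion[mz]

    def excluded(mz):
        return any(abs(mz - ex) <= mz_tol for ex in exclusion)

    # Keep ions with an allowed charge that are not on the exclusion list.
    candidates = [p for p in peaks
                  if p[2] in allowed_charges and not excluded(p[0])]
    candidates.sort(key=lambda p: p[1], reverse=True)
    selected = [mz for mz, _, _ in candidates[:n]]
    for mz in selected:
        exclusion[mz] = now + window_s  # exclude until now + window
    return selected


if __name__ == "__main__":
    # Hypothetical survey scan; intensities are made up for illustration.
    scan = [(500.0, 1e6, 2), (600.0, 5e5, 2), (700.0, 8e5, 1),
            (800.0, 3e5, 3), (900.0, 2e5, 2)]
    excl = {}
    print(topn_cycle(scan, now=0.0, exclusion=excl))  # first cycle
    print(topn_cycle(scan, now=1.0, exclusion=excl))  # top peaks now excluded
```

In the second cycle only the weakest remaining precursor is selected, because the three most intense ones are still excluded; this is how dynamic exclusion shifts MS/MS time towards less abundant ions.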

Figure 1.9 Principles of data-dependent acquisition (DDA) and peptide fragmentation: (a) total ion chromatogram (TIC) of an Asp-N digest of a biosilica AFSM extract (relative abundance, %, versus retention time, 0–155 min); (b) full-scan (MS1) survey and data-dependent acquisition of the Top3 most intense precursors: each MS1 scan is followed by MS/MS of the 1st, 2nd and 3rd most intense precursors, which are excluded from fragmentation in the next cycle; (c) fragment ion nomenclature [139, 140]: N-terminal a, b and c ions and C-terminal x, y and z ions for a seven-amino-acid peptide.

Analysis of MS/MS spectra provides information on the masses of the fragment ions and enables the deduction of the peptide sequence and the position of PTM sites [141]. In the LTQ Orbitrap Velos, peptide ions are fragmented by either of two collisional techniques: collision-induced dissociation (CID) or higher-energy collisional dissociation (HCD). In both, peptide backbone cleavage follows the minimum-energy path, breaking amide (CO-NH) bonds (marked in Fig. 1.9c). The most common peptide fragments produced by low-energy collisions are b- and y-ions, highlighted in red in Fig. 1.9c. The mass differences within the y-ion series reveal the amino acid sequence, which can be read from the C- to the N-terminus. Typically, HCD spectra (in contrast to CID spectra) contain ions in the low m/z range, including y1, y2 and immonium ions (IMs) of modified residues. In addition, a-ions can occur, which result from a CO loss from the b-ions11. If the peptide is modified, the ion series containing the modification show the corresponding mass shift. Additionally, certain modifications produce characteristic fragments or neutral losses, e. g., the loss of H3PO4 (−98 Da) typical of phosphorylation and the loss of CH3SOH (−64 Da) typical of methionine oxidation.
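The fragment ladders described above can be computed directly. The Python sketch below calculates singly charged a-, b- and y-ions for a linear peptide with optional per-residue mass shifts; it ignores higher charge states, neutral losses and immonium ions, and the example peptide and its methylation site are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
CO = 27.994915  # a-ion = b-ion minus CO


def fragment_ions(sequence, mods=None):
    """Singly charged a-, b- and y-ion m/z values for a linear peptide.

    mods: optional {0-based residue index: mass shift in Da}.
    Returns three lists: a1..a(n-1), b1..b(n-1), y1..y(n-1).
    """
    mods = mods or {}
    res = [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(sequence)]
    b = [sum(res[:i]) + PROTON for i in range(1, len(res))]
    y = [sum(res[-i:]) + WATER + PROTON for i in range(1, len(res))]
    a = [mz - CO for mz in b]
    return a, b, y


if __name__ == "__main__":
    # Hypothetical peptide with a methylated lysine at position 3 (index 2).
    a, b, y = fragment_ions("SGKSGK", mods={2: 14.01565})
    print("b:", [round(m, 4) for m in b])
    print("y:", [round(m, 4) for m in y])
```

Only the b-ions that include residue 3 and the y-ions that include it carry the +14.016 Da shift, which is how the position of a modification is localized from the ion ladders.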

As shown in Fig. 1.9c, consecutive fragments in the b- and y-ion series differ by the mass of one amino acid residue, so that sequence information can be derived from peptide fragment spectra. However, experimental MS/MS spectra often show incomplete ion series, interfering peaks from co-isolated precursors, etc. Because MS data are acquired continuously throughout an HPLC run, the resulting datasets are large and complex, containing many MS/MS spectra, and the subsequent analysis has to cope with a huge amount of data. The presence of PTMs causes a combinatorial explosion in the number of potential protein modification states, which greatly complicates the analysis of proteomics data and leads to a high number of false-positive identifications [143, 144]. Therefore, modification-specific analysis of silaffins requires a toolkit of dedicated computational and statistical methods, which are reviewed in Section 1.4.5.
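The combinatorial explosion can be quantified with a simple count. Assuming, for illustration only, that each modifiable lysine is either unmodified or carries exactly one of m modification types, and that sites are independent, a peptide with k such lysines has (1 + m)^k possible modification states. The Python sketch below computes this number and, for small cases, enumerates the states explicitly; the list of modification labels is hypothetical.

```python
from itertools import product


def n_modification_states(n_sites, n_mod_types):
    """Number of modification states if every site is either unmodified or
    carries exactly one of n_mod_types modifications (independent sites)."""
    return (1 + n_mod_types) ** n_sites


def enumerate_states(n_sites, mod_types):
    """Explicitly list all states, e.g. ('-', 'me2', '-') for three sites."""
    return list(product(["-", *mod_types], repeat=n_sites))


if __name__ == "__main__":
    # Hypothetical menu of lysine modifications, for illustration only.
    mods = ["me1", "me2", "me3", "pa1", "pa2"]
    for k in range(1, 7):
        print(k, "lysines ->", n_modification_states(k, len(mods)), "states")
```

With five modification types, a peptide with four modifiable lysines already has 1296 candidate states, each of which a database search must in principle evaluate.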

11 This loss is usually observed for the b2-ion and generates the a2/b2 pair characteristic of HCD spectra in the lower mass range [142] (see Fig. 1.9c).

1.4.5 Bioinformatics tools for modification-specific proteomics

Bioinformatics is an essential aspect of MS-based proteomics, especially given the complex datasets produced by modification-specific proteomic analyses. PTMs can be annotated from MS and MS/MS data using automated search tools. Prior to the search against a protein sequence database, the peptide spectral data are converted into a peak list containing the m/z values of the precursor and its fragments. Several algorithms have been developed for this analysis, in which peptide candidates are assigned to an experimental spectrum as quickly as possible and ranked by an empirical or statistical peptide-spectrum match (PSM) score. Among the most popular search engines are Mascot and SEQUEST, while many others exist (for review, refer to [145–147]). However, the identification rate of any software is limited, irrespective of its sophistication and algorithmic basis, by MS/MS spectra from co-isolated precursors, unspecific or unexpected protease cleavage, and incorrect monoisotopic peak assignments or charge-state determinations. Database searching with tandem mass spectra is therefore particularly challenging for silaffins, owing to the multiple coexisting lysine PTMs in these proteins. This effect is much more pronounced for larger peptides, whose long sequences can carry several different modifications, which greatly complicates database searches. Therefore, specialized computational approaches need to be implemented to allow a comprehensive assignment of silaffin PTMs.

Commonly, a modification-specific MS data analysis strategy can be divided into three major steps: pre-processing of the MS output data, PTM search against a protein sequence database, and statistical validation of the PSM and PTM assignments. The goal of the pre-processing step is to improve the quality of the subsequent database search results; it may include cleaning, deisotoping and deconvolution of raw peptide MS/MS spectra [148–151]. In the second step, before initiating the database search, the user specifies a list of PTMs to be searched as ‘fixed’ or ‘variable’ [152]. Fixed modifications are applied universally, whereas variable modifications may or may not be present. Finally, the database search result typically represents a mix of correct and false PSMs. A reliable way to filter for confident PSMs is the use of a composite target-decoy database, which is created by reversing or scrambling the protein sequences of a target database and appending them to it. Under the assumption that random decoy PSMs and false target matches follow the same distribution, a score cut-off corresponding to a certain false discovery rate (FDR) can be estimated [153]. In addition to FDR evaluation, another goal of the validation step in modification-specific analysis is the precise localization of PTMs within a peptide [152]. To this end, dedicated algorithms estimate the confidence of site localization, for example the Mascot Delta Score (MD-score) [154] or the ‘localization probability’ score integrated into MaxQuant [155]. However, site-specific PTM assignments very often have to be verified manually by visual inspection of the search results.
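The target-decoy principle can be illustrated with a minimal Python sketch that finds the lowest PSM score at which the estimated FDR stays below a chosen limit. It assumes a concatenated target-decoy search and the simple estimator FDR = decoys / targets above the threshold; real tools use refined estimators, q-values and tie handling, so this is an illustration rather than the procedure of any specific search engine.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Lowest score threshold whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy). FDR is estimated as
    decoys / targets among all PSMs scoring at or above the threshold.
    Returns None if no threshold satisfies the criterion.
    Ties are not treated specially in this sketch.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    # Hypothetical scores; True marks a match to a reversed (decoy) sequence.
    psms = [(100, False), (90, False), (80, True), (70, False), (60, True)]
    cut = score_cutoff_at_fdr(psms, max_fdr=0.5)
    accepted = [s for s, decoy in psms if not decoy and s >= cut]
    print(cut, accepted)
```

Target PSMs at or above the returned score are accepted; decoy hits are only used for the estimate and are never reported.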

1.5 Rationale of the thesis

The characterization of silaffin PTMs is one of the major tasks in the proteomics of diatom biosilica. As discussed in Section 1.3.2, silaffin proteins undergo multiple post-translational processing events, including proteolytic cleavage and the covalent attachment of modifying groups to various amino acid residues. Lysine modifications are the most remarkable class of silaffin PTMs, because they are hypothesized to modulate the chemical properties of biosilica-associated proteins and to regulate their silica-precipitating function. Our knowledge of lysine PTMs is limited to a few proteins from phylogenetically distant species, which precludes any conclusions regarding their structural relationships and functional importance.

Since silaffin lysine PTMs are highly heterogeneous, it is first necessary to create a catalogue of the polyamine-modified lysines present in all diatom species studied. Although lysine modifications can be profiled in any biosilica extract with the developed technique, the subsequent localization of the profiled PTMs to weakly homologous protein sequences (refer to Section 1.3.1) ultimately requires a complete genome or at least a substantial set of cDNA sequences. Fortunately, recent advances in genome sequencing have created a unique opportunity to localize silaffin PTMs to specific sites in biosilica-associated proteins. Currently, six diatom genome sequences have been published [13–16], providing resources for deeper research into the proteomes of these organisms. The current study therefore focused on three centric diatom species with sequenced genomes: Thalassiosira pseudonana [13], T. oceanica [14] and Cyclotella cryptica [15]. Their biosilica cell walls, presented in Fig. 1.9, display heterogeneous morphologies that reflect their phylogenetic relationships: T. pseudonana is more closely related to C. cryptica than to T. oceanica [156, 157].

Figure 1.9 Phylogenetic tree [156] and scanning electron microscope (SEM) images of the cell walls of three centric diatom species: (a) phylogenetic tree of the three diatom species; (b) T. pseudonana (valve view); (c) T. pseudonana (girdle view); (d) C. cryptica (valve view); (e) C. cryptica (girdle view); (f) T. oceanica (valve view); (g) T. oceanica (girdle view). Scale bars, 1 μm. These diatom species have circular valves with varying ornamentation between the central and rim regions of the valve face (Fig. 1.9b, 1.9d and 1.9f), whereas their girdle bands have less structural complexity and look similar to each other (Fig. 1.9c, 1.9e and 1.9g). The biosilica architectures of these species show clear differences: T. oceanica (Fig. 1.9f) has a much smoother surface over most of the valve area than T. pseudonana (Fig. 1.9b) and C. cryptica (Fig. 1.9d). All three species possess tube-like features at the valve rim; the T. oceanica shell, however, almost entirely lacks the elevated mesh-like ridges on the valve surface, which are present only in the rim area (Fig. 1.9f). SEM images are courtesy of D. Pawolski.

In this thesis, I aim to extend the profiling of lysine modifications to several phylogenetically related species in order to compare their profiles. To this end, this work aimed at developing an analytical method for the analysis of lysine polyamine modifications in biosilicifying proteins. After a proof of concept, the method will be applied to biosilica extracts from the three diatom species. Furthermore, the lysine PTMs need to be localized within the sequences of biosilica-associated proteins. This effort should eventually lead to the determination of consensus modification sites, which is a key requirement for a mechanistic understanding of the post-translational modification machinery in diatom biosilica.

2 AIM OF THE THESIS

The function of biosilica precipitating proteins is largely defined by the presence of

lysine post-translational modifications (PTMs), and exploring their diversity is critical

for a mechanistic understanding of the biomineralization process in diatoms.

The primary aim of this study is to define consensus motifs that constitute a ‘Rosetta stone’ for the lysine modification code of biosilica-associated proteins.

Three goals were to be addressed during the research:

1. Establish an analytical method for lysine PTM profiling that shifts the conven-

tional bottom-up approach towards analysis of modified lysines in total biosilica

hydrolysates (Section 3.1).

2. Apply this method to the global profiling of lysine modifications in biosilica extracts from three closely related diatom species, T. pseudonana, T. oceanica and C. cryptica, and examine whether similarities at the molecular level follow evolutionary proximity (Section 3.2).

3. Map the pre-profiled polyamine PTMs back to protein sequences in order to determine consensus motifs for polyamine modifications in biosilica-associated proteins (Section 3.3).


3 RESULTS AND DISCUSSION

Contents

3.1 A method for analysis of ε-polyamine PTMs

3.1.1 Establishing a method to analyse ε-polyamines

3.1.2 Method applicability for lysine PTM profiling

3.1.3 Profiling of lysine PTMs in silaffin-3

3.2 Profiling lysine PTMs in biosilica extracts

3.2.1 Lysine PTM profile and characteristic fragments

3.2.2 Elucidation of phosphopolyamine structures

3.2.3 Lysine PTM profiles of AFSM extracts

3.2.4 Comparison of AFIM and AFSM profiles in T. pseudonana

3.2.5 Phylogenetic relationship across three diatom species

3.3 PTM localization and discovery of consensus motifs

3.3.1 Multiple protease strategy for mapping lysine PTMs

3.3.2 Selection of deprotection technique

3.3.3 Mapping lysine PTMs on tpSil3 using iterative search strategy

3.3.4 Deconvolution of raw MS/MS spectra

3.3.5 PTM mapping by polyamine-specific fragments

3.3.6 Identification of consensus motifs harboring lysine PTMs


38 results and discussion

3.1 A method for analysis of ε-polyamine PTMs: a proof-of-concept

3.1.1 Establishing a method to analyse ε-polyamine PTMs in biosilica hydrolysates

This study aimed to establish a selective and sensitive method for analysis of polyamine-

modified lysines. According to the previously identified compounds (refer to Fig. 1.5

in Section 1.3.3), lysine ε-polyamine modifications exhibit predictable structures con-

sisting of repeating propyleneimine (hereafter denoted as propylamine) units and N-

methyl groups. A general formula for lysine modifications is displayed in Fig. 3.1,

where structural units are color-coded. Each structure consists of several propylamine units attached linearly to the ε-amine of the lysine residue (PA0, PA1, PA2, PA3, ...), with different degrees of N-methylation (Me1, Me2, Me3, ...). Additionally, the lysine side chain can be hydroxylated (Hydroxy), and this hydroxyl group can in turn be phosphorylated (Phospho).

The list of m/z values calculated according to the generic formula can be used for tar-

geted MS/MS analysis of ε-polyamine-modified and methylated lysines (Table 3.1).
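The generic formula lends itself to a short calculation. The sketch below is not the software used in this work; it recomputes entries of Table 3.1 from standard monoisotopic element masses, and the function and constant names are hypothetical. It assumes that each propylamine unit adds C3H7N, each N-methyl group adds CH2, δ-hydroxylation adds O and phosphorylation of the hydroxyl adds HPO3, and it enforces a maximum of 2n + 1 methyl groups, taking n as the ε-nitrogen plus one nitrogen per propylamine unit. Because of rounding, the last decimal can occasionally differ by 0.0001 from the tabulated values.

```python
# Monoisotopic masses (u); PROTON is the mass of H+ used for [M+H]+ ions.
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
PROTON = 1.007276467

LYSINE = {"C": 6, "H": 14, "N": 2, "O": 2}   # free lysine, C6H14N2O2
PROPYLAMINE = {"C": 3, "H": 7, "N": 1}       # one -(CH2)3-NH- unit
METHYL = {"C": 1, "H": 2}                    # N-methylation replaces H by CH3 (+CH2)
HYDROXY = {"O": 1}                           # delta-hydroxylation (+O)
PHOSPHO = {"H": 1, "P": 1, "O": 3}           # phosphorylation of the hydroxyl (+HPO3)


def formula_mass(composition):
    """Monoisotopic neutral mass of an elemental composition."""
    return sum(MASS[element] * count for element, count in composition.items())


def lysine_ptm_mz(n_propylamines, n_methyls, hydroxy=False, phospho=False):
    """[M+H]+ m/z of an underivatized lysine carrying the generic modification."""
    if phospho and not hydroxy:
        raise ValueError("phosphorylation requires the delta-hydroxyl group")
    n_nitrogens = n_propylamines + 1          # epsilon-N plus one N per unit
    if n_methyls > 2 * n_nitrogens + 1:
        raise ValueError("more methyl groups than the 2n + 1 limit allows")
    mass = (formula_mass(LYSINE)
            + n_propylamines * formula_mass(PROPYLAMINE)
            + n_methyls * formula_mass(METHYL))
    if hydroxy:
        mass += formula_mass(HYDROXY)
    if phospho:
        mass += formula_mass(PHOSPHO)
    return mass + PROTON


# Rebuild the "Lys-PA1-PA2" row of Table 3.1 (Me0 to Me7).
for k in range(8):
    print(f"Me{k}: {lysine_ptm_mz(2, k):.4f}")
```

The same function covers the hydroxylated and phosphorylated rows by setting `hydroxy=True` and, additionally, `phospho=True`.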

Previously, direct infusion ESI-MS/MS was applied in a number of studies to characterize modified lysine residues cleaved from biosilica-associated proteins [6, 52, 57, 64, 67, 68, 83, 90]. However, although simple and rapid, direct infusion analysis does not allow optimal discrimination and quantification of multiple compounds with isobaric molecular masses corresponding to the structural formula in Fig. 3.1 (e. g., molecules differing in the position of N-methyl groups). So far, this limitation has mainly precluded full qualitative and quantitative profiling of the lysine modifications present in total hydrolysates of whole biosilica extracts. Therefore, we set up a novel technique, aiming not to replace the existing one but to complement it with structural profiling and quantitation of isomeric ε-polyamine modifications.

We established a method, which includes acidic hydrolysis and derivatization by

AQC followed by LC-MS/MS analysis. The analyzed polyamine structures display

different degrees of N-methylation, allowing a maximum of 2n + 1 additional methyl groups per molecule (where n is the number of nitrogens in the polyamine backbone).

[Figure 3.1 structure: lysine residue (PA0) carrying propylamine units PA1, PA2, PA3, ...; N-methylation positions Me1–Me7 with substituents R1–R7; Hydroxy and Phospho groups on the lysine side chain]

Figure 3.1 Generic structure of the lysine post-translational modifications in biosilica-associated proteins. PA0, lysine residue; PA1, PA2, PA3, propylamine units; Me1–Me7, N-methylation positions (where R1–R7 = H or CH3); δ-hydroxylation of lysine (Hydroxy); phosphorylation of the side-chain hydroxyl (Phospho).

Backbone Me0 Me1 Me2 Me3 Me4 Me5 Me6 Me7 ...

Lys-PA0 147.1128 161.1285 175.1442 189.1599 – – – – ...

Lys-PA1 204.1706 218.1863 232.2020 246.2177 260.2334 274.2491 – – ...

Lys-PA1-PA2 261.2284 275.2441 289.2598 303.2755 317.2912 331.3069 345.3226 359.3383 ...

... ... ... ... ... ... ... ... ... ...

Hydroxy-Lys-PA0 163.1077 177.1234 191.1391 205.1548 – – – – ...

Hydroxy-Lys-PA1 220.1655 234.1812 248.1969 262.2126 276.2283 290.2440 – – ...

Hydroxy-Lys-PA1-PA2 277.2233 291.2390 305.2547 319.2704 333.2861 347.3018 361.3175 375.3332 ...

... ... ... ... ... ... ... ... ... ...

Phospho-Hydroxy-Lys-PA0 243.0740 257.0897 271.1054 285.1211 – – – – ...

Phospho-Hydroxy-Lys-PA1 300.1318 314.1475 328.1632 342.1789 356.1946 370.2103 – – ...

Phospho-Hydroxy-Lys-PA1-PA2 357.1896 371.2053 385.2210 399.2367 413.2524 427.2681 441.2838 455.2995 ...

... ... ... ... ... ... ... ... ... ...

Table 3.1 Calculated m/z values of singly protonated molecular species ([M+H]+) of the ε-polyamine structures from Fig. 3.1. The corresponding structures supplemented with the respective number of derivatization groups (N×QAC) attached to primary and secondary amines are shown in Table A.1. PA0, lysine residue; PA1, PA2, propylamine units; Me1–Me7, N-methyl groups; Hydroxy, δ-hydroxylation of lysine; Phospho, phosphorylation of the side-chain hydroxyl; –, not possible.

As discussed in Section 1.4.2, the AQC reagent quantitatively derivatizes primary

and secondary amines [127]. Consequently, the number of attached derivatization groups (N×QAC) indicates how many primary and secondary amines, i.e., nitrogens bearing at least one hydrogen, are present in the molecule. N×QAC is therefore helpful to resolve ambiguities between isobaric molecules, allowing discrimination of structural isomers with differently structured polyamine chains. The resulting QAC-derivatives can be quantified by integrating the extracted-ion-chromatogram (XIC) peaks of their protonated molecular species. LC separation of the QAC-derivatives is followed by high-resolution mass spectrometry (HRMS), which can be targeted to the precalculated m/z values of the anticipated ε-polyamine lysines (from Table 3.1) with the corresponding number of QAC moieties attached. Tentatively assigned polyamines can then be confirmed by LC-MS/MS analysis, providing detailed information on both the relative abundance and the structure of covalently polyamine-modified lysines.

Finally, upon MS/MS fragmentation, QAC-derivatives generate a pronounced fragment at m/z 171.0553, which can be used for multiple reaction monitoring (MRM) experiments [130, 131].
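The relationship between the methylation pattern and N×QAC can be written as a simple counting rule. The sketch below illustrates that rule under stated assumptions and is not part of the published workflow; `count_qac` is a hypothetical name. It assumes that the free α-amine of the hydrolyzed lysine always reacts, that a backbone nitrogen reacts only while it still carries at least one hydrogen (primary or secondary amine), and that hydroxyl and phosphate groups do not contribute.

```python
def count_qac(methyls_per_nitrogen):
    """Predict N x QAC for a lysine epsilon-polyamine (hypothetical helper).

    methyls_per_nitrogen lists the N-methyl count on each backbone nitrogen,
    starting at the lysine epsilon-nitrogen and ending at the terminal nitrogen.
    """
    if not methyls_per_nitrogen:
        raise ValueError("at least the epsilon-nitrogen must be given")
    n_qac = 1  # free alpha-amine of the hydrolyzed lysine (assumed to always react)
    last = len(methyls_per_nitrogen) - 1
    for i, methyls in enumerate(methyls_per_nitrogen):
        chain_carbons = 1 if i == last else 2   # terminal N has one chain carbon
        substituents = chain_carbons + methyls
        if substituents > 4:
            raise ValueError(f"nitrogen {i} cannot carry {methyls} methyl groups")
        if substituents <= 2:                   # primary or secondary amine: N-H left
            n_qac += 1
    return n_qac


print(count_qac([0]))        # unmodified lysine
print(count_qac([2]))        # epsilon-N,N-dimethyllysine
print(count_qac([0, 0, 2]))  # PTM 289: two propylamines, dimethylated terminal amine
```

For PTM 289 the rule gives 3, consistent with the 3×QAC derivative reported for this standard, and for ε-N,N-dimethyllysine it gives 1, matching the 1×QAC derivative of that standard.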

However, the different numbers of QAC groups (N×QAC) attached to polyamine derivatives may affect their ionization efficiency and thus bias the instrument response. Therefore, with the analytical method established, we examined its applicability to synthetic and commercially available standards containing different numbers of primary and secondary amines (Section 3.1.2). To further demonstrate the quantification accuracy, we applied the entire workflow to the well-characterized biosilica-associated protein silaffin-3 from T. pseudonana [67, 68] (Section 3.1.3).

3.1.2 Evaluation of the method applicability for profiling of lysine ε-polyamines

In order to investigate the response factors of lysine derivatives, calibration curves were built for molecules reacted with different numbers of derivatization groups (N×QAC). Additionally, the method evaluation addressed the completeness of the derivatization reaction and the stability of the resulting QAC-derivatives. For this purpose, stock solutions of synthetic and commercially available standards with different numbers of primary and secondary amines were used (all compounds are listed in Table 5.1 and Section 5.1 of Materials and Methods). The structure and corresponding number of QAC groups attached to each analytical standard are depicted in Fig. 3.2.

The synthetic ornithine- and lysine-based polyamine derivatives, PTM 275-orn and PTM 289, respectively¹ (Fig. 3.2a and 3.2b), were synthesized by Marina Abacilar in the laboratory of Armin Geyer (Philipps-Universität Marburg, Germany). The polyamine chain of these molecules corresponds to the abundant lysine modification previously characterized in silaffin-3 from T. pseudonana (tpSil3) [67, 68]. Ornithine was chosen as the backbone of the internal standard because of its low abundance in diatom biosilica (its concentration is three orders of magnitude lower than that of lysine; unpublished results). Like the polyamine spermidine (Fig. 3.2c), both standards react with three AQC molecules, yielding 3×QAC-derivatives. Standards of unmodified lysine (Fig. 3.2d), ε-N-monomethyllysine (Fig. 3.2e) and δ-hydroxylysine (Fig. 3.2f) accepted two QAC moieties, whereas ε-N,N-dimethyllysine (Fig. 3.2g), ε-N,N,N-trimethyllysine (Fig. 3.2h), arginine (Fig. 3.2i) and proline (Fig. 3.2j) were 1×QAC-derivatized.

¹ For the reader's convenience, here and below all lysine derivatives in the text are annotated with the m/z values of their singly protonated molecular ions. Similarly, QAC-derivatized molecules are denoted by the m/z value followed by the respective number of attached QAC moieties in parentheses (N×QAC). In Section 3.3, lysine PTMs mapped to the protein sequence are denoted by their nominal m/z values for simplicity.

Previously, ε-polyamine chains were shown to be stable towards acidic hydrolysis [57]. At the same time, primary and secondary amines react at different rates [122, 127], so partial or incomplete derivatization of polyamines may bias the quantification. To verify the completeness of derivatization, we used the synthetic polyamine-modified lysine carrying two propylamine units attached to the ε-amine, with a dimethylated terminal amine (PTM 289, Fig. 3.2b). Completeness of the derivatization reaction was assessed by the amount of incompletely derivatized ε-polyamine, which did not exceed 1 % of the total standard amount. QAC-derivatives are stable at room temperature, and the excess reagent does not affect the analysis. Moreover, during RPLC separation the sample is cleaned up by the mobile phase, thereby eliminating ion suppression caused by the borate buffer, which is poorly compatible with ESI mass spectrometry (MS). The QAC-derivatized ε-polyamine-modified lysine was stable in borate buffer at 4 °C for a week, with less than 10 % decomposition; nevertheless, samples were preferably analyzed immediately after derivatization. Finally, the completeness of AQC derivatization of polyamines in crude biosilica hydrolysates was confirmed using the ornithine-based internal standard (PTM 275-orn, see Fig. 3.2a), which was spiked into each sample prior to the analysis.

[Figure 3.2 plot: calibration curves on log–log axes, instrument response (a.u.) vs. amount loaded on-column (0.01–100 pmol), grouped into 1×QAC, 2×QAC and 3×QAC derivatives; R² = 0.9943 (PTM 289), 0.9967 (PTM 275), 0.9925 (PTM 146), 0.9949 (PTM 147), 0.9980 (PTM 161), 0.9993 (PTM 163), 0.9929 (PTM 175), 0.9979 (PTM 189), 0.9880 (PTM 173), 0.9959 (proline). Structures: (a) PTM 275-orn (3×QAC); (b) PTM 289 (3×QAC); (c) PTM 146 (3×QAC); (d) lysine (2×QAC); (e) PTM 161 (2×QAC); (f) PTM 163 (2×QAC); (g) PTM 175 (1×QAC); (h) PTM 189 (1×QAC); (i) arginine (1×QAC); (j) proline (1×QAC)]

Figure 3.2 Calibration curves of standard compounds with different numbers of attached QAC moieties: (a) polyamine-modified ornithine (PTM 275-orn); (b) polyamine-modified lysine PTM 289; (c) spermidine; (d) unmodified lysine; (e) ε-N-monomethyllysine (PTM 161); (f) δ-hydroxylysine (PTM 163); (g) ε-N,N-dimethyllysine (PTM 175); (h) ε-N,N,N-trimethyllysine (PTM 189); (i) arginine; (j) proline. QAC, derivatization group.

Calibration curves produced for 2×QAC- and 3×QAC-derivatives were linear over three orders of magnitude (1000-fold dynamic range) and showed similar response factors (see Fig. 3.2). Therefore, individual standards for each type of lysine modification are not required, because the instrument response is governed by the number of attached QAC groups. It is thus reasonable to assume that all 2×QAC and 3×QAC compounds (and presumably 4×QAC compounds, which were not tested) give a similar response and require no correction factors. In contrast, the 1×QAC lysines showed approximately 10-fold lower response factors (see Fig. 3.2). Consequently, abundances of 2×QAC- and 3×QAC-derivatives calculated from XICs can be normalized to the spiked ornithine-based internal standard (Fig. 3.2a), whereas 1×QAC-derivatized molecules have to be quantified by external calibration. Additionally, calibration curves were produced for 17 physiological amino acids in order to determine their content in protein hydrolysates (Fig. A.2). The 1×QAC-derivatized amino acids displayed dynamic ranges of four orders of magnitude, whereas a few amino acids were linear over only two orders of magnitude.
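The calibration logic can be sketched as an ordinary least-squares line fitted to log10-transformed data, as plotted in Fig. 3.2, and then inverted for the external calibration of 1×QAC-derivatives. This is an assumption about the data treatment rather than the documented procedure of this work: the function names are hypothetical, the example points are made up, and R² is computed here on the log-transformed values, which may differ from how the reported R² values were obtained.

```python
import math


def fit_loglog(amounts_pmol, responses):
    """Fit log10(response) = slope * log10(amount) + intercept; return slope, intercept, R^2."""
    xs = [math.log10(a) for a in amounts_pmol]
    ys = [math.log10(r) for r in responses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def amount_from_response(response, slope, intercept):
    """External calibration: invert the fitted curve to an on-column amount (pmol)."""
    return 10 ** ((math.log10(response) - intercept) / slope)


# Made-up calibration points for illustration only (not measured data).
amounts = [0.01, 0.1, 1.0, 10.0, 100.0]
responses = [2.0e6 * a for a in amounts]
slope, intercept, r2 = fit_loglog(amounts, responses)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.4f}")
print(f"{amount_from_response(5.0e6, slope, intercept):.2f} pmol")
```

A slope close to 1 on the log–log plot indicates a linear response; a uniformly 10-fold lower response, as noted above for 1×QAC-derivatives, would appear as an intercept lower by 1 rather than as a different slope.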

Therefore, the established method is applicable to the quantitative analysis of lysine derivatives. Its major advantages are enhanced sensitivity and selectivity for all covalently modified lysines. To further demonstrate the applicability of the method to crude biosilica hydrolysates, we applied the entire workflow to the characterization of lysine modifications in the purified biosilica-associated protein tpSil3.

3.1.3 Profiling of lysine modifications in silaffin-3

As discussed in Section 1.3.2, silaffin-3 from T. pseudonana (tpSil3) is a well-characterized component of the biosilica extract of this diatom, with highly complex modifications [67, 68]. Therefore, tpSil3 was purified from T. pseudonana biosilica extract according to the protocol developed by Poulsen and Kröger (refer to Section 5.2 or [67]). Positive-ion ESI-MS of the underivatized acidic hydrolysate revealed two peak clusters: abundant ions at m/z 319.2704, 333.2860 and 347.3017, and less abundant ions at m/z 275.2442, 289.2598, 303.2755 and 317.2911 (Fig. 3.3). Within each cluster, neighbouring peaks differ by 14 Da, corresponding to one CH2 unit. These masses match lysines modified with two propylamine units and different numbers of N-methyl groups (see Table 3.1), the m/z 319–347 cluster corresponding to the δ-hydroxylated derivatives, in agreement with previously published data [67]. These chemical structures were confirmed by high-resolution MS/MS spectra with 3 ppm accuracy (refer to Section 3.2.1 and Fig. A.13–A.21).
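The 14 Da spacing used to read Fig. 3.3 can also be checked programmatically. The sketch below (hypothetical function name) greedily groups observed m/z values into ladders in which each peak lies one CH2 increment (14.01565 Da) above the previous one within a ppm tolerance; the 3 ppm default mirrors the accuracy quoted in the text but is otherwise an assumption.

```python
CH2 = 14.01565006  # monoisotopic mass increment of one N-methyl group


def methylation_series(mz_values, tol_ppm=3.0):
    """Group m/z values into ladders whose neighbours differ by one CH2 unit."""
    series = []
    for mz in sorted(mz_values):
        for ladder in series:
            expected = ladder[-1] + CH2
            if abs(mz - expected) / expected * 1e6 <= tol_ppm:
                ladder.append(mz)
                break
        else:  # no ladder extended: start a new one
            series.append([mz])
    return series


# Peaks observed in the acidic hydrolysate of tpSil3 (values from the text).
peaks = [319.2704, 333.2860, 347.3017, 275.2442, 289.2598, 303.2755, 317.2911]
for ladder in methylation_series(peaks):
    print(ladder)
```

With these values the sketch separates the non-hydroxylated series (m/z 275–317) from the hydroxylated series (m/z 319–347), since 319.2704 does not lie one CH2 above 317.2911.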


[Figure 3.3 spectrum: relative abundance (%) vs. m/z 260–440; annotated peaks with calculated elemental compositions and mass deviations (mmu)]

Figure 3.3 MS-spectrum of acidic hydrolysate of silaffin-3 from T. pseudonana (tpSil3)

Next, the HCl hydrolysate of purified tpSil3 was subjected to AQC derivatization. The resulting QAC-derivatives were injected into the LC-MS system to quantify modified lysines and other amino acids from their XIC signals (see Section 5.6 for experimental details). The amino acids were quantified from calibration curves (Fig. A.2), while the molar amounts of ε-polyamine lysine derivatives were calculated relative to the amount of the ornithine-based internal standard (PTM 275-orn, Fig. 3.2a). The calculated molar amounts of both amino acids and ε-polyamines were normalized to the total amount of all QAC-derivatives. To validate these results, the same tpSil3 sample was analyzed for amino acid content using UV detection at 280 nm (refer to Section 5.7). Both results were compared with the theoretical amino acid composition of tpSil3. The full profile of lysine modifications and the amino acid content of this protein are displayed in Fig. 3.4.
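The quantification steps described above can be summarized in a few lines. This minimal sketch assumes that 2×QAC- and 3×QAC-derivatives share the response factor of the spiked internal standard (as supported by Fig. 3.2), that other analytes have already been converted to molar amounts with external calibration curves, and that all amounts are in the same units; the function name and the example numbers are hypothetical.

```python
def molar_profile(xic_areas, is_area, is_amount_pmol, external_pmol=None):
    """Return mol % of each analyte relative to the total of all QAC-derivatives.

    xic_areas      -- XIC peak areas of 2x/3xQAC lysine derivatives (name -> area)
    is_area        -- XIC peak area of the spiked internal standard (PTM 275-orn)
    is_amount_pmol -- amount of internal standard spiked into the sample
    external_pmol  -- amounts already obtained by external calibration (name -> pmol)
    The internal standard itself is not part of the returned profile.
    """
    amounts = {name: area / is_area * is_amount_pmol for name, area in xic_areas.items()}
    amounts.update(external_pmol or {})
    total = sum(amounts.values())
    return {name: 100.0 * amount / total for name, amount in amounts.items()}


# Hypothetical numbers for illustration only.
profile = molar_profile({"PTM 289": 2.0e6, "PTM 319": 1.0e6},
                        is_area=1.0e6, is_amount_pmol=10.0,
                        external_pmol={"Gly": 70.0})
for name, percent in profile.items():
    print(f"{name}: {percent:.1f} %")
```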

The polypeptide chain of tpSil3 contains a total of 33 lysine residues, corresponding to 16 % of the total amino acid content, whereas less than 5 % free lysine was detected (see Fig. 3.4). At the same time, the relative abundances of the other amino acids agreed with the tpSil3 database sequence. Under the conditions of acidic hydrolysis (6 M HCl, 16–24 h at 110 °C), some amino acid residues undergo oxidation or complete


[Figure 3.4 plots: relative abundance (%) of amino acids (Arg, His, Ser, Gly, Asx, Glx, Thr, Ala, Pro, Val, Leu, Ile, Phe, Tyr, Met, Lys) and lysine PTMs (175, 261, 275, 289, 303, 319, 333, 399, 413) by MS detection and UV detection, compared with the theoretical composition; an expanded panel shows Lys and its modifications (0–15 %). Panel title: amino acid analysis (AAA) and lysine PTM profile of tpSil3 by MS- and UV-detection. Structures: (a) PTM 175 (3×QAC); (b) PTM 261 (3×QAC); (c) PTM 275 (4×QAC); (d) PTM 289 (3×QAC); (e) PTM 303a (2×QAC); (f) PTM 319 (3×QAC); (g) PTM 333 (2×QAC); (h) PTM 399 (3×QAC); (i) PTM 413 (2×QAC)]

Figure 3.4 Amino acid content and lysine PTM profile of tpSil3 by MS- and UV-detection. Validation of the developed method by the analysis of silaffin-3 from T. pseudonana (tpSil3). Only 25 % of the lysines were detected as free lysine; about 75 % of the total lysine content is modified with different ε-modifications, displayed in (a)–(i). Asx, aspartic acid or asparagine; Glx, glutamic acid or glutamine; QAC, derivatization group.


degradation [113]. Upon HCl hydrolysis, asparagine and glutamine are converted to aspartic acid and glutamic acid, respectively, and are therefore detected as the sum of both (Asx and Glx in Fig. 3.4). Tryptophan is completely degraded, whereas methionine and cysteine cannot be determined directly from the hydrolyzed samples due to oxidation and were therefore not quantified. Serine, tyrosine and threonine are partially degraded. Thus, for the stable amino acid residues, LC-MS and UV detection gave corroborating results; however, UV detection failed to distinguish the different lysine modification species due to the lack of corresponding standards. In contrast, the developed MS-based method clearly indicated the presence of differently modified lysines. We therefore concluded that modified lysines accounted for 80 % of the total lysine content, in good agreement with previously reported data [67, 68]. The side-chain polyamines of the detected lysine modifications vary in the number of propylamine units and N-methyl groups. The number of derivatization groups (N×QAC) attached to primary and secondary amines of the modified lysines was in accordance with the fragment spectra of the non-derivatized molecules (e. g. for PTM 303a, Fig. 3.4), which will be discussed in detail in Section 3.2.1.

3.2 Profiling lysine ε-polyamine modifications in biosilica extracts from three diatom species

Silaffin proteins appear to be permanently associated with (or embedded within) the biosilica, as they cannot be extracted from diatom cell walls even under harsh extraction conditions (e.g., 2 % SDS at 95 °C, 8 M urea or 6 M guanidinium·HCl) as long as the silica remains intact (discussed in Section 1.3.1). To make the silaffins accessible, we employed the protocol by Kröger et al. [66], which was previously used for the characterization of LCPA, silacidins and other biosilica-embedded components [66, 67, 72, 83]. Briefly, an acidified ammonium fluoride solution (pH 4.5) was used to solubilize the biosilica-associated proteins; the resulting extract is therefore termed ammonium fluoride-soluble material (AFSM).

The AFSM extracts from the three diatom species were hydrolyzed with HCl. The lysine modifications in the total acidic hydrolysates were fragmented in two alternative ways: (a) directly in the HCl hydrolysate without any pre-fractionation, and (b) in the course of LC-MS/MS runs after pre-column AQC derivatization. The lysine ε-polyamine structures were obtained by combining the information derived from both types of spectra, which are given in the Appendix (Fig. A.9–A.25). Each MS/MS spectrum was interpreted and annotated with compositions and chemical structures. Additionally, the number of derivatization adducts (N×QAC) helped to resolve isobaric molecular species by RPLC. The full catalogue of the detected lysine PTM structures from the three diatom species is summarized in Table 3.2.

3.2.1 Lysine PTM profile and characteristic fragments

Prior to the analysis of lysine ε-polyamines in total biosilica hydrolysates, the fragmentation of the lysine- and ornithine-based standards bearing ε- and δ-polyamine modifications, respectively, was investigated (mass spectra are shown in Fig. 3.6). The MS/MS spectrum of the ornithine derivative with exact m/z 275.2442 (PTM 275-orn) revealed a series of fragments corresponding to the fragmentation of the propylamine chain, which matched the fragmentation pattern of the lysine ε-polyamine modification (cf. Fig. 3.6a and Fig. 3.6b). The cleavage positions that lead to the observed fragment ions are depicted in Fig. 3.5, where the corresponding fragments are indicated as m- and n-ion series. The optimal normalized collision energies (nCE) for complete fragmentation of both polyamine standards were about 25 % to 35 %; this range was then applied to the fragmentation of other modified lysines. Additionally, pronounced H2O losses were detected for ornithine fragments, owing to the formation of a stable six-membered lactam ring (the so-called ornithine effect [158]); this was not observed for the lysine ε-polyamine standard. These hallmark fragments can be used to further distinguish lysine and ornithine polyamine modifications, should the latter be present in diatom biosilica hydrolysates.
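Neutral losses such as the H2O losses behind the ornithine effect, or the phosphate losses shown in Fig. 3.5, can be flagged automatically by comparing all pairs of peaks in a fragment list. The sketch below is illustrative only: the function name and the 0.002 Da tolerance are assumptions, and the examples use theoretical rather than measured m/z values.

```python
# Monoisotopic masses (u) of common neutral losses.
NEUTRAL_LOSSES = {
    "NH3": 17.02654910,
    "H2O": 18.01056468,
    "HPO3": 79.96633052,
    "H3PO4": 97.97689521,
}


def find_neutral_losses(peaks, tol_da=0.002):
    """Return (higher m/z, lower m/z, loss) for every peak pair matching a known loss.

    Assumes all peaks are singly charged, so an m/z difference equals a mass difference.
    """
    hits = []
    ordered = sorted(peaks, reverse=True)
    for i, high in enumerate(ordered):
        for low in ordered[i + 1:]:
            for name, loss in NEUTRAL_LOSSES.items():
                if abs((high - low) - loss) <= tol_da:
                    hits.append((high, low, name))
    return hits


# Theoretical [M+H]+ of PTM 275-orn and its H2O-loss product; m/z 143.1543 has no partner.
print(find_neutral_losses([275.2442, 257.2336, 143.1543]))
# Theoretical [M+H]+ of PTM 413 and the product of an H3PO4 loss.
print(find_neutral_losses([413.2524, 315.2755]))
```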

Next, the acidic hydrolysates of biosilica extracts from the three diatoms were subjected to direct infusion MS/MS analysis. Altogether, 20 m/z values matching ε-polyamine-modified lysine structures were detected within 3 ppm accuracy. In Table 3.2, each structure is referred to as ‘PTM’ followed by the nominal m/z of the underivatized singly charged ion (the exact m/z values are also provided). To validate the chemical structures of the underivatized molecular species, higher-energy collisional dissociation (HCD) MS/MS spectra were acquired for each of these m/z values. Direct infusion MS/MS analysis revealed the presence of multiple compounds with isobaric molecular masses. Although simple and rapid, this approach proved insufficient for the analysis of complex biosilica matrices because of the presence of several structural isomers, namely PTMs 303, 317 and 331. Direct infusion analysis did not allow their discrimination and, consequently, the corresponding structures could not be unambiguously validated from the MS/MS spectra. Moreover, direct MS analysis could hamper the identification and quantification of these lysine PTMs because of significant differences in the ionization efficiencies of polyamine molecules, which would require an analytical standard for each individual compound.

[Figure 3.5 scheme: ε-polyamine chain with the cleavage sites generating m1–m5 and n1–n5 ions; phosphate neutral losses indicated]

Figure 3.5 Schematic diagram of the m- and n-ions representing fragmentation of the ε-polyamine chain (R1–R7 = H or CH3). For certain lysine modifications containing δ-hydroxyl phosphorylation, neutral losses of H3PO4 (−98 Da) or HPO3 (−80 Da) can be observed.
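Matching observed m/z values to calculated candidates within a ppm window, as done for the 20 m/z values above, reduces to a relative-error check. The sketch below uses hypothetical names and a small candidate list taken from Table 3.1, with observed values from Fig. 3.3. It also illustrates the limitation just discussed: structural isomers share one elemental composition and therefore one m/z, so accurate mass alone cannot tell them apart.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_candidates(observed_mz, candidates, tol_ppm=3.0):
    """Return (observed, candidate name, theoretical, ppm error) for matches within tol_ppm."""
    hits = []
    for obs in observed_mz:
        for name, theo in candidates.items():
            err = ppm_error(obs, theo)
            if abs(err) <= tol_ppm:
                hits.append((obs, name, theo, round(err, 2)))
    return hits


# Theoretical values from Table 3.1; observed values from Fig. 3.3.
candidates = {
    "Lys-PA1-PA2 Me3": 303.2755,
    "Hydroxy-Lys-PA1-PA2 Me2": 305.2547,
    "Hydroxy-Lys-PA1-PA2 Me4": 333.2861,
}
for hit in match_candidates([303.2753, 333.2859], candidates):
    print(hit)
```

Every isomer of "Lys-PA1-PA2 Me3" (for example, different positions of the three methyl groups) would produce the same hit, which is why the N×QAC-based chromatographic separation described next is needed.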

To overcome this issue, LC-MS/MS was performed, which chromatographically resolved the isobaric species and fragmented them separately. During pre-column derivatization, each isomer reacted with a different number of AQC molecules, corresponding to the total number of primary and secondary amines in its structure. In Table 3.2, the corresponding number of derivatization adducts (N×QAC, highlighted in gray) is given next to each m/z value of the underivatized species. Isobaric molecular species with different N×QAC were well resolved chromatographically and fragmented separately, providing independent confirmation of isomers and other structures (e.g., PTM 303 in Fig. 3.7, and PTMs 317 and 331 in Fig. A.15–A.18). Additionally, RPLC analysis of QAC-derivatized species revealed five low-abundance lysine modifications (PTMs 204, 218, 246, 248 and 261), which are present in biosilica hydrolysates from all three diatom species. These modifications were tentatively


[Figure 3.6a spectrum: relative abundance (%) vs. m/z 80–300; annotated fragment ions with elemental compositions and mass deviations (mmu); H2O losses (−18.0105) marked]

(a) Fragment spectrum of synthetic ornithine δ-polyamine derivative PTM 275-orn (m/z 275.2442; 1+)

[Figure 3.6b spectrum: relative abundance (%) vs. m/z 80–300; annotated fragment ions with elemental compositions and mass deviations (mmu); H2O and NH3 losses marked]

(b) Fragment spectrum of synthetic lysine ε-polyamine derivative PTM 289-lys (m/z 289.2598; 1+)

Figure 3.6 HCD MS/MS spectra (nCE 30 %) and chemical structures of the synthetic standards of oligo-propylenediamine-substituted ε-lysine and δ-ornithine derivatives used for validation of the method and as internal standards: (a) ornithine δ-polyamine derivative (m/z 275.2442; 1+); (b) lysine ε-polyamine derivative (m/z 289.2598; 1+). Spectra are annotated with accurate masses, calculated chemical compositions (CHNO) and mass deviations (in mmu). The lysine-specific immonium ions at m/z 84.0808 and m/z 129.1022 are not annotated. The pronounced H2O losses observed during fragmentation of PTM 275-orn are characteristic of ornithine δ-polyamines, which form a stable six-membered lactam with m/z 257.2320 (the so-called ornithine effect [158]); this has not been observed for lysine ε-polyamines.


assigned chemical structures based on their accurate masses and numbers of N×QAC derivatization moieties (listed in Table 3.2).
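The m/z values of QAC-derivatives follow from the underivatized [M+H]+ values by adding one C10H6N2O moiety (170.0480 Da) per reacted amine and choosing the charge state. The sketch below (hypothetical function name, standard monoisotopic masses) reproduces the precursor m/z values given for PTM 303a and PTM 303b in Fig. 3.7 to within about 0.0001.

```python
PROTON = 1.007276467
# Net mass added per AQC reaction: a 6-aminoquinolyl carbamoyl group (C10H6N2O).
QAC_MASS = 10 * 12.0 + 6 * 1.00782503207 + 2 * 14.0030740048 + 15.99491461956


def derivative_mz(mh_underivatized, n_qac, charge):
    """m/z of the [M + zH]z+ ion of an N x QAC derivative from the underivatized [M+H]+."""
    neutral = mh_underivatized - PROTON + n_qac * QAC_MASS
    return (neutral + charge * PROTON) / charge


print(f"PTM 303a (2xQAC, 2+): {derivative_mz(303.2755, 2, 2):.4f}")
print(f"PTM 303b (3xQAC, 2+): {derivative_mz(303.2755, 3, 2):.4f}")
```

Because the two isomers of PTM 303 carry different numbers of QAC groups, their derivatives differ by one QAC moiety (85.0240 in m/z at charge 2+) even though the underivatized molecules are isobaric.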

The fragmentation of mono-, di- and trimethylated lysine derivatives results in

lysine-specific immonium ions with a moderate intensity at m/z 115.1229, 129.1386 and

143.1543, respectively. Also, the immonium ion with m/z 130.0867 has been observed

for all methylated lysine species (Fig. A.9–A.11). A characteristic ion specific to monomethylated lysine was observed at m/z 98.0964, corresponding to the immonium ion after loss of NH3 (IM−NH3; Fig. A.9). These results are in good agreement with previously published data on lysine methylation [138, 159–161].

In addition to the lysine-specific fragment ions at m/z 84.0808 and m/z 129.1022, several specific marker ions can be detected for ε-polyamine chains. For instance, the MS/MS spectra of PTMs 413 and 333 (a phosphopolyamine and its non-phosphorylated counterpart, respectively) contain a pronounced fragment with exact m/z 143.1543 (refer to Fig. A.21 and Fig. A.24). As shown in the MS/MS spectra of tpSil3 peptides in Fig. 3.18, fragmentation of the ε-polyamine-modified lysine PTM 289 also results in the formation of an abundant fragment ion at m/z 143.1543. Similarly, this characteristic

ion occurs in the spectra of PTMs 303a and 317, which have the same ε-polyamine chain structure (Fig. 3.7 and A.16). The fragmentation of the ε-polyamines PTM 317a (the 1×QAC-modified isomer, Fig. A.15), PTMs 331a and 331b (isomers with 1×QAC and 2×QAC, respectively; Fig. A.17 and A.18) and PTM 347 (Fig. A.22) resulted in a characteristic fragment at m/z 157.1699, whereas the lysine derivatives PTM 275 (Fig. A.13), PTM 303b (isomer with 3×QAC, Fig. 3.7) and PTM 319 (Fig. A.20) showed a diagnostic ion at m/z 129.1386.
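The diagnostic fragments listed above can serve as a quick screen of MS/MS spectra. The sketch below (hypothetical names; the 5 ppm tolerance is an assumption) reports which of the three marker ions occur in a fragment list; the example uses fragment m/z values annotated in Fig. 3.7a.

```python
# Diagnostic fragment ions of epsilon-polyamine chains, as reported in the text.
DIAGNOSTIC_IONS = {
    143.1543: "e.g. PTMs 289, 333 and 413",
    157.1699: "e.g. PTMs 317a, 331a, 331b and 347",
    129.1386: "e.g. PTMs 275 and 319",
}


def find_diagnostic_ions(fragments, tol_ppm=5.0):
    """Return the sorted diagnostic m/z values present among the fragment m/z values."""
    found = set()
    for ion in DIAGNOSTIC_IONS:
        for mz in fragments:
            if abs(mz - ion) / ion * 1e6 <= tol_ppm:
                found.add(ion)
                break
    return sorted(found)


# Fragment m/z values annotated in Fig. 3.7a (underivatized PTM 303 isomers).
fragments = [86.0971, 98.0969, 130.0864, 143.1543, 157.1700, 187.1441, 201.1596]
for ion in find_diagnostic_ions(fragments):
    print(ion, DIAGNOSTIC_IONS[ion])
```

Because m/z 129.1386 and 143.1543 coincide with the immonium ions reported above for di- and trimethylated lysine, a hit points to a candidate chain structure rather than a unique assignment.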


[Figure 3.7a spectrum: relative abundance (%) vs. m/z 80–300; annotated fragment ions with elemental compositions and mass deviations (mmu); structures of both isomers with their characteristic fragments]

(a) Fragment spectrum of underivatized isomeric lysine ε-polyamines PTM 303 (m/z 303.2755; 1+)

[Figure 3.7b: HCD MS/MS spectrum; relative abundance vs. m/z 100–700]

(b) Fragment spectrum of 2×QAC-derivatized lysine modification PTM 303a (m/z 322.1893; 2+)

Figure 3.7 HCD MS/MS spectra of isomeric lysine derivatives PTM 303. (a) Spectrum of underivatized isomers (m/z 303; 1+), nCE set to 30 %; (b) spectrum of 2×QAC-derivatized molecule PTM 303a (m/z 322.1893; 2+), nCE set to 30 %. Fragment peaks are annotated with accurate mass, corresponding calculated chemical composition (CHNOP) and delta mass (in mmu).

[Figure 3.7c: HCD MS/MS spectrum; relative abundance vs. m/z 100–700]

(c) Fragment spectrum of 3×QAC-derivatized lysine modification PTM 303b (m/z 407.2133; 2+)

[Figure 3.7d: HCD MS/MS spectrum; relative abundance vs. m/z 100–700]

(d) Fragment spectrum of 2×QAC-derivatized lysine modification PTM 303c (m/z 322.1893; 2+)

Figure 3.7 (continued from previous page) HCD MS/MS spectra of lysine derivatives PTM 303b and PTM 303c. (c) Spectrum of 3×QAC-derivatized lysine modification PTM 303b, nCE set to 30 %; (d) spectrum of 2×QAC-derivatized lysine modification PTM 303c, nCE set to 30 %. Fragment peaks are annotated with accurate mass, corresponding calculated chemical composition (CHNOP) and delta mass (in mmu).

Table 3.2 Catalogue of lysine modifications and their characteristic fragments. Cleavage positions that lead to the observed fragment ions are indicated in the structures. Lysine-specific fragments at m/z 84.0808 and m/z 129.1022 are not listed.

| PTM | m/z | QAC groups | Characteristic fragments (m/z) | Spectrum | Note |
|---|---|---|---|---|---|
| Ornithine-based internal standard | | | | | |
| PTM 275-orn (standard) | 275.2442 | 3×QAC | 86.0964, 103.1230, 116.0706, 143.1543, 160.1808, 173.1285, 230.1863 | Fig. A.8, p. 126 | |
| ε-methylated lysines (Fig. 3.10b) | | | | | |
| PTM 161 | 161.1285 | 2×QAC | 98.0964, 115.1229 | Fig. A.9, p. 127 | |
| PTM 175 | 175.1441 | 1×QAC | 129.1386 | Fig. A.10, p. 128 | +28 K in Fig. 3.21b |
| PTM 189 | 189.1598 | 1×QAC | 144.1388 | Fig. A.11, p. 129 | +42 K in Fig. 3.21c |
| ε-polyaminated lysines (Fig. 3.10c) | | | | | |
| PTM 204 | 204.1707 | 3×QAC | | no MS/MS | low-abundance derivative |
| PTM 218 | 218.1863 | 3×QAC | | no MS/MS | low-abundance derivative |
| PTM 232 | 232.2020 | 2×QAC | 103.1230, 130.0863, 161.1285, 201.1598 | Fig. A.12, p. 130 | +85 K in Fig. 3.21f |
| PTM 246 | 246.2176 | 2×QAC | | no MS/MS | low-abundance derivative |
| PTM 261 | 261.2285 | 4×QAC | | no MS/MS | low-abundance derivative |
| PTM 275 | 275.2442 | 4×QAC | 129.1386 | Fig. A.13, p. 131 | |
| PTM 289 | 289.2598 | 3×QAC | 86.0964, 103.1230, 130.0863, 143.1543, 160.1808, 187.1441, 244.2020 | Fig. A.14, p. 132 | +142 K in Fig. 3.21d |
| PTM 303a | 303.2755 | 2×QAC | 130.0863, 143.1543, 201.1598, 258.2176 | Fig. 3.7b, p. 51 | |
| PTM 303b | 303.2755 | 3×QAC | 117.1386, 130.0863, 157.1699, 187.1441, 232.2020 | Fig. 3.7c, p. 52 | |
| PTM 303c | 303.2755 | 2×QAC | 157.1699, 187.1441, 388.2343, 473.3235 | Fig. 3.7d, p. 52 | |
| PTM 317a | 317.2911 | 1×QAC | 130.0863, 157.1699, 201.1598, 272.2333 | Fig. A.15, p. 133 | |
| PTM 317b | 317.2911 | 2×QAC | 130.0863, 143.1543, 215.1754, 272.2333 | Fig. A.16, p. 134 | |
| PTM 331a | 331.3068 | 1×QAC | 157.1699 | Fig. A.17, p. 135 | |
| PTM 331b | 331.3068 | 2×QAC | 157.1699 | Fig. A.18, p. 136 | |
| δ-hydroxylated lysines (Fig. 3.10d) | | | | | |
| PTM 163 | 163.1077 | 2×QAC | | no MS/MS | |
| PTM 205 | 205.1547 | 1×QAC | | Fig. A.19, p. 137 | |
| PTM 248 | 248.1969 | 2×QAC | | no MS/MS | |
| PTM 319 | 319.2704 | 3×QAC | 129.1386 | Fig. A.20, p. 138 | |
| PTM 333 | 333.2860 | 2×QAC | 143.1543, 188.2121 | Fig. A.21, p. 139 | +186 K in Fig. 3.21e |
| PTM 347 | 347.3017 | 2×QAC | 157.1699 | Fig. A.22, p. 140 | |
| Phosphopolyamines (Fig. 3.10e) | | | | | |
| PTM 399 | 399.2367 | 3×QAC | 129.1386 | Fig. A.23, p. 141 | |
| PTM 413 | 413.2523 | 2×QAC | 143.1543 | Fig. A.24, p. 142 | |
| PTM 427 | 427.2680 | 2×QAC | 157.1699 | Fig. A.25, p. 143 | |

An important outcome of the fragmentation study is the discovery of characteristic ions in the fragmentation spectra of ε-polyamine modifications. These fragments are diagnostic for modified lysine residues and can be used for peptide-independent, modification-specific detection of ε-polyamine PTMs. The detected characteristic ions mostly represent ε-side-chain fragments or IMs, whose m/z values are summarized in Table 3.2. As observed for other lysine modifications, most modification-specific fragments fall in the low-mass region between m/z 50 and 300 and can therefore be detected with high resolution and sub-ppm mass accuracy (see the spectra in Fig. A.8–A.25). These diagnostic, modification-specific ions are very important for the identification of modified peptides and for the validation of MS/MS-based PTM assignments, as discussed further in Section 3.3.5.
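Because these reporter ions do not depend on the peptide sequence, a spectrum can be flagged as carrying a modified lysine simply by searching its peak list for them. A minimal sketch, assuming a peak list of (m/z, intensity) pairs and a hypothetical 3 ppm matching tolerance; the ion list is taken from the text and Table 3.2, and the function names are illustrative:

```python
from typing import Dict, List, Tuple

# Diagnostic fragment ions (m/z) quoted in the text and in Table 3.2
DIAGNOSTIC_MZ = (84.0808, 129.1022, 129.1386, 143.1543, 157.1699)


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_diagnostic_ions(
    peaks: List[Tuple[float, float]], tol_ppm: float = 3.0
) -> Dict[float, float]:
    """Map each diagnostic m/z found in a peak list to the matched peak m/z.

    peaks: (m/z, intensity) pairs of one MS/MS spectrum. If several peaks
    fall inside the tolerance window, the most intense one is kept.
    """
    matched: Dict[float, float] = {}
    for target in DIAGNOSTIC_MZ:
        window = [p for p in peaks if abs(ppm_error(p[0], target)) <= tol_ppm]
        if window:
            matched[target] = max(window, key=lambda p: p[1])[0]
    return matched


# Hypothetical peak list, for illustration only
spectrum = [(84.0807, 20.0), (143.1544, 100.0), (157.1650, 5.0), (201.1596, 30.0)]
print(match_diagnostic_ions(spectrum))
```

For this hypothetical spectrum only 84.0808 and 143.1543 are matched; the peak at m/z 157.1650 lies about 31 ppm from 157.1699 and is rejected.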

The accurate masses and MS/MS spectra of six lysine modifications corresponded to phosphorylated δ-hydroxylysines with ε-polyamine chains and their non-phosphorylated counterparts, which differ by a mass shift of 80 Da for HPO3 (PTMs 399, 413 and 427, and PTMs 319, 333 and 347, respectively). However, it was highly surprising that an O-phosphoester bond could withstand exhaustive acidic hydrolysis² (6 N HCl, 24 h at 110 °C). In the current work, these structures, collectively denoted as phosphopolyamines, were investigated in an elaborate structural study, which is discussed in the following section (Section 3.2.2).

3.2.2 Elucidation of phosphopolyamine structures resistant to acidic hydrolysis

Profiling of biosilica hydrolysates and purified tpSil3 revealed the presence of QAC-derivatized phosphopolyamines, which eluted earlier than their non-phosphorylated counterparts (see Fig. A.5). Two alternative structures were proposed, containing either a phosphoester bond (C-O-P, Fig. 3.9b) or a phosphonate group (C-P, Fig. 3.9c). Both structures were consistent with the MS/MS spectrum of PTM 413 (m/z 413.2523, Fig. 3.8a); however, the intensity corresponding to the H3PO4 neutral loss (−98 Da) was unexpectedly weak. To test the C-P bond hypothesis, total biosilica hydrolysate was derivatized with acetic anhydride to detect the corresponding mass shifts of acetylated species (see Section 5.9). The resulting MS/MS spectrum of the doubly acetylated derivative (m/z 497.2735, Fig. 3.8b), however, did not allow unambiguous assignment of a C-P bond in the phosphopolyamine structure. The same problems were subsequently encountered in the analysis of the less abundant phosphorylated species (PTMs 399 and 427) detected in biosilica hydrolysates (Fig. 3.10e, phosphorylated).

² On the other hand, phosphorylated structures were completely converted to non-phosphorylated ones by HF treatment (Fig. A.6).

[Figure 3.8a: MS/MS spectrum; relative abundance vs. m/z 100–500]

(a) MS/MS of underivatized phosphopolyamine (m/z 413.2523)

[Figure 3.8b: MS/MS spectrum; relative abundance vs. m/z 100–500]

(b) MS/MS of doubly acetylated phosphopolyamine (m/z 497.2735)

Figure 3.8 Phosphopolyamine MS/MS spectra. Neutral losses of H3PO4 (−98 Da) or HPO3 (−80 Da) are indicated.
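The m/z of the doubly acetylated derivative follows from simple mass arithmetic: each acetylation adds C2H2O (42.0106 u), so 413.2523 + 2 × 42.0106 ≈ 497.2735. A minimal sketch of this check, assuming both ions carry the same charge and a hypothetical 2 mDa tolerance:

```python
from typing import Optional

# Monoisotopic mass added by one acetylation (net +C2H2O), in u
ACETYL_SHIFT = 2 * 12.0 + 2 * 1.00782503 + 15.99491462  # about 42.0106


def count_acetylations(mz_before: float, mz_after: float, charge: int = 1,
                       tol_da: float = 0.002) -> Optional[int]:
    """Whole number of acetyl groups explaining a precursor shift, else None."""
    shift = (mz_after - mz_before) * charge  # mass shift of the neutral molecule
    n = round(shift / ACETYL_SHIFT)
    if n < 0 or abs(shift - n * ACETYL_SHIFT) > tol_da:
        return None  # shift is not a whole number of acetylations
    return n


print(count_acetylations(413.2523, 497.2735))
```

It prints 2 for the phosphopolyamine pair; a shift that is not a whole multiple of 42.0106 u within the tolerance returns None.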

[Figure 3.9: (a) ³¹P-NMR spectrum of T. pseudonana biosilica hydrolysate, δ(³¹P) in ppm; (b) C-O-P, −0.6 ppm; (c) C-P, 8.5 ppm]

Figure 3.9 ³¹P-NMR spectrum of T. pseudonana biosilica hydrolysate.

To further elucidate the structure of the phosphopolyamines, diatom biosilica hydrolysates were subjected to ³¹P-NMR analysis, which was carried out by Marcus Rauche in the laboratory of Eike Brunner (Technische Universität Dresden, Germany). The NMR spectrum of T. pseudonana biosilica hydrolysate (Fig. 3.9a) revealed a single signal at −1.68 ppm. Phosphoserine (C-O-P bond) and N-phosphonomethylglycine (glyphosate; C-P bond) were measured independently as reference compounds to distinguish the chemical shifts of both bond types (data not shown). The ³¹P-NMR spectra show that the chemical shift is about 8.5 ppm for the C-P bond (Fig. 3.9c) and about −0.6 ppm for the C-O-P bond (Fig. 3.9b). The signal at −1.68 ppm in the ³¹P-NMR spectrum of the hydrolysate therefore supports the assumption that the phosphorus in all phosphopolyamines is attached to the lysine residue via oxygen rather than bonded directly to a carbon (Fig. 3.9b). Comparison of the ¹H-decoupled and non-decoupled ³¹P-NMR spectra showed that the signal at −1.68 ppm exhibits a doublet structure due to J-coupling (approximately 7.3 Hz) to one neighboring ¹H nucleus. Coupling constants around 7 Hz are typical for ³J(P,H) couplings, indicating that the phosphate residue is linked to a disubstituted CH group. In contrast, the P-H coupling constant in glyphosate is considerably larger (approximately 12.71 Hz), as is typical for ²J(P,H) couplings [162]. Consequently, both the chemical shift and the multiplet structure indicate that the phosphorus-containing compounds in T. pseudonana biosilica hydrolysate contain a phosphoester bond (C-O-P, see Fig. 3.9b).
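The assignment above rests on two independent criteria, the chemical shift and the P-H coupling constant, each compared with a reference compound. A minimal sketch of that logic, using only the reference values quoted in the text (about −0.6 ppm and ~7 Hz for C-O-P; about 8.5 ppm and ~12.7 Hz for C-P) and a simple nearest-reference rule; this is a simplification for illustration, not the procedure used to interpret the spectra:

```python
from typing import Optional

# Reference values quoted in the text (phosphoserine and glyphosate standards)
REFERENCES = {
    "C-O-P": {"shift_ppm": -0.6, "j_hz": 7.0},   # phosphoester; 3J(P,H) around 7 Hz
    "C-P": {"shift_ppm": 8.5, "j_hz": 12.71},    # phosphonate; 2J(P,H) about 12.7 Hz
}


def nearest_reference(value: float, key: str) -> str:
    """Name of the reference whose value for `key` is closest to `value`."""
    return min(REFERENCES, key=lambda name: abs(value - REFERENCES[name][key]))


def classify_phosphorus(shift_ppm: float, j_hz: float) -> Optional[str]:
    """Bond type supported by both the chemical shift and the P-H coupling.

    Returns None when the two criteria point to different bond types.
    """
    by_shift = nearest_reference(shift_ppm, "shift_ppm")
    by_coupling = nearest_reference(j_hz, "j_hz")
    return by_shift if by_shift == by_coupling else None


print(classify_phosphorus(-1.68, 7.3))  # signal observed in the hydrolysate
```

For the hydrolysate signal (−1.68 ppm, 7.3 Hz) both criteria agree on C-O-P; conflicting evidence yields None instead of a forced assignment.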

Taken together, these results suggest that phosphopolyamine modifications comprise an abundant class of lysine post-translational modifications (see Fig. 3.10e and Fig. 3.11e). These modifications had not been described in the literature before; however, phosphorylation of the hydroxyl group of N-trimethylhydroxylysine was previously reported by Nakajima and Volcani in the diatom Navicula pelliculosa [52] (displayed in Fig. 1.5). Later, the same lysine modification was found in silaffin-1A from C. fusiformis [66]. The presence of these modifications in biosilica extracts from different diatom species suggests that phosphopolyamines may play an important role in the biosilicification process. Phosphopolyamines occurred in extracts of both T. pseudonana and C. cryptica (albeit with different abundances, see Fig. 3.10e), but were completely absent from the T. oceanica extract. This observation motivated a comparative study of lysine PTMs in the three diatom species to analyze the similarities and differences in their ε-polyamine profiles.

3.2.3 Lysine polyamine modification profiles of AFSM extracts

The AFSM biosilica hydrolysates were derivatized with AQC and subjected to LC-MS/MS analysis, in which QAC derivatives were detected and quantified by XICs of their protonated molecular ions (the experimental procedure is described in Section 5.6). The molar amounts of ε-polyamines were calculated relative to the amount of internal standard (PTM 275-orn) spiked into each sample and then normalized to the total molar amount of all ε-polyamine derivatives. Technical reproducibility was evaluated from three successive LC runs, and biological reproducibility from two averaged biological replicates. Coefficients of variation for the AQC-derivatized internal standard, spiked into the biosilica extract samples before analysis, were within 3–14 %. Coefficients of variation for technical replicates were within 10 % for all measurements. The moderate reproducibility could be explained by biological variation among diatom cultures (age variations, different growth rates, etc.).
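The quantification described above can be sketched as follows, assuming (hypothetically) that all QAC derivatives share the response factor of the PTM 275-orn internal standard, and using invented XIC areas; the function names are illustrative:

```python
from statistics import mean, stdev
from typing import Dict, List


def molar_amounts(areas: Dict[str, float], is_area: float, is_amount: float) -> Dict[str, float]:
    """Amount of each derivative relative to the spiked internal standard.

    Assumes every QAC derivative has the same response factor as the
    internal standard (a simplifying assumption for this sketch).
    """
    return {ptm: area / is_area * is_amount for ptm, area in areas.items()}


def mol_percent(amounts: Dict[str, float]) -> Dict[str, float]:
    """Normalize amounts to the total of all derivatives (mol. %)."""
    total = sum(amounts.values())
    return {ptm: 100.0 * a / total for ptm, a in amounts.items()}


def cv_percent(values: List[float]) -> float:
    """Coefficient of variation (sample standard deviation / mean) in %."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical XIC peak areas (arbitrary units) and 10 pmol internal standard
areas = {"PTM 289": 2.0e6, "PTM 303a": 1.0e6, "PTM 413": 1.0e6}
amounts = molar_amounts(areas, is_area=4.0e6, is_amount=10.0)
print(amounts)                                      # pmol
print(mol_percent(amounts))                         # mol. %
print(round(cv_percent([100.0, 110.0, 90.0]), 1))   # e.g. three LC runs
```

Because every amount is scaled by the same internal-standard ratio, the spiked amount cancels out in the mol. % values; it matters for absolute amounts and for tracking run-to-run variation through its coefficient of variation.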

The content and abundance of modified lysines are shown in Fig. 3.10. To evaluate the total occupancy of lysine residues in each AFSM extract, the fraction of unmodified lysine residues (PTM 147, 2×QAC) among all detected lysines was determined (Fig. 3.10a, unmodified lysine out of total amount). Notably, total lysine occupancy accounted for 75–85 %, which is consistent with the previous analysis of purified tpSil3 protein (~75 %, see Section 3.1.2). Altogether, 25 modified lysine QAC derivatives were detected within 3 ppm mass accuracy, and their chemical structures were confirmed by high-resolution MS/MS. Upon dissociation of QAC moieties, the reporter fragment at m/z 171.0553 (C10H7ON2, Fig. 3.7) is readily generated, while the remaining fragments correspond to fragmentation of the underivatized lysine modifications. Interpreting these spectra in combination with the number of reacted derivatization moieties (N×QAC) aided the assignment of N-methylation in the ε-polyamine side chain, thus allowing ambiguities between structural isomers to be resolved. Independent proof for the assigned polyamine structures was obtained from high-resolution MS/MS spectra of underivatized molecules, as discussed in Section 3.2.1. All quantitative and structural data are schematically summarized in Fig. 3.10, where the corresponding chemical structures are provided next to the data bars.
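The number of QAC groups constrains the methylation pattern because, as assumed here, AQC derivatizes primary and secondary amines but not tertiary amines or quaternary ammonium groups. A minimal sketch; the encoding of nitrogens and the two PTM 303 methyl placements are hypothetical illustrations, not the structures assigned in Table 3.2:

```python
from typing import Sequence

# Each amine nitrogen is encoded by its number of carbon substituents:
# 1 = primary, 2 = secondary, 3 = tertiary, 4 = quaternary ammonium.
REACTIVE = {1, 2}  # assumption: AQC labels primary and secondary amines only


def expected_qac_groups(nitrogens: Sequence[int]) -> int:
    """Number of QAC groups expected for a molecule with these amine nitrogens."""
    return sum(1 for n_carbons in nitrogens if n_carbons in REACTIVE)


examples = {
    "lysine (PTM 147)": [1, 1],                   # α-NH2, ε-NH2
    "ε-N-methyllysine (PTM 161)": [1, 2],
    "ε-N,N-dimethyllysine (PTM 175)": [1, 3],
    "ε-N,N,N-trimethyllysine (PTM 189)": [1, 4],
    # PTM 303 (lysine + 2 aminopropyl units + 3 methyl groups); nitrogens are
    # α-N, ε-N, internal N, terminal N. Two hypothetical methyl placements:
    "PTM 303, 1 Me on ε-N, 2 Me on terminal N": [1, 3, 2, 3],
    "PTM 303, 3 Me on terminal N": [1, 2, 2, 4],
}
for name, nitrogens in examples.items():
    print(f"{name}: {expected_qac_groups(nitrogens)}×QAC")
```

The first four cases reproduce the QAC counts given for PTMs 147, 161, 175 and 189 (2, 2, 1 and 1); the two hypothetical PTM 303 isomers give 2×QAC and 3×QAC, illustrating how isomers of identical mass are separated by derivatization.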

Mono-, di- and trimethylation of the lysine ε-amino group represented the most abundant cumulative modification, accounting for 50–70 % of the total PTM abundance (m/z 161.1285, 175.1441, 189.1598; ε-methylated, Fig. 3.10b). Linear polyamine chains attached to ε-amino groups of lysine residues represent the most structurally diverse subgroup of lysine PTMs (13 of 25 structures), whose polyamine side chains displayed different degrees of N-methylation (ε-polyaminated, Fig. 3.10c). Six ε-polyaminated and ε-methylated lysines were δ-hydroxylated (Fig. 3.10d). Additionally, the LC-MS/MS analysis revealed three phosphorylated hydroxylysine derivatives (PTMs 399, 413 and 427, Fig. 3.10e), whose non-phosphorylated counterparts with the corresponding mass difference of 80 Da were also observed (PTMs 319, 333 and 347, respectively; Fig. 3.10d). δ-Hydroxylysine (PTM 163, 2×QAC) and ε-N,N,N-trimethyl-δ-hydroxylysine (PTM 205, 1×QAC) were also detected; both have been reported previously in other diatom species [52]. A number of ε-polyamine-modified lysines were previously reported for silaffin proteins, e.g. PTMs 275, 289, 303 and 333, which were also detected in tpSil3 hydrolysate (see [68] and Section 3.1.2). In addition to these seven already known lysine PTMs, 18 novel lysine modifications are reported here for the first time.

[Figure 3.10: bar charts of mol. % for T. pseudonana, C. cryptica and T. oceanica; y-axis: nominal m/z of PTM (N×QAC groups); panels (a) unmodified lysine (out of total amount), (b) ε-methylated, (c) ε-polyamines, (d) δ-hydroxy-polyamines, (e) phosphorylated]

Figure 3.10 Structure and content of lysine post-translational modifications (PTMs) in hydrolysates of diatom biosilica AFSM extracts from TP, CC, and TO. Error bars for two biological replicates. Chemical structures of detected lysine modifications are shown with the respective number of QAC groups, where polyamine molecules are annotated with nominal m/z values of the singly protonated molecular ion. See also Table 3.2 for details.

The length of the polyamine chain in lysine modifications is limited to two propylamine units in all three diatom species, while the degree of methylation of the polyamine chains may vary substantially. For instance, PTM 303 is present in all extracts as two structural isomers (see Fig. 3.7 in Section 3.2.1). These two isoforms, denoted PTM 303a and PTM 303b in Table 3.2, were well separated by LC-MS/MS because they carry different numbers of QAC derivatization groups (2×QAC and 3×QAC, respectively). The different number of derivatization groups indicates different N-methylation patterns of their polyamine chains. The abundances of both isoforms differed markedly among the profiles of the three diatom species. The same was observed for PTM 331 (1×QAC and 2×QAC), whose isoforms were specific for T. oceanica and C. cryptica, respectively; here, too, the relative abundance of the isoforms varied among the three diatom species, and the number of reacted QAC groups helped to resolve the structural isomers.
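With at most two propylamine units, the [M+H]+ values in Table 3.2 decompose into lysine plus whole numbers of C3H7N (57.0578 u), CH2 (14.0157 u), O (15.9949 u) and HPO3 (79.9663 u); for example, 147.1128 + 2 × 57.0578 + 3 × 14.0157 ≈ 303.2755. A minimal sketch of that decomposition, assuming a 3 ppm tolerance and allowing phosphorylation only together with δ-hydroxylation; the function names are illustrative:

```python
from typing import List, Tuple

# Monoisotopic building blocks (u), computed from atomic masses
PROTON = 1.00727646
LYS_MH = 146.10552768 + PROTON   # [M+H]+ of lysine, C6H14N2O2
AMINOPROPYL = 57.05784922        # C3H7N per propylamine unit
METHYL = 14.01565006             # CH2 per N-methylation
HYDROXYL = 15.99491462           # O per δ-hydroxylation
PHOSPHATE = 79.96633052          # HPO3 per phosphorylation


def ptm_mh(n_aminopropyl: int, n_methyl: int,
           hydroxy: bool = False, phospho: bool = False) -> float:
    """[M+H]+ of a lysine carrying the given numbers of building blocks."""
    return (LYS_MH + n_aminopropyl * AMINOPROPYL + n_methyl * METHYL
            + HYDROXYL * hydroxy + PHOSPHATE * phospho)


def assign_composition(observed_mz: float, tol_ppm: float = 3.0,
                       max_aminopropyl: int = 2,
                       max_methyl: int = 6) -> List[Tuple[int, int, bool, bool]]:
    """All (aminopropyl, methyl, hydroxy, phospho) combinations matching observed_mz."""
    hits = []
    for n_ap in range(max_aminopropyl + 1):
        for n_me in range(max_methyl + 1):
            for hydroxy in (False, True):
                # phosphorylation is only considered on the δ-hydroxyl group
                for phospho in ((False, True) if hydroxy else (False,)):
                    calc = ptm_mh(n_ap, n_me, hydroxy, phospho)
                    if abs(observed_mz - calc) / calc * 1e6 <= tol_ppm:
                        hits.append((n_ap, n_me, hydroxy, phospho))
    return hits


for mz in (161.1285, 289.2598, 303.2755, 413.2523):
    print(mz, assign_composition(mz))
```

Mass alone returns a single composition for m/z 303.2755 and cannot place the three methyl groups, which is why PTM 303a and PTM 303b are distinguished by their QAC count rather than by their mass.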

3.2.4 Comparison of AFIM and AFSM profiles in T. pseudonana

After dissolution of the mineral phase and extraction of AFSM, the organic matrix that remains insoluble after ammonium fluoride treatment was also isolated from T. pseudonana biosilica and analyzed. It has previously been shown that this ammonium fluoride-insoluble material (AFIM) contains proteins, polysaccharides and long-chain polyamines (LCPAs) [71, 163–165]; however, it has remained poorly characterized in all diatom species because of the laborious and inefficient isolation procedure. In the current work, the biochemical composition and, importantly, the ε-polyamine profile of the AFIM were addressed in T. pseudonana. The lysine ε-polyamine profiles of AFSM and AFIM are compared in Fig. 3.11. The qualitative composition of lysine PTMs is the same for both fractions; at the same time, AFSM and AFIM display striking quantitative differences in their modification profiles. On the one hand, the abundance of lysine ε-methylated species is decreased in the AFIM extract (Fig. 3.11b); on the other hand, the content of δ-hydroxylated and particularly phosphorylated species is substantially increased in the AFIM fraction (Fig. 3.11d–e).

[Figure 3.11: bar charts of mol. % for AFSM and AFIM of T. pseudonana; y-axis: nominal m/z of PTM (QAC); panels (a) unmodified lysine (out of total amount), (b) ε-methylated, (c) ε-polyaminated, (d) δ-hydroxylated, (e) phosphorylated]

Figure 3.11 Comparison of AFSM and AFIM lysine PTM profiles in T. pseudonana.

The abundance of unmodified lysine (out of total amount) in AFIM is half that in AFSM (Fig. 3.11a), and the AFIM profile is shifted towards ε-polyamine species. Notably, the proteins comprising the insoluble fraction differ from those in AFSM and consist mainly of cingulins and newly discovered proteins termed silicanins [90, 166]. The larger number of lysine PTMs in these proteins would be expected to enhance the silica-forming activity of protein aggregates in AFIM, which might be important for biosilica morphogenesis in the girdle band region. However, PTM profiling of AFIM fractions in other diatom species remains to be performed.


3.2.5 Phylogenetic relationship across three diatom species

The modification pool is qualitatively conserved across the diatom species for most modifications, although the relative abundances of lysine PTMs differed strongly among the profiles of all species. Some modifications are shared by all studied species, whereas several PTMs appear to be species-specific. To investigate this further, we compared the full profiles of lysine modifications in biosilica extracts from the three diatom species T. pseudonana, C. cryptica and T. oceanica. The phylogenetic relationship is presented in simplified tabular form in Table 3.3, where the lysine modifications are clustered into boxes of species-specific modifications based on their abundance or occurrence.

As shown in Table 3.3, T. pseudonana and C. cryptica shared almost the same polyamine modification pool, with three PTMs specific for C. cryptica, whereas multiple exceptions occurred for the phylogenetically more distant diatom T. oceanica. In total, six lysine modifications were specific for the T. oceanica extract (PTMs 218, 232, 246, 248, 317a and 331a). The polyamine PTM profiles therefore follow the phylogenetic proximity of the three diatom species, as reflected by the phylogenetic tree on top of each column in Table 3.3 (adapted from Fig. 1.9a and [156, 157]).
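The shared-pool argument can be made quantitative with a simple set comparison. A minimal sketch that takes the category membership of Table 3.3 as given (the categories themselves were assigned from both occurrence and abundance) and scores each species pair with the Jaccard index, a measure chosen here for illustration rather than one used in this work:

```python
# Category membership as given in Table 3.3 (PTM labels as used there)
ALL_THREE = {"161", "163", "175", "189", "204", "261", "289", "303a", "317b"}
TP_AND_CC = {"205", "275", "303b", "319", "333", "399", "413"}
TO_ONLY = {"218", "232", "246", "248", "317a", "331a"}
CC_ONLY = {"331b", "347", "427"}

profiles = {
    "T. pseudonana": ALL_THREE | TP_AND_CC,
    "C. cryptica": ALL_THREE | TP_AND_CC | CC_ONLY,
    "T. oceanica": ALL_THREE | TO_ONLY,
}


def jaccard(a: set, b: set) -> float:
    """Shared modifications divided by all modifications of the two species."""
    return len(a & b) / len(a | b)


names = list(profiles)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(f"{first} vs {second}: {jaccard(profiles[first], profiles[second]):.2f}")
```

It prints 0.84 for T. pseudonana vs C. cryptica, 0.41 for T. pseudonana vs T. oceanica and 0.36 for C. cryptica vs T. oceanica, consistent with the grouping of the two more closely related species.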

Strong differences in almost all PTM abundances imply the existence of a tightly regulated enzymatic machinery responsible for biosilica formation in diatoms. Differently modified lysines may represent intermediates of successive silaffin processing steps, where the enzymatic machinery for post-translational modification would include methyltransferases, aminopropyltransferases and hydroxylases (see Fig. 3.10b–d). There is a clear structural similarity between lysine ε-polyamine chains and LCPAs, suggesting a common enzymatic pathway for the biosynthesis of these molecules (as previously hypothesized in [99, 167]). This scheme implies that hydroxylation and phosphorylation of lysine residues occur after ε-polyamination, which is supported by the presence of the corresponding ‘intermediates’. Based on the clustered data summarized in Table 3.3, hypothetical routes for lysine post-translational modifications can be drawn, as shown in Fig. 3.12 (the direction of each hypothetical PTM pathway is marked with arrows).
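The routes in Fig. 3.12 connect modifications that differ by a single enzymatic step. A minimal sketch that recovers such connections from accurate masses alone, assuming one step adds exactly one methyl (+14 Da), hydroxyl (+16 Da), aminopropyl (+57 Da) or phosphate (+80 Da) group, and using a hypothetical 2 mDa tolerance on a subset of Table 3.2 (147.1128 is the computed [M+H]+ of unmodified lysine):

```python
from typing import Dict, List, Tuple

# [M+H]+ values from Table 3.2; "Lys" is unmodified lysine (computed)
OBSERVED: Dict[str, float] = {
    "Lys": 147.1128, "161": 161.1285, "175": 175.1441, "189": 189.1598,
    "204": 204.1707, "261": 261.2285, "275": 275.2442, "289": 289.2598,
    "303": 303.2755, "319": 319.2704, "399": 399.2367,
}
# One enzymatic step = one monoisotopic mass increment (u)
STEPS: Dict[str, float] = {
    "methylation (+CH2)": 14.01565,
    "hydroxylation (+O)": 15.99491,
    "aminopropylation (+C3H7N)": 57.05785,
    "phosphorylation (+HPO3)": 79.96633,
}


def single_step_edges(masses: Dict[str, float], steps: Dict[str, float],
                      tol_mda: float = 2.0) -> List[Tuple[str, str, str]]:
    """(precursor, product, step) for every pair separated by exactly one step."""
    edges = []
    for a, mass_a in masses.items():
        for b, mass_b in masses.items():
            for step, delta in steps.items():
                if abs((mass_b - mass_a) - delta) * 1000.0 <= tol_mda:
                    edges.append((a, b, step))
    return edges


for precursor, product, step in single_step_edges(OBSERVED, STEPS):
    print(f"{precursor} -> {product}: {step}")
```

The ten edges trace Lys → 161 → 175 → 189 and Lys → 204 → 261 → 275 → 289 → 303 → 319 → 399. The masses establish connectivity only; the proposed order of events (e.g., hydroxylation and phosphorylation after ε-polyamination) rests on the biochemical reasoning above.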

Nevertheless, the surprising variability of polyamine modifications and the strong differences in abundance of (almost) all lysine derivatives imply a set of differently modified proteins potentially involved in biosilica morphogenesis. Therefore, further elucidation of the post-translational specificity of protein modifications is important for a mechanistic understanding of biosilicification in different diatom species. To follow up on this notion, it is necessary to locate the sites of these modifications in silaffin sequences and to perform inter- and cross-species comparisons of the corresponding PTM patterns, which are presented in Section 3.3.

[Figure 3.12: scheme of hypothetical modification routes starting from lysine, with arrows labeled by mass increments (+14 Da, +16 Da, +28 Da, +57 Da, +80 Da); panels (a) All species, (b) TP and CC specific, (c) TO specific, (d) CC specific]

Figure 3.12 Hypothetical routes for lysine post-translational modifications in the three diatom species, based on their occurrence and abundance. See also Table 3.3. TP, T. pseudonana; CC, C. cryptica; TO, T. oceanica.

Table 3.3 Tabular representation of the data from Fig. 3.10 (clustered by occurrence and abundance). The phylogenetic relationship between the three diatom species is shown by the tree, which is adapted from Alverson et al. [156]. The abundance profiles of modified lysines were similar in the phylogenetically more closely related T. pseudonana and C. cryptica, and both differed from the profile in the phylogenetically more distant T. oceanica. TP, T. pseudonana; CC, C. cryptica; TO, T. oceanica.

| PTM | TP (mol. %) | CC (mol. %) | TO (mol. %) |
|---|---|---|---|
| Occurred in TP, CC and TO | | | |
| PTM 163 (2×QAC) | 0.7±0.3 | 0.2±0.0 | 4.6±0.5 |
| PTM 161 (2×QAC) | 4.2±0.9 | 9.7±1.6 | 3.3±1.7 |
| PTM 175 (1×QAC) | 30.7±4.2 | 23.2±7.3 | 44.1±0.6 |
| PTM 189 (1×QAC) | 2.5±0.4 | 24.9±1.4 | 17.4±2.6 |
| PTM 204 (3×QAC) | 0.8±0.2 | 0.4±0.1 | 1.0±0.1 |
| PTM 261 (4×QAC) | 1.5±0.7 | 0.2±0.1 | 2.3±0.3 |
| PTM 289 (3×QAC) | 16.4±4.5 | 12.4±2.8 | 2.8±0.9 |
| PTM 303a (2×QAC) | 2.2±0.1 | 1.6±0.3 | 6.2±1.4 |
| PTM 317b (2×QAC) | 0.5±0.5 | 0.4±0.1 | 0.9±0.2 |
| Occurred in TP and CC | | | |
| PTM 205 (1×QAC) | 2.2±0.5 | 1.6±0.7 | 0.0±0.0 |
| PTM 275 (4×QAC) | 7.5±0.6 | 5.3±0.8 | 0.5±0.3 |
| PTM 303b (3×QAC) | 1.0±0.2 | 3.1±0.7 | 0.2±0.0 |
| PTM 319 (3×QAC) | 3.7±1.1 | 2.6±0.3 | 0.1±0.2 |
| PTM 333 (2×QAC) | 9.3±4.2 | 1.7±0.1 | 0.4±0.2 |
| PTM 399 (3×QAC) | 3.7±1.5 | 4.0±0.4 | 0.0±0.0 |
| PTM 413 (2×QAC) | 9.2±5.3 | 3.1±0.4 | 0.1±0.1 |
| Occurred in TO only | | | |
| PTM 218 (3×QAC) | 0.0±0.0 | 0.0±0.0 | 0.8±0.5 |
| PTM 232 (2×QAC) | 0.0±0.0 | 0.1±0.1 | 10.7±0.3 |
| PTM 246 (2×QAC) | 0.0±0.0 | 0.0±0.0 | 1.5±0.3 |
| PTM 248 (2×QAC) | 0.0±0.0 | 0.0±0.0 | 1.1±0.2 |
| PTM 317a (1×QAC) | 0.1±0.0 | 0.1±0.1 | 2.5±2.0 |
| PTM 331a (1×QAC) | 0.0±0.0 | 0.0±0.0 | 1.3±0.2 |
| Occurred in CC only | | | |
| PTM 331b (2×QAC) | 0.0±0.0 | 3.3±0.9 | 0.1±0.0 |
| PTM 347 (2×QAC) | 0.0±0.0 | 0.9±0.3 | 0.0±0.0 |
| PTM 427 (2×QAC) | 0.0±0.0 | 1.7±0.4 | 0.0±0.0 |

3.3 Site-specific localization and discovery of consensus motifs for lysine polyamine PTMs

To investigate the site specificity of the post-translational modification machinery in diatom biosilica, PTM profiling must be followed by accurate mapping of modification sites in biosilica-associated proteins from the three diatom species. Localization of the lysine modifications found in the PTM profiles (Fig. 3.10) can be achieved through the ‘bottom-up’ approach [109]. The presence of characteristic fragments (summarized in Table 3.2) in MS/MS spectra of modified peptides is useful for further validation of the resulting peptide-spectrum matches (PSMs). However, canonical bottom-up proteomics faces important limitations when it comes to the analysis of highly post-translationally modified proteins such as silaffins. To address these challenges, the current study focused on tailoring existing proteomics methodologies towards the localization of highly heterogeneous lysine polyamine modifications in biosilica-associated proteins. The applicability of this specialized approach first needed to be validated by analyzing a previously characterized protein, silaffin-3 from T. pseudonana (tpSil3) [67, 68]. Next, PTMs were mapped onto the sequences of biosilica-associated proteins from three closely related diatom species (T. pseudonana, C. cryptica, T. oceanica). Finally, the identified modification sites are aligned to reveal consensus motifs for post-translational modification.

3.3.1 Multiple protease strategy for mapping lysine PTMs

As discussed in Section 3.1.3, tpSil3 is a well-characterized protein from T. pseudonana

biosilica extract [67, 68]. Previously, Sumper et al. attempted to map lysine modifica-

tions in tpSil3 protein using multiple proteolytic enzymes, chemical cleavage reagents

(CNBr), and their combinations [68]. However, the information about lysine PTM sites

obtained in that study is more suggestive than definitive. In this regard, the major lim-

itation was the use of low-resolution mass measurement for lysine modification map-

ping in the absence of proper MS/MS confirmation for sequences of detected modified

peptides. We therefore aimed to demonstrate that our PTM localization results are

3.3 ptm localization and discovery of consensus motifs 73

consistent with, or improve upon, previous tpSil3 mapping efforts. As shown above in Fig. 3.4, the amino acid composition of the purified protein was analyzed and verified to be identical to the predicted composition. Additionally, the purity of native tpSil3 was examined by SDS-PAGE, which showed a single intense band (see Fig. 3.15).

To investigate the modification sites in tpSil3, the native protein was digested in-gel with several proteolytic enzymes having complementary cleavage specificity (Asp-N, chymotrypsin, Proteinase K, Glu-C and trypsin), and the resulting digests were analyzed by LC-MS/MS (refer to Section 5.14). Using more than one proteolytic enzyme with complementary cleavage specificity, known as a multiple protease strategy [168], increases the sequence coverage and, therefore, the chances of detecting PTM sites in biosilica-associated proteins. Combining highly selective and nonselective proteases improves protein and PTM coverage [169, 170]. Moreover, digesting silaffins with multiple proteases also increases the chances of producing informative mass spectra, which facilitates modification assignment and helps to resolve potential PTM localization ambiguities. However, only a single (modified) peptide was detected (in the Asp-N digest), resulting in a drastically low total sequence coverage of ~5 % (Fig. 3.13a).
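How pooling digests raises coverage can be illustrated with a minimal Python sketch that maps peptide sequences back onto the mature tpSil3 sequence (copied from Fig. 3.13) and reports the combined residue and lysine coverage. The three peptides are the unmodified sequences of those shown in Fig. 3.16e–g; grouping them under `protease_A` and `protease_B` is purely hypothetical, as are the function names. Exact string matching is assumed, so modifications are ignored.

```python
# Mature tpSil3 (205 aa) as shown in Fig. 3.13.
TPSIL3 = (
    "EGHGGDHSISMSMHSSKAEKQAIEAAVEEDVAGPAKAAKLFKPKASKAGSMP"
    "DEAGAKSAKMSMDTKSGKSEDAAAVDAKASKESHMSISGDMSMAKSHKAEAE"
    "DVTEMSMAKAGKDEASTEDMCMPFAKSDKEMSVKSKQGKTEMSVADAKASKE"
    "SSMPSSKAAKIFKGKSGKSGSLSMLKSEKASSAHSLSMPKAEKVHSMSA"
)


def covered_positions(protein, peptides):
    """0-based residue positions covered by any peptide (every occurrence)."""
    covered = set()
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return covered


def coverage_report(protein, peptides_by_protease):
    """Pool peptides from all digests and report the combined coverage."""
    pooled = [pep for peps in peptides_by_protease.values() for pep in peps]
    cov = covered_positions(protein, pooled)
    lysines = [i for i, aa in enumerate(protein) if aa == "K"]
    return {
        "residues": len(cov),
        "percent": round(100 * len(cov) / len(protein), 1),
        "lysines_covered": sum(1 for i in lysines if i in cov),
        "lysines_total": len(lysines),
    }


# Hypothetical grouping of three peptides from Fig. 3.16e-g into two digests.
example = {
    "protease_A": ["EKQAIEAAVEE", "DVTEMSMAKAGK"],
    "protease_B": ["AIEAAVEEDVAGPAKAA"],
}
print(coverage_report(TPSIL3, example))
# {'residues': 32, 'percent': 15.6, 'lysines_covered': 4, 'lysines_total': 33}
```

Because coverage is a set union, a peptide from an additional protease only raises the total where it extends beyond regions that are already covered, which is why complementary cleavage specificities matter more than the number of digests as such.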

EGHGGDHSISMSMHSSKAEKQAIEAAVEEDVAGPAKAAKLFKPKASKAGSMP
DEAGAKSAKMSMDTKSGKSEDAAAVDAKASKESHMSISGDMSMAKSHKAEAE
DVTEMSMAKAGKDEASTEDMCMPFAKSDKEMSVKSKQGKTEMSVADAKASKE
SSMPSSKAAKIFKGKSGKSGSLSMLKSEKASSAHSLSMPKAEKVHSMSA

Figure 3.13 Peptide coverage obtained for: (a) native tpSil3 (natively purified protein; 11/205 amino acids, 5 % coverage); (b) tpSil3 expressed from a synthetic chimeric gene (192/205 amino acids, 94 % coverage, accounting for both Asp-N and trypsin digests). Coverage is mapped onto the mature tpSil3 sequence (205 amino acids) shown above.

74 results and discussion

To demonstrate that the unmodified tpSil3 sequence is readily digestible, a synthetic chimeric gene was produced that encodes the tpSil3 sequence without the signal peptide³, concatenated with reference quantification peptides from protein standards (four from BSA and six from PhospB) and flanked by purification tags (Twin-Strep-tag and His-tag); the full sequence is displayed in Fig. 5.2. This synthetic chimeric gene was inserted into a high-level expression vector and expressed in E. coli (for the cloning protocol refer to Section 5.3 or to [171]). The band of the overexpressed protein was digested in-gel with Asp-N and trypsin, and LC-MS/MS analysis of the digest resulted in 94 % peptide coverage (Fig. 3.13b). We therefore concluded that native tpSil3 bears a complex and abundant set of PTMs that impedes the access of proteolytic enzymes to the protein backbone, thus decreasing the sequence coverage. The analysis of such highly modified proteins requires a specialized deprotection technique that enables protease access while leaving the lysine modifications, which are the target of the mapping, unaffected.

3.3.2 Selection of deprotection technique

The most logical approach to improve digestion efficiency and to maximize the number of detectable peptides per protein is to selectively remove non-lysine PTMs. Several enzymes are available for removing O-linked glycans and phosphate groups. However, dephosphorylation with calf intestinal alkaline phosphatase (CIAP) [172] was inefficient for tpSil3 deprotection (data not shown). Compared with the enzymatic removal of protein phosphorylation or glycosylation, chemical deprotection has the advantage that all O-linked modifications can be removed regardless of their structure. However, harsh chemical methods can also cleave peptide bonds, which leads to unacceptable protein degradation. Therefore, several chemical treatments were examined for their ability to reduce the modification complexity of tpSil3 while leaving the polypeptide backbone intact:

(a) treatment with trifluoromethanesulfonic acid (TFMS) [84, 85];

(b) treatment with soluble HF·pyridine complex [173–175];

³ The N-terminal signal peptide (17 amino acids) for cotranslational import into the endoplasmic reticulum [70] is cleaved at the RXL site and was therefore not considered in the coverage evaluation.


(c) treatment with anhydrous HF [58, 59].

The apparent molecular weight of native tpSil3 was around 60 kDa (0 h point in Fig. 3.15a–3.15c), whereas the calculated mass of this protein is ~25 kDa (including the mass of the lysine modifications profiled in tpSil3, see Fig. 3.4 in Section 3.1.3). It was therefore hypothesized that tpSil3 also bears multiple O-linked PTMs, which render it highly negatively charged and decrease its electrophoretic mobility. Moreover, tpSil3 stains poorly with colloidal Coomassie and was therefore visualized with the polycationic carbocyanine dye ‘Stains All’ [87], whose staining is also diagnostic of a high negative net charge of the native protein. Altogether, the strikingly low electrophoretic mobility and the negative charge clearly indicate covalent modifications affecting multiple amino acid residues in tpSil3. Indeed, complex O-linked glycosylation and sulfation have been previously reported for this protein (described in Section 1.3.2, see also [67]).

Figure 3.14 Figure continued from p. 73. Peptide coverage obtained for: (a) tpSil3 treated with trifluoromethanesulfonic acid (TFMS) (93/205 amino acids, 45 % coverage); (b) tpSil3 treated with soluble HF·pyridine complex (12/205 amino acids, 6 % coverage); (c) tpSil3 treated with anhydrous HF (168/205 amino acids, 82 % coverage). Coverage is mapped onto the mature tpSil3 sequence, as in Fig. 3.13.


Figure 3.15 Gel images stained with Coomassie and ‘Stains All’, demonstrating different treatments of tpSil3 protein: (a) TFMS (lanes 0 h, ½ h, 2 h); (b) HF·pyridine complex (lanes 0 h, ½ h, 1 h, 2 h, 3 h); (c) anhydrous HF (lanes 0 h, 1 h).

The O-linked modifications (carbohydrate and/or phospho groups) of tpSil3 were removed with each of these three reagents prior to in-gel digestion with the same set of proteases (see above). The resulting digests were analyzed by LC-MS/MS, and the obtained peptide coverage values were compared with that of native tpSil3 (5 %, Fig. 3.13a). The sequence coverage achieved for each treatment is shown in Fig. 3.14a–3.14c. Deglycosylation with TFMS covered 14 lysine residues (and 40 % of the amino acid sequence, Fig. 3.14a), whereas treatment with HF·pyridine complex turned out to be completely inefficient for tpSil3 (5 %, Fig. 3.14b). In contrast, treatment with anhydrous HF resulted in 73 % coverage that included 25 of the 33 lysine residues (see Fig. 3.14c). Moreover, treatment with anhydrous HF also improved the electrophoretic behavior of tpSil3 (~35 kDa), allowing visualization with the MS-compatible Coomassie dye (see Fig. 3.15c). By comparison, TFMS treatment only reduced the apparent molecular weight to 50 kDa (Fig. 3.15a). TFMS is known to cleave O-linked glycans, whereas O-phosphorylation is stable to this treatment [85], which was verified with a β-casein standard (data not shown). This explains the smaller mass shift for TFMS compared to anhydrous HF, which cleaves both O-phosphoester and O-glycosidic linkages while preserving peptide bonds. The preservation of peptide bonds was verified by treating commercially available protein standards (BSA, β-casein and ribonuclease B). Protein losses after anhydrous HF treatment did not exceed 40 % (data not shown). In contrast, the HF·pyridine complex caused much higher protein losses than the other procedures tested (almost complete degradation after 1 hour of treatment, see Fig. 3.15b). Hence, anhydrous HF outperformed the two alternative treatments both in terms of sequence coverage


increase and lower protein degradation rates, and, despite its toxicity and the need for special handling equipment, this procedure was selected for further studies.

3.3.3 Mapping lysine PTMs on tpSil3 using iterative search strategy

HF deprotection makes tpSil3 almost fully accessible to proteolytic enzymes. After parallel in-gel digestion with five proteases having complementary cleavage specificity (Asp-N, chymotrypsin, Proteinase K, Glu-C and trypsin), the resulting peptides were subjected to LC-MS/MS analysis. The proteases were carefully chosen according to their cleavage specificity (refer to Table 5.3), and each additional protease increased the protein sequence coverage by 15 % on average. Although longer peptides are less suitable for identification purposes, digestion with several proteases produces complementary, longer, overlapping peptides, thus improving the identification of PTM sites [168]. The larger size of the peptides is beneficial in this case because it compensates for the hydrophilicity conferred by the highly charged lysine PTMs. The analysis of larger peptides shifts this multi-protease approach towards ‘middle-down’ proteomics, such that a confident combinatorial assignment of variable modification sites becomes possible.

As discussed in Section 3.2, a total of 25 lysine PTMs were detected in biosilica extracts from the three diatom species. To search the acquired MS/MS spectra against all of these PTMs, they must be defined by the user prior to database searching. Fixed modifications do not increase the complexity of the search, whereas the number of variable modifications must be limited in order to confine the search space and to control the rate of false positive identifications, i. e., the false discovery rate (FDR, [153]). To overcome the drawbacks of searching against multiple PTMs simultaneously, a number of strategies for unrestricted PTM identification may be employed, such as de novo sequencing [147, 176], sequence-tag searches [177] and second-pass searches [178]. Each of these strategies has its own limitations and weaknesses, including sensitivity to the database size and to the quality of the MS/MS spectra [144]. It is therefore advisable to limit the number of allowed PTMs for each query by using follow-up searches [179]. This approach allows systematic localization to be extended to hundreds of PTMs in complex MS/MS data.
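The FDR acceptance criterion used in this work (below 2 %, see below) can be illustrated with a minimal sketch of the target–decoy approach, in which spectra are also searched against reversed or shuffled (decoy) sequences and the FDR at a score cut-off is estimated as the ratio of decoy to target hits above it. The text does not state how the FDR was estimated, so target–decoy estimation is an assumption here; the function names and toy scores are hypothetical.

```python
def target_decoy_fdr(psms, threshold):
    """Estimated FDR at a score cut-off: decoy hits / target hits above it.

    psms: list of (score, is_decoy) pairs, e.g. hypothetical Mascot ion scores.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold(psms, max_fdr=0.02):
    """Most permissive score cut-off whose estimated FDR is <= max_fdr."""
    for threshold in sorted({score for score, _ in psms}):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            return threshold
    return None


# Toy scores (hypothetical); True marks a match to a decoy sequence.
psms = [(60, False), (55, False), (50, False), (45, True), (40, False), (35, True)]
print(target_decoy_fdr(psms, 40))    # 0.25
print(lowest_threshold(psms, 0.02))  # 50
```

Real pipelines usually convert these ratios into monotone q-values before thresholding; the simple scan above is only meant to show why restricting the search space (and thus the number of random high-scoring matches) keeps the FDR low.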


Figure 3.16 Silaffin mapping. (a) Lysine PTM map of tpSil3 obtained in the current study (shown on the mature tpSil3 sequence, as in Fig. 3.13); (b)–(d) lysine modifications mapped in tpSil3, written as K(+n) for a lysine carrying a mass shift of +n Da: K(+28) (PTM 175), K(+142) (PTM 289) and K(+186) (PTM 333); (e) MS/MS spectrum of modified peptide A.EK(+28)QAIEAAVEE.D (m/z 622.82; 2+); (f) MS/MS spectrum of modified peptide E.DVTEMSMAK(+142)AGK.D (m/z 713.37; 2+); (g) MS/MS spectrum of modified peptide Q.AIEAAVEEDVAGPAK(+186)AA.K (m/z 600.00; 3+).


In this approach, the data are searched repeatedly with a conventional search engine (such as Mascot), each search allowing only a restricted number of variable modifications. It was found empirically that peptides containing more than two ε-polyamine modifications do not produce confidently interpretable MS/MS spectra; searches allowing more than two variable lysine modifications per peptide could therefore be omitted. Digestion with non-specific or semi-specific proteases (e. g., Proteinase K or chymotrypsin; see Table 5.3) often results in unusual and unexpected protein cleavages [180]. Therefore, additional searches without cleavage specificity were performed to recover peptide-spectrum matches (PSMs) missed by enzyme-specific database searches (for experimental details refer to Section 5.15). The performance of this approach was evaluated by mapping pre-defined modifications onto the tpSil3 sequence: PTMs 175, 261, 275, 289, 303a, 319 and 333 (see Fig. 3.4⁴). This number of variable modifications (total six) resulted in 15 consecutive searches⁵, each with a combination of two different variable ε-lysine modifications. In addition, each search included methionine oxidation as a variable modification and carboxyamidomethylated cysteine as a fixed one⁶. Overall, the workflow of multiple non-specific searches with a restricted number of variable modifications per query resulted in moderate search times (less than an hour per query) and false discovery rates (FDRs) below 2 %, which was the acceptance criterion for further studies.
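The pairwise search plan described above can be sketched in a few lines: one search per unordered pair of lysine PTMs, each sharing the same fixed and common variable modifications. The names "Carbamidomethyl (C)" and "Oxidation (M)" follow common Mascot/Unimod naming; the six lysine PTM labels are placeholders, and the dictionary layout is a hypothetical stand-in for search-engine parameter files rather than Mascot's input format.

```python
from itertools import combinations
from math import comb

FIXED_MODS = ["Carbamidomethyl (C)"]   # fixed cysteine modification (see text)
COMMON_VARIABLE = ["Oxidation (M)"]    # variable in every search (see text)


def pairwise_search_plan(lysine_ptms):
    """One search definition per unordered pair of distinct lysine PTMs."""
    return [
        {"fixed": FIXED_MODS, "variable": COMMON_VARIABLE + list(pair)}
        for pair in combinations(lysine_ptms, 2)
    ]


ptms = [f"PTM_{i}" for i in range(1, 7)]   # six placeholder labels
plan = pairwise_search_plan(ptms)
assert len(plan) == comb(len(ptms), 2) == 15
print(plan[0]["variable"])   # ['Oxidation (M)', 'PTM_1', 'PTM_2']
```

The number of searches grows quadratically with the number of lysine PTMs, which is acceptable here because each restricted search completes in under an hour.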

The sequence of tpSil3 with mapped lysine PTMs is displayed in Fig. 3.16a. A cumulative sequence coverage of 73 % was achieved, and all mapped modifications were confirmed by peptide MS/MS spectra (in contrast to a previous study [68], discussed further in Section 3.3.6). In total, 13 PTM sites carrying three different kinds of lysine modification were mapped (PTMs 175, 289 and 333, see Fig. 3.16b–3.16d), while 11 lysines were found unmodified. These results are consistent with the PTM profile of tpSil3 (see Fig. 3.4), in which 30 % of the lysines were detected as unmodified (compared with 11 out of 33, ~33 %, mapped as unmodified here). However, some of these

⁴ After HF treatment [58], the phosphorylated PTMs 399 and 413 were converted into PTMs 319 and 333, respectively (Fig. A.6). Therefore, PTMs 399 and 413 were not searched in MS/MS data obtained from HF-treated samples.

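⁵ If a set has n elements, the number of k-combinations (selections in which the order does not matter) is $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$; for n = 6 variable modifications taken k = 2 at a time, this gives $\frac{6!}{2!\,4!} = 15$.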

⁶ The sulfur-containing amino acids methionine and cysteine are more easily oxidized than the other amino acids. During sample preparation, cysteines are carboxyamidomethylated; because this reaction is close to 100 % complete, the modification should be specified as fixed. Methionine oxidation, in contrast, is incomplete and must always be specified as a variable modification.


lysines could bear multiple modifications, and a direct comparison of these data may therefore be biased. Unfortunately, full coverage is rarely achieved, and in the case of tpSil3 the remaining 9 lysines could not be mapped (marked with ? in Fig. 3.16a), presumably owing to the lack of informative fragment spectra for peptides from the uncovered regions. Moreover, lysine ε-polyamination and methylation render modified peptides hydrophilic, thus reducing their retention and separation on reversed-phase (RPLC) beads.

Some of these fragment spectra are shown in Fig. 3.16e–3.16g, where each peptide represents one of the three types of lysine PTM. During this study, it was observed that MS/MS spectra of multiply charged peptides (> 3+) often contain multiply charged fragments (> 2+), which are ignored by conventional database search engines such as Mascot. To address this issue, the raw MS/MS spectra were deconvoluted, as discussed in Section 3.3.4.

3.3.4 Identification of modified peptides by deconvolution of raw MS/MS spectra

We focused on two types of diatom extracts rich in biosilica-associated proteins, including silaffins: AFIM and AFSM. These extracts were separated by SDS-PAGE and visualized with the polycationic carbocyanine dye ‘Stains All’ [86, 87]. The gel images indicated that the T. pseudonana and C. cryptica AFSM extracts comprise a heterogeneous set of proteins, whereas the T. oceanica extract consists of a few highly abundant components. The entire gel slabs were excised and digested in-gel with the same set of proteases as above, and the resulting digests were analyzed by LC-MS/MS.

Although classical bottom-up analysis is the optimal strategy for PTM mapping, combining high detection sensitivity with efficient MS/MS fragmentation of short modified peptides, mapping multiple lysine modifications in silaffins remains challenging. Bottom-up proteomics relies primarily on the high cleavage specificity of trypsin, which cleaves peptide bonds at the C-terminus of arginine and lysine residues. Such cleavage places the highly basic residues at the peptide C-termini and generates peptides in the preferred mass range (0.5 to 3 kDa) for effective MS/MS fragmentation [181]. Doubly protonated peptides undergo facile fragmentation that yields sequence information [141]. In this regard, collisional techniques, namely collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD), are most effective for short, low-charged, unmodified peptides, for which they produce highly informative and easily interpretable mass spectra.

Figure 3.17 HCD MS/MS spectra of modified peptides from silaffin-4 of T. pseudonana (tpSil4), identified in the Asp-N digest of T. pseudonana biosilica extract: (a) raw (non-deconvoluted) spectrum of S.DASTEYESGASEAGAEVTAK(+142)AEK(+28)GSD.D (m/z 910.77; 3+; ion score = 56.9); (b) deconvoluted spectrum of the same peptide (m/z 910.77; 3+; ion score = 102.2); (c) deconvoluted spectrum of peptide (m/z 662.99; 3+).

Figure 3.17 (continued from previous page) (c) Deconvoluted HCD MS/MS spectrum of modified peptide S.DMSVSSK(+186)AQMSYIHGSG.D (m/z 662.99; 3+; Mascot ion score = 38.8).

As mentioned above, lysine modifications completely block tryptic digestion, and the high occupancy of lysine PTM sites (70 % to 80 % of lysines are modified, see Fig. 3.10a) makes trypsin completely inefficient for silaffin PTM mapping. Protein digestion with alternative proteases (e. g., Asp-N or Glu-C) yields long and highly charged peptides whose MS/MS spectra are particularly difficult to interpret [182]. Additionally, the presence of multiple basic lysine residues and positively charged polyamine modifications prevents full fragmentation upon CID or HCD and directs backbone bond dissociation to specific sites, which inhibits the formation of sufficiently diverse series of b- and y-type fragment ions. Moreover, multiple positively charged lysine ε-polyamine modifications at positions other than the C-terminus result in complex fragmentation spectra containing multiply charged fragment ions, which fail to be matched by conventional database search engines such as Mascot. To overcome this issue, the raw MS/MS spectra were pre-processed, taking advantage of the high resolution and mass accuracy of the Orbitrap. This mass analyzer can resolve fragments with very small mass differences as well as multiply charged ions at the level of their isotope distributions. Pre-processing thus reduces MS/MS spectral complexity in two steps: the first reduces the isotope envelope of each fragment to a single peak (deisotoping) [151], whereas the second mathematically collapses the peaks of multiply charged fragments into one peak corresponding to the singly charged ion (deconvolution) [148]. As a result, each fragment is represented in the spectrum by only one singly charged peak, which facilitates peptide identification from the resulting MS/MS spectra.
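The two pre-processing steps can be illustrated with a deliberately simplified, greedy Python sketch. It is not the algorithm of Gorshkov et al. [183] that was actually used (see Section 5.15); the function name, the 0.01 m/z tolerance and the assumption that the lowest-m/z peak of each envelope is monoisotopic are all hypothetical choices for illustration. The charge is inferred from the isotope spacing (about 1.003/z), and each envelope is then reported as one singly charged peak at z·(m/z) − (z − 1)·m(H⁺).

```python
from dataclasses import dataclass

PROTON = 1.007276        # proton mass, Da
C13_SPACING = 1.003355   # 13C - 12C mass difference, Da


@dataclass
class Peak:
    mz: float
    intensity: float


def deisotope_and_deconvolute(peaks, max_charge=4, tol=0.01):
    """Greedy sketch: find isotope envelopes, then report each envelope as a
    single singly charged peak (monoisotopic m/z converted to [M+H]+).

    Assumes the lowest-m/z peak of an envelope is monoisotopic; peaks with no
    isotope partner are treated as singly charged.
    """
    peaks = sorted(peaks, key=lambda p: p.mz)
    used = [False] * len(peaks)
    result = []
    for i, mono in enumerate(peaks):
        if used[i]:
            continue
        charge, members = 1, [i]
        for z in range(max_charge, 0, -1):          # try high charges first
            envelope, expected = [i], mono.mz + C13_SPACING / z
            for j in range(i + 1, len(peaks)):
                if used[j]:
                    continue
                if abs(peaks[j].mz - expected) <= tol:
                    envelope.append(j)
                    expected += C13_SPACING / z
                elif peaks[j].mz > expected + tol:
                    break                            # sorted: no later match
            if len(envelope) > 1:
                charge, members = z, envelope
                break
        for j in members:
            used[j] = True
        singly_mz = charge * mono.mz - (charge - 1) * PROTON
        total = sum(peaks[j].intensity for j in members)
        result.append(Peak(round(singly_mz, 4), total))
    return sorted(result, key=lambda p: p.mz)


spectrum = [Peak(300.1000, 80), Peak(301.1034, 15),                      # 1+
            Peak(500.2500, 60), Peak(500.7517, 40), Peak(501.2534, 10)]  # 2+
for p in deisotope_and_deconvolute(spectrum):
    print(p)
# Peak(mz=300.1, intensity=95)
# Peak(mz=999.4927, intensity=110)
```

After this step the doubly charged envelope at m/z 500.25 appears as a single peak at m/z 999.49, where a search engine that only considers singly charged fragments can match it.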

An algorithm for deconvolving mass spectra into singly charged fragment spectra was implemented according to Gorshkov et al. [183]; processing details are described in the Materials and Methods (Section 5.15). Pre-processing of the entire spectral dataset takes less than a minute, which is substantially shorter than the database search times. A typical example of deconvolution pre-processing for a triply charged peptide bearing two different lysine modifications is given in Fig. 3.17. The non-deconvoluted spectrum (Fig. 3.17a) contains multiply charged fragments that are ignored by the Mascot database search, resulting in a lower ion score (56.9), whereas the deconvoluted spectrum shows an extended y-ion series and an almost doubled ion score (102.2, Fig. 3.17b), i. e., improved spectrum-to-sequence matching. Deconvolution also makes it possible to match peptide spectra that


had not been matched before deconvolution (an example of such an MS/MS spectrum is provided in Fig. 3.17c).

Altogether, a total of 61 silaffin-like proteins were identified in the three diatom species (25 for T. pseudonana, 15 for C. cryptica, 20 for T. oceanica), 26 of which were post-translationally modified with 5 types of lysine PTMs. In total, 130 lysine PTM sites could be localized. The current analysis of the AFIM extract from T. pseudonana revealed several novel biosilica-associated proteins with unknown functions [90, 166]. All identified proteins are summarized in Table A.2. In addition to already known proteins, several novel silaffin-like proteins (SFLPs) were found in the C. cryptica and T. oceanica extracts. However, in some cases the PTM site could not be identified unambiguously owing to the lack of complete fragment ion series. To address this issue, the characteristic fragments observed in the fragmentation spectra were used, as discussed in the following section.

3.3.5 Mass spectrometric mapping of PTM sites based on characteristic fragments

A multiple protease strategy, as applied for mapping PTMs in tpSil3 (discussed in Section 3.3.1), often produces peptides of sub-optimal length [168]. Specifically, peptides generated by proteases other than trypsin are likely to contain one or more internal lysine residues bearing positively charged modifications, a situation that can lead to unassignable MS/MS spectra. PTM assignment in such peptides is therefore far more challenging, since it is not always possible to map a modification unambiguously to a specific lysine residue. Another problem arises from the regular structure of ε-polyamine modifications: different combinations of these PTMs can correspond to the same mass shift (e. g., one lysine carrying +142 Da together with one carrying +28 Da, K(+142) + K(+28), gives the same total mass shift as two lysines each carrying +85 Da, K(+85) + K(+85)).
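Such ambiguities can be enumerated systematically. The sketch below groups all unordered pairs of lysine mass shifts by their summed nominal mass and keeps only sums reachable in more than one way; the input values are the nominal shifts that appear in this section and its figures (+28, +42, +85, +142, +186 Da), and the function name is hypothetical. A real check would use exact masses and a ppm tolerance rather than nominal integers.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def isobaric_pairs(shifts):
    """Group unordered pairs of lysine mass shifts by their summed nominal mass,
    keeping only sums that more than one combination can produce."""
    by_sum = defaultdict(list)
    for a, b in combinations_with_replacement(sorted(shifts), 2):
        by_sum[a + b].append((a, b))
    return {total: pairs for total, pairs in by_sum.items() if len(pairs) > 1}


print(isobaric_pairs([28, 42, 85, 142, 186]))
# {170: [(28, 142), (85, 85)]}
```

For this set of shifts, the only collision among pairs is exactly the +142/+28 versus +85/+85 case described above; such a precursor mass alone cannot distinguish the two, so fragment ions are required.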

As demonstrated in Section 3.2.1 and Table 3.2, fragmentation of covalently modified lysines produces a set of characteristic ions that are specific for each type of ε-polyamine modification. The presence or absence of these reporter fragments can be used to determine the PTM in instances where the lack of b- or y-ions provides insufficient evidence for site-specific PTM assignment. All polyamine-specific reporter fragments fall in the low m/z range between 50 and 300. In such cases, polyamine-modified peptides can be matched to bona fide MS/MS spectra containing a peptide sequence tag (PST), i. e., a series of clearly identifiable sequence ions. Notably, the normalized collision energies (nCE) usually applied for peptide fragmentation are in the same range as those required for fragmentation of modified lysines (about 25 % to 35 %), and Orbitrap resolution increases towards low m/z, scaling as 1/√(m/z).
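The 1/√(m/z) scaling can be made concrete with a short worked example. The reference setting used here, a resolving power of 15 000 at m/z 200, is a hypothetical value chosen for illustration and not the acquisition setting of this work. For a fixed transient length, the resolving power at another m/z then follows from

$$ R(m/z) = R_{\mathrm{ref}}\sqrt{\frac{(m/z)_{\mathrm{ref}}}{m/z}}, \qquad R(143.15) \approx 15\,000 \times \sqrt{\frac{200}{143.15}} \approx 1.77 \times 10^{4}, $$

so a low-mass reporter ion such as m/z 143.1543 (PTM 289) is recorded with roughly 18 % higher resolving power than ions at m/z 200.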

Figure 3.18 MS/MS spectra of modified peptides that contain characteristic fragments for the ε-polyamine of PTM 289. Spectra were deconvoluted prior to the Mascot ion search. (a) G.DMSMAK(+142)SHK(+28)AEAE.D (m/z 535.61; 3+; nCE 30 %; ion score = 51.7); (b) V.DAK(+142)ASK(+28)ESHMSISG.D (m/z 539.96; 3+; nCE 30 %; ion score = 44.3); (c) characteristic ion (m/z 143.1543; 1+) from PTM 289 (K(+142)). The characteristic fragment at m/z 143.1543 is labeled in (a) and (b).

Characteristic fragments present in the peptide spectra are displayed in Fig. 3.18. Fragmentation of the ε-polyamine side chain of the modified lysine residue produces the characteristic ion at m/z 143.1543 (Fig. 3.18a–3.18b). The presence of this fragment indicates that the fragmented peptides bear PTM 289, as depicted in Fig. 3.18c. This ion was subsequently used as a reporter ion for peptides modified with PTM 289.


The characteristic fragment at m/z 143.1543 can also arise upon fragmentation of ε-polyamines with a similar structure, e. g., PTM 333. MS/MS spectra of two peptides modified by either PTM 289 or PTM 333 are provided in Fig. 3.19. The intensity of the m/z 143.1543 fragment, however, varies significantly between these spectra (cf. Fig. 3.19a–3.19b). Upon fragmentation of the peptide bearing PTM 333, an intense fragment at m/z 143.1543 is released, which results from cleavage at the quaternary ammonium group (Fig. 3.19d). The facile fragmentation of PTM 333 has previously been explained in the literature by internal proton transfer from the adjacent secondary amino group [102]. The intensity of the fragment at m/z 143.1543 can therefore be used as a distinguishing feature between PTMs 289 and 333. Next, the sequence contexts of the mapped PTM sites were compared in order to identify consensus modification sequences.
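The reporter-ion logic above can be written as a simple spectrum filter, sketched here under stated assumptions: the reporter m/z is taken from the text, whereas the 5 ppm tolerance and the relative-intensity cut-off of 0.5 (relative to the base peak) are hypothetical values that would have to be calibrated on spectra of peptides with known PTM 289 and PTM 333 modifications. Function names are illustrative only.

```python
REPORTER_MZ = 143.1543  # characteristic fragment of PTMs 289 and 333 (see text)


def reporter_peak(peaks, target=REPORTER_MZ, ppm_tol=5.0):
    """Most intense (mz, intensity) peak within ppm_tol of target, or None."""
    hits = [(mz, inten) for mz, inten in peaks
            if abs(mz - target) / target * 1e6 <= ppm_tol]
    return max(hits, key=lambda p: p[1]) if hits else None


def classify_by_reporter(peaks, rel_threshold=0.5):
    """Label a spectrum by the relative intensity of the m/z 143.1543 ion.

    rel_threshold is a hypothetical cut-off, not a value from this work.
    """
    hit = reporter_peak(peaks)
    if hit is None:
        return "no reporter"
    base_peak = max(inten for _, inten in peaks)
    if hit[1] / base_peak >= rel_threshold:
        return "intense reporter (PTM 333-like)"
    return "weak reporter (PTM 289-like)"


print(classify_by_reporter([(143.1545, 90.0), (500.30, 100.0)]))
print(classify_by_reporter([(143.1541, 5.0), (600.20, 100.0)]))
```

A ppm-based tolerance is used because the Orbitrap's mass accuracy is specified in relative terms; at m/z 143 a 5 ppm window is only about ±0.0007 wide.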

3.3.6 Identification of consensus motifs harboring lysine PTMs

Modified peptides were identified by MS/MS ion searches (see Section 3.3.4) with Mascot against a database that contained proteins from the three diatom species and common laboratory contaminants (refer to Section 5.15 for experimental details). All diatom proteins not associated with biosilica (histones, clathrins, etc.) were subsequently filtered out. The resulting list of amino acid sequences with mapped modification sites (provided in Table A.2) was further checked using BLAST. These proteins showed little homology to each other or to proteins with a clear GO assignment in the NCBI non-redundant database.

Figure 3.19 MS/MS spectra of modified peptides that contain characteristic fragments for the ε-polyamines of PTMs 289 and 333. Spectra were deconvoluted prior to the Mascot ion search. (a) K.AEK(+42)PASSMPEMSVGAK(+142).A (m/z 535.61; 3+; nCE 30 %; ion score = 49.3), with a low-intensity fragment at m/z 143.1543; (b) K.AEK(+42)PASSMPEMSVGAK(+186).A (m/z 539.96; 3+; nCE 30 %; ion score = 43.5), with a high-intensity fragment at m/z 143.1543; (c) characteristic ion (m/z 143.1543; 1+) from PTM 289 (K(+142)); (d) characteristic ion (m/z 143.1543; 1+) from PTM 333 (K(+186)).

Most polyamine-modified proteins contain KXXK repeats and RXL processing sites (Table A.2, highlighted in blue), which are also present in all silaffins characterized to date (refer to Sections 1.3.1 and 1.3.2). These sequence features are conserved across the different classes of proteins identified in this work, such as silaffins (e. g., tpSil3 (B8BRK6) and tpSil4 (B8C0W5) from T. pseudonana), cingulins (B8CGS1, CingulinY3 from T. pseudonana) and novel silaffin-like proteins (K0S9A6 and K0SSD7 from T. oceanica; G11469 and G22685 from C. cryptica). A large majority (115 out of 150) of the identified lysine ε-polyamine PTM sites reside within KXXK repeats. It was hypothesized previously that KXXK repetitive motifs may represent a target for polyamine modification [68]. In addition, they are known to mediate silica precipitation [184] and to be involved in intracellular targeting [71, 185]. However, the KXXK repeat also frequently occurs in non-modified proteins. Therefore, in order to reveal any consensus sequences for polyamine modifications, the immediate vicinity of the mapped PTM sites needs to be compared directly.
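Comparing the immediate vicinity of PTM sites starts with extracting fixed-width windows centred on each modified lysine, checking whether the lysine belongs to a KXXK repeat, and tabulating residue frequencies per aligned position, which is the input to a sequence-logo plot. The minimal sketch below does this on a toy sequence; the window width, padding character and function names are hypothetical choices, not the settings of Section 5.15.

```python
from collections import Counter


def flanking_window(seq, pos, width=5, pad="-"):
    """Residues pos-width .. pos+width around a 0-based site, padded at termini."""
    left = seq[max(0, pos - width):pos].rjust(width, pad)
    right = seq[pos + 1:pos + 1 + width].ljust(width, pad)
    return left + seq[pos] + right


def in_kxxk(seq, pos):
    """True if the lysine at pos is either K of a KXXK repeat."""
    after = pos + 3 < len(seq) and seq[pos + 3] == "K"
    before = pos >= 3 and seq[pos - 3] == "K"
    return after or before


def position_frequencies(windows):
    """Relative residue frequency per aligned position (sequence-logo input)."""
    n = len(windows)
    return [{aa: count / n for aa, count in Counter(column).items()}
            for column in zip(*windows)]


seq = "MAKSAKEG"     # toy sequence, not a diatom protein
sites = [2, 5]       # hypothetical modified lysines (0-based)
windows = [flanking_window(seq, p, width=3) for p in sites]
print(windows)                               # ['-MAKSAK', 'KSAKEG-']
print([in_kxxk(seq, p) for p in sites])      # [True, True]
print(position_frequencies(windows)[3])      # {'K': 1.0}
```

Padding keeps windows at protein termini the same length, so that all windows align column by column before the frequencies are computed.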

PTMs are usually located within a particular amino acid pattern of fixed length (a sequence motif). We expect a true ε-polyamination motif to be shared between distinct biosilica-associated proteins, particularly those from different diatom species. Moreover, one PTM can sometimes promote or inhibit the modification of adjacent amino acids, which is generally termed ‘PTM crosstalk’. PTMs involved in crosstalk typically occur in close proximity to each other [186–195]. Therefore, the current study is based on two hypotheses: (i) lysines bearing the same PTM type in non-homologous biosilica-associated proteins should be located in a conserved sequence context; (ii) if PTM crosstalk exists, pairs of adjacent PTM sites should occur more frequently than would be expected by chance alone.
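Hypothesis (ii) can be tested with a permutation test, sketched below under an explicit assumption about the null model: the same number of modifications is placed uniformly at random on the protein's lysines, and the statistic is the number of modified-site pairs at most `max_gap` residues apart (3 by default, the spacing of the two lysines in a KXXK repeat). The function names, the null model and the toy positions are hypothetical and are not the analysis reported in this work.

```python
import random


def close_pairs(positions, max_gap):
    """Number of site pairs separated by at most max_gap residues."""
    pos = sorted(positions)
    return sum(1 for i in range(len(pos)) for j in range(i + 1, len(pos))
               if pos[j] - pos[i] <= max_gap)


def crosstalk_test(lysines, modified, max_gap=3, n_perm=10_000, seed=0):
    """One-sided permutation test for clustering of modified lysines.

    Null model (assumption): len(modified) sites drawn uniformly at random,
    without replacement, from all lysine positions.
    Returns (observed close pairs, empirical p-value).
    """
    rng = random.Random(seed)
    observed = close_pairs(modified, max_gap)
    k = len(modified)
    as_extreme = sum(
        close_pairs(rng.sample(lysines, k), max_gap) >= observed
        for _ in range(n_perm)
    )
    return observed, (as_extreme + 1) / (n_perm + 1)


observed, p_value = crosstalk_test([1, 4, 8, 12, 20, 25], [1, 4])
print(observed, round(p_value, 3))
```

In this toy case only one of the 15 possible lysine pairs is three residues apart, so the p-value comes out close to 1/15; a real analysis would pool sites across proteins and compare specific PTM-type pairs rather than all modified sites.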

As discussed in Section 1.3.3, Sumper et al. formulated a set of empirical rules based on the mapping of tpSil3 lysine PTMs, which are referred to as the ‘lysine modification code’ [68]. In the schemes below, K(+n) marks a modified lysine with its mass shift in Da, and X denotes any amino acid. According to these rules, the N-terminal lysine of a K(A/S/Q)XK motif is modified by PTM 289:

...K(+142)(A/S/Q)XK...    (a)

For KXXK motifs that are separated from neighboring KXXK motifs by more than five amino acid residues on each side, the C-terminal lysine becomes dimethylated (PTM 175):

...KXXK–(>5 aa)–KXXK(+28)–(>5 aa)–KXXK...    (b)

Next, if a single lysine is located close to KXXK motifs (i. e., separated from them by one or two amino acids), the lysines of the adjacent KXXK motifs nearest to it are both modified by PTM 289:

...KXXK(+142)–(1–2 aa)–K–(1–2 aa)–K(+142)XXK...    (c)

Finally, if two KXXK motifs are separated by fewer than six amino acids, the outer terminal lysine residues of both KXXK motifs are modified by PTM 333:

...K(+186)XXK–(<6 aa)–KXXK(+186)...    (d)
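Rule (a) is a purely local sequence pattern and can therefore be expressed as a regular expression. In the sketch below, a lookahead matches a lysine followed by A, S or Q, any residue, and another lysine, so that overlapping motifs are all reported; the function name is hypothetical. Rules (b)–(d) depend on distances between neighboring motifs and would need additional logic beyond a single pattern.

```python
import re

# Rule (a): K followed by (A/S/Q), any residue X, and K; only the first K is
# consumed, so overlapping motifs (e.g. KAXKSXK) are all found.
RULE_A = re.compile(r"K(?=[ASQ].K)")


def rule_a_sites(seq):
    """1-based positions of lysines that rule (a) predicts to carry PTM 289."""
    return [m.start() + 1 for m in RULE_A.finditer(seq)]


print(rule_a_sites("AKAGKSEKQAKM"))   # [2, 5, 8]
```

Applying such a predictor to mapped proteins and comparing predicted with observed PTM 289 sites is one straightforward way to quantify how well the rule holds.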


All of the above rules (a)–(d), however, were formulated based on the PTM mapping of a single protein, silaffin-3 from T. pseudonana (refer to [68]). We therefore compared the mapping results from Section 3.3.3 with those obtained by Sumper et al. (cf. Fig. 3.20a and 3.20b). Our results were in good agreement with those reported previously; however, our data differed for six lysines (out of 25), which were detected as non-modified residues (marked with * in Fig. 3.20b). Thus, it can be inferred from our data that the PTM map in Fig. 3.20b is consistent with rules (a)–(c), while five ε-polyamine-modified lysines (PTM 289, Fig. 3.20d) and six dimethylated lysines (PTM 175, Fig. 3.20c) in KXXK repeats comply with rules (a) and (b), respectively. At the same time, rule (c) holds only for one KXXK motif with PTM 289 (out of three covered motifs), whereas three of the four mapped lysine residues that fit the context of rule (d) remain unmodified. In contrast to the previous study [68], all peptide identifications in the current work were confirmed by high-resolution MS/MS spectra. It remains possible that the corresponding modified peptides were not detected in our experiments. However, the lysine PTM profile of tpSil3 (displayed in Fig. A.3), in which the total content of PTM 333 (Fig. 3.20e) corresponds to ~2–3 modified lysines (including PTM 413, which is converted to PTM 333 by HF treatment), supports our PTM mapping results.

PTM 333 sites mapped to other proteins in the current study also contradict rule (d) (B8BYI7, B8C0W5 and B8CGS1 from TP; g22685 from CC; refer to Table A.2). However, a full validation of this rule would require near-complete PTM mapping of all proteins of interest, which is rarely achievable in large-scale proteomic studies. On the other hand, rule (a) defines a specific amino acid context for ε-polyamine modification (PTM 289), whereas rule (b) represents a relaxed sequence motif that could easily be validated, refined, or even revised if necessary. Hence, we attempted to define precise consensus motifs for all types of mapped ε-polyamine modifications using a substantially larger proteomic dataset. Taking into account that many PTMs may also be present at non-consensus sites, we explored the conservation of the determined sequence motifs for polyamine modifications across the three diatom species. Finally, we investigated the interplay between different types of PTMs.

Figure 3.20 Comparison of two PTM maps of silaffin-3 from T. pseudonana (tpSil3): (a) obtained by Sumper et al. [68]; (b) obtained in the current study; (c)–(e) color-coding used for lysine PTMs: K(+28), dimethylation (PTM 175); K(+142), ε-polyamine (PTM 289); K(+186), ε-polyamine (PTM 333). In (b), non-modified lysines (K(+0)), unmapped lysines (?) and differences from (a) (*) are indicated.

Figure 3.21 Graphical representations of the local protein contexts of modified lysines ±10 residues. Sequence logo plots represent relative amino acid frequencies for ±10 amino acids from the lysine PTM site. The total height of the stack of letters at each position shows the sequence conservation, while the relative height of each letter shows the relative abundance of the corresponding amino acid [196]. Positively and negatively charged residues are shown in blue and red, respectively, uncharged residues in green, hydrophobic residues in black, and S/T/Y residues are highlighted in orange. Panels: (a) non-modified lysines, K(+0) (not conserved); (b) dimethylation sites (PTM 175, K(+28)), with a conserved AKA at position -3 whose lysine carries PTM 289 in 14 of 62 sequences; (c) trimethylation sites (PTM 189, K(+42)), with a conserved AKA at position -3 whose lysine carries PTM 289 or 333 in 13 of 52 sequences; (d) m/z 289 sites (PTM 289, K(+142), two propylamine units), consensus (A/S)K(A/S)EK; (e) m/z 333 sites (PTM 333, K(+186), two propylamine units), consensus SKA; (f) m/z 232 sites (PTM 232, K(+85), one propylamine unit), not significant (7 sequences). Ambiguous PTM sites are marked with *.

In order to define consensus sequences for ε-polyamine modifications, the short amino acid stretches flanking the PTM sites must be aligned and the frequency of each amino acid residue evaluated (e. g., by 'Sequence Logo' [196]; refer to Section 5.15). However, several pitfalls have to be avoided during this procedure. Firstly, the sequence stretches should be reasonably short. We therefore limited the flanking stretches to 20 amino acids (10 upstream and 10 downstream of the site), because a wider window increases the chance of finding more than two modifiable amino acids in one motif, which significantly complicates the analysis. Secondly, it was not always possible to assign PTMs unambiguously to some KXXK lysine pairs, owing to the lack of corresponding overlapping peptides or to incomplete fragment ion series. Furthermore, 25 lysine residues have an ambiguous modification status, where two different PTMs can affect the same residue (e. g., dimethylation and trimethylation). Therefore, for both ambiguous assignments and lysine residues bearing multiple modifications, we considered all possible sequence variants supported by the obtained data. Finally, lysine residues in detected peptides were considered non-modified when the corresponding unmodified peptide was detected in the absence of its modified counterpart. Lysines located immediately before tryptic peptides or at their C-termini were also mapped as non-modified, because trypsin does not cleave after modified lysine residues.
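Two of these bookkeeping steps, cutting a ±10-residue window around a lysine (padded with '.' beyond the protein termini, as in the aligned stretches) and flagging lysines at tryptic cleavage sites as non-modified, can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes each peptide occurs only once in its protein; it is not the software used for the analysis.

```python
def flank_window(seq, pos, flank=10, pad="."):
    """Return the residue at 0-based `pos` with `flank` residues on each side.

    Positions beyond either protein terminus are filled with `pad`, so every
    window has length 2 * flank + 1 and the lysine sits at index `flank`.
    """
    left = seq[max(0, pos - flank):pos]
    right = seq[pos + 1:pos + 1 + flank]
    return pad * (flank - len(left)) + left + seq[pos] + right + pad * (flank - len(right))


def cleavage_site_lysines(protein, peptide):
    """0-based positions of lysines at the tryptic cleavage sites bounding `peptide`.

    Returns the residue preceding the peptide and the peptide's C-terminal
    residue when they are lysines (the protein C-terminus is not a cleavage
    site). Assumes the peptide occurs once in the protein.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide) - 1
    sites = []
    if start > 0 and protein[start - 1] == "K":
        sites.append(start - 1)
    if protein[end] == "K" and end < len(protein) - 1:
        sites.append(end)
    return sites


print(flank_window("AHSLSMPKAEKVHSMSA", 10))             # AHSLSMPKAEKVHSMSA....
print(cleavage_site_lysines("MSKAEKQAIKGG", "AEKQAIK"))  # [2, 9]
```

In the second example the internal lysine of the peptide (a missed cleavage) is not returned, since only the two cleavage-site lysines can be assigned as non-modified by this argument.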

This all considered, the local contexts of lysine PTM sites were investigated in all three diatom species. Prior to the alignment of modified residues, however, the conservation of amino acids surrounding non-modified lysines was checked. As displayed in Fig. 3.21a, no sequence conservation was observed in the short stretches around non-modified lysine residues. The local contexts of methylated lysines (ε-N,N-di- and ε-N,N,N-trimethylation, also denoted as PTMs 175 and 189, respectively; see Fig. 3.21b and 3.21c), on the other hand, show that the C-terminal lysines of KXXK motifs are the prevalent methylation targets. Of the 93 mapped methylated residues in total, 21 were detected as either di- or trimethylated in C. cryptica and T. oceanica (but not in T. pseudonana), which is consistent with the relative abundance of both ε-methylated lysines in total hydrolysates (see Fig. 3.10b in Section 3.2.3). Position -3 relative to the methylation site contains a markedly conserved lysine residue that is often flanked by alanine residues (AKA). These lysines at position -3, i. e., the N-terminal lysines of the KXXK motifs, are often modified by an ε-polyamine chain (~25 % of aligned sequence stretches). This conclusion was further corroborated by the alignment of ε-polyamine-modified lysines shown in Fig. 3.21d–3.21f. PTM 333 (ε-polyamine with two propylamine units and a quaternary amine) is often present within an SKA consensus site (4 out of 8 in Fig. 3.21e) and affects the N-terminal lysine in KXXK. However, the small number of sequences (8 mapped sites altogether) does not allow us to draw strong conclusions. A similar situation is observed for the T. oceanica-specific PTM 232 (ε-polyamine with one propylamine unit, 6 mapped sites). Amino acid residues surrounding the ε-polyamine modification sites of PTM 289 display some degree of conservation ((A/S)K(A/S)EK, Fig. 3.21d).
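The stack height at each position of a sequence logo [196] is the information content of that alignment column: for proteins, log2(20) minus the Shannon entropy of the observed residue frequencies. The sketch below computes these heights for aligned ±10 windows. It deliberately omits the small-sample correction that logo software normally applies, so for small alignments (such as the 8 PTM 333 sites) it overestimates conservation; the function names are illustrative assumptions.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    R = log2(alphabet_size) - H, with H the Shannon entropy of the observed
    residue frequencies. Padding characters ('.') are ignored; no small-sample
    correction is applied.
    """
    residues = [c for c in column if c != "."]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(windows):
    """Per-position information content for equally long aligned windows."""
    return [column_information([w[i] for w in windows]) for i in range(len(windows[0]))]


print([round(h, 2) for h in logo_heights(["AKA", "SKA", "AKS", "SKA"])])  # [3.32, 4.32, 3.51]
```

A fully conserved column (the central lysine) reaches the maximum of log2(20) ≈ 4.32 bits, whereas the A/S positions score lower.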

Figure 3.22 Graphical representations of the local protein contexts, aligned by PTM sites separately for each diatom species (TP, T. pseudonana; CC, C. cryptica; TO, T. oceanica). The modified lysine residue is marked with an arrow. The consensus sequence (if any) and the number of aligned sequences are given below each sequence logo. (a) Alignments for TP proteins: non-modified lysines, K(+0), not conserved (47 sequences); dimethylation sites, K(+28), AK(A/S)EKA (11 sequences); trimethylation sites, K(+42), not significant (3 sequences); PTM 289 sites, K(+142), (A/S)K(A/S)EKS (14 sequences); PTM 333 sites, K(+186), SKA(S/G)K(A/V)XK (8 sequences). (b) Alignments for CC proteins: non-modified lysines, not conserved (55 sequences); dimethylation sites, (G/A)KAEKX (30 sequences); trimethylation sites, (A/S)KA(E/D)KX (41 sequences); PTM 289 sites, AKAEKX (16 sequences); PTM 333 sites, not significant (1 sequence). (c) Alignments for TO proteins: non-modified lysines, not conserved (29 sequences); dimethylation sites, AKAXKX (20 sequences); trimethylation sites, XKXXKGS (9 sequences); PTM 289 sites, (A/S)K(S/D)(A/V)K(A/V) (7 sequences); PTM 232 sites, K(+85), SKSEKX (9 sequences).

Next, we wanted to test whether the identified motifs were conserved across all three species, or whether the amino acid context differed between organisms. For this purpose, the sequences with mapped PTM sites were grouped by the species from which they derive. The sequence logos resulting from these alignments are shown in Fig. 3.22. These results clearly indicate that methylated and polyamine-modified lysines are found in KXXK boxes at the C- and N-terminal positions, respectively. However, the sequence context of PTM 289 sites in T. oceanica differed from the overall picture of polyamine-modified lysines, which may reflect the small sample size of aligned sequences (seven stretches in total, see Fig. 3.22c). Nevertheless, 88 % of the modified lysines mapped in this study were embedded in defined KXXK motifs, with biases for certain lysine-flanking amino acid residues (putative motifs are given below each sequence logo in Fig. 3.22a–3.22c). To further investigate the association between PTMs in KXXK boxes, short sequence stretches were aligned by the entire lysine pairs.
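Deciding whether a mapped lysine is embedded in a KXXK motif, and at which of its two positions, reduces to checking the residues three positions away on either side. The Python sketch below illustrates this criterion; the function names are hypothetical, and it is not the script used to obtain the 88 % reported above.

```python
def kxxk_role(seq, pos):
    """Position of the lysine at 0-based `pos` within a KXXK motif.

    Returns 'N' (first lysine), 'C' (second lysine), 'N/C' (both, as in
    KXXKXXK) or None if the lysine is not part of any KXXK motif.
    """
    if seq[pos] != "K":
        raise ValueError("residue at pos is not a lysine")
    is_n = pos + 3 < len(seq) and seq[pos + 3] == "K"
    is_c = pos >= 3 and seq[pos - 3] == "K"
    if is_n and is_c:
        return "N/C"
    if is_n:
        return "N"
    if is_c:
        return "C"
    return None


def fraction_in_kxxk(sites):
    """Fraction of (sequence, position) sites whose lysine lies in a KXXK motif."""
    embedded = sum(1 for seq, pos in sites if kxxk_role(seq, pos) is not None)
    return embedded / len(sites)


print(kxxk_role("SKAEKQ", 1), kxxk_role("SKAEKQ", 4))  # N C
print(round(100 * 114 / 130))  # 88
```

The last line only checks the arithmetic behind the percentage: 114 of the 130 mapped modified lysines lie in KXXK repeats.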

Specific properties of lysine methylation and ε-polyamination were further investigated by studying their crosstalk. As shown in Fig. 3.21 and 3.22, 114 of the 130 modified lysines mapped in this study resided in KXXK repeats. However, these PTMs are clearly not strictly dependent on each other, because there are counterexamples in both directions, where either ε-polyamination or methylation occurs alone in a KXXK motif (five and eight cases, respectively; Fig. 3.21). We therefore examined the statistical evidence for an association between ε-polyamines and methylation in KXXK motifs. To this end, all sequences containing this motif were extracted and aligned by their N-terminal lysines. The analyzed sequences are presented in both graphical (as sequence logos) and text form in Fig. 3.23. Firstly, we counted the ε-polyaminated and methylated lysines at position I or II of the KXXK motif (its N- or C-terminal lysine) and tested their distribution for non-random modification patterns using Fisher's exact test (Table A.3). This revealed a major bias towards methylation at the C-terminal lysine of the KXXK block (with dimethyllysine prevailing in T. pseudonana and trimethyllysine in C. cryptica), whereas polyamination occurred mostly at the N-terminal lysine (PTMs 289 and 333). This correlation is statistically significant for T. pseudonana and C. cryptica (Tables A.3b and A.3d), whereas the sample size is too small to draw a conclusion about crosstalk in T. oceanica (Fisher's exact test, Table A.3f). Lysines modified with ε-polyamines bearing two propylamine units are flanked by alanine or serine residues, (A/S)K(A/S), in T. pseudonana and C. cryptica (see Fig. 3.22a and 3.22b), whereas the small number of sequences from T. oceanica is not sufficient to define a consensus sequence reliably, so the motif given for this species is only tentative (Fig. 3.22c):

(A/S)(K(+142)/K(+186))(A/S)EK(+28)   (T. pseudonana)

(A/S)(K(+142)/K(+186))(A/S)E(K(+28)/K(+42))   (C. cryptica)

A(K(+142)/K(+85))XE(K(+28)/K(+42))   (T. oceanica)
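For a 2×2 contingency table (for instance, rows for lysine position I or II in KXXK and columns for ε-polyamine or methyl modification, an orientation assumed here), the two-sided Fisher's exact test sums the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one. The self-contained sketch below shows this calculation; the example table is Fisher's classic textbook case, not the counts of Table A.3. In practice, `scipy.stats.fisher_exact` should give the same two-sided p-value.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more probable than the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # probability of the table whose top-left cell is x, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # a small relative tolerance keeps floating-point ties with p_obs in the sum
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


print(fisher_exact_two_sided([[3, 1], [1, 3]]))  # about 0.486 (= 34/70)
```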


Therefore, the comparison of the three diatom species provided in Fig. 3.23 revealed potential crosstalk between methylation and ε-polyamination in a large number of polyamine-modified proteins (e. g., B8BRK6 and B8C0W5 from T. pseudonana, G11469 and G22685 from C. cryptica). The consensus site (A/S)K(A/S) defined for ε-polyamines in T. pseudonana and C. cryptica demonstrates that biosilica-associated proteins are modified in a similar way in these closely related species. The discovered interplay between ε-polyaminated and methylated lysines may indicate the presence of recognition domains in the corresponding PTM enzymes, similar to Tudor domains for histone methylases [100], which could facilitate binding of silaffins that already possess methylated lysine residues. However, this potential crosstalk remains to be investigated further, and demonstrating its biological relevance requires examining in vivo the effect of methylation on lysine ε-polyamination (or vice versa).
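The species-specific consensus contexts can be translated into regular expressions to scan unmodified protein sequences for candidate sites. In the sketch below, X becomes '.', the mass-shift annotations are dropped, and overlapping matches are reported; the dictionary and function names are illustrative assumptions, and a match only marks a candidate context, not a predicted modification.

```python
import re

# Consensus contexts for the KXXK lysine pairs, written as regular expressions.
# X (any residue) becomes "."; the modified lysines are motif positions 2 and 5.
CONSENSUS = {
    "T. pseudonana": r"[AS]K[AS]EK",
    "C. cryptica": r"[AS]K[AS]EK",
    "T. oceanica": r"AK.EK",
}


def find_candidate_sites(seq, pattern):
    """Return (N-terminal K, C-terminal K) 0-based positions for every match,
    overlapping matches included."""
    return [(m.start() + 1, m.start() + 4)
            for m in re.finditer(f"(?={pattern})", seq)]


print(find_candidate_sites("HSSKAEKQAIE", CONSENSUS["T. pseudonana"]))  # [(3, 6)]
print(find_candidate_sites("SKDAKAEKYTT", CONSENSUS["T. oceanica"]))    # [(4, 7)]
```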

96 results and discussionT.pseudonana

PTM289

(15outof26

sequences)↓

logo

1 32 4

HMRD E G K P S AA KMN V YD G T E S

H D GIK V T E S

ME K V Y D A S

FH K Y A EMST D G

C GHIKP R V ET DSAM

DEK A L M SAFHMP RD G K S V

F PR T KMS A D

G L P T S AKD GLQ P V S AA D H S T F E GK

DPQE LT G V A S

A G H KMPD F S E

HIMT YDE K G A S

DP T V K A E GM S

EHP T A D V Y G K S

DILV E S A G

G HL ST V E A D K

1 32 4

TP_B5YNQ3_341

RNGVSRLRRLKDDKGDEAVEE

TP_B8BRK6_43

HSISMSMHSSKAEKQAIEAAV

TP_B8BRK6_62

AVEEDVAGPAKAAKLFKPKAS

TP_B8BRK6_65

EDVAGPAKAAKLFKPKASKAG

TP_B8BRK6_70

PAKAAKLFKPKASKAGSMPDE

TP_B8BRK6_93

AKSAKMSMDTKSGKSEDAAAV

TP_B8BRK6_106

KSEDAAAVDAKASKESHMSIS

TP_B8BRK6_123

MSISGDMSMAKSHKAEAEDVT

TP_B8BRK6_139

AEDVTEMSMAKAGKDEASTED

TP_B8BRK6_156

STEDMCMPFAKSDKEMSVKSK

TP_B8BRK6_166

KSDKEMSVKSKQGKTEMSVAD

TP_B8BRK6_208

SGKSGSLSMLKSEKASSAHSL

TP_B8BRK6_222

ASSAHSLSMPKAEKVHSMSA.

TP_B8BYI7_61

EEVEYIMSDGKAGKLPYGGST

TP_B8C0W5_72

GSGDEEAVDAKAEKTSTTGSA

TP_B8C0W5_145

ASEAGAEVTAKAEKGSDDEGH

TP_B8C0W5_216*

SDEATTSDASKATKVFKSSGK

TP_B8C0W5_216*

SDEATTSDASKATKVFKSSGK

TP_B8C0W5_219*

ATTSDASKATKVFKSSGKSGK

TP_B8C0W5_219*

ATTSDASKATKVFKSSGKSGK

TP_B8C0W5_243*

AGSSDMSVSSKPEKSEGSSEA

TP_B8C0W5_243*

AGSSDMSVSSKPEKSEGSSEA

TP_B8CGS1_317

GYHMFHDKSGKGGKGSSSGGE

TP_B8CGS1_263*

DESYGDSGDSKAGKAEAGYGD

TP_B8CGS1_263*

DESYGDSGDSKAGKAEAGYGD

TP_B8LBG8_103

PMTKSGKADAKAHKVDEEDLA

Conserv.

•••••••••••••••••••••↑

Dim

ethylation(8

outof26sequences)

(a)(A/S)(+142

K/+186

K)(A/S)E+28

K

+142

K–

ε-polyamine

(PTM

289)+186

K–

ε-polyamine

(PTM

[Figure 3.23, panels (b) and (c). (b) C. cryptica, motif (A/S)(+142K/+186K)(A/S)E(+28K/+42K): sequence logo and aligned contexts of 41 sequences, annotated PTM site m/z 189 (trimethylation); di-/trimethylation in 36 of 41 sequences. (c) T. oceanica, motif A(+142K/+85K)XE(+28K/+42K): sequence logo and aligned contexts of 19 sequences; PTM 289 and PTM 232 in 4 and 3 of 19 sequences, respectively; dimethylation in 12 of 19 sequences. Legend: +186 K, ε-polyamine (PTM 333); +85 K, ε-polyamine (PTM 232); +28 K, dimethylation (PTM 175); +42 K, trimethylation (PTM 189); +0 K, non-modified lysine; K, unmapped lysines; *, ambiguous PTM sites.]

Figure 3.23 Graphical representations of the local protein contexts of modified lysines situated in KXXK motifs. Sequence contexts are aligned by the N-terminal lysine in KXXK ±10 residues. Sequence logos are plotted in the same way as in previous figures.
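Sequence logos such as those in Fig. 3.23 are commonly built from per-position information content. The following is a minimal sketch, assuming the plain Shannon formulation used by common logo tools (R_i = log2(20) − H_i for proteins, with no small-sample correction), which may differ from the exact procedure used to plot the figures. The four windows are CC_g11469 contexts from the figure, aligned on the N-terminal lysine of KXXK at index 10.

```python
import math
from collections import Counter

def information_content(windows: list[str]) -> list[float]:
    """Per-position information content (bits) for aligned, equal-length windows.

    Plain Shannon formulation for proteins: R_i = log2(20) - H_i,
    without small-sample correction (an assumption of this sketch).
    """
    length = len(windows[0])
    assert all(len(w) == length for w in windows), "windows must be aligned"
    max_bits = math.log2(20)  # 20 standard amino acids
    result = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        result.append(max_bits - entropy)
    return result

# C. cryptica contexts from Fig. 3.23 (21 residues; N-terminal K of KXXK at index 10)
windows = [
    "DSGSLRTVESKAEKLPGGSMS",
    "DSGSMRTVEAKAEKTASAGSM",
    "SAGSMRTVDAKAQKQQPGSMP",
    "YAGSMRTVEAKAEKTPPDGGS",
]
ic = information_content(windows)
for offset, bits in zip(range(-10, 11), ic):
    print(f"{offset:+3d} {bits:.2f}")
```

Fully conserved positions (both lysines of the motif) reach the maximum of log2(20) ≈ 4.32 bits. Letter heights in a logo are then the residue frequencies at each position multiplied by that position's information content.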

4 CONCLUSIONS AND OUTLOOK

In this thesis, I investigated the profile and site-specificity of lysine ε-polyamine PTMs in diatom biosilica-associated proteins. The motivation for these experiments stems from the significance of polyamine structures in the biosilicification process. The silica precipitation activity reported for polyamines in the literature [31, 61–67, 83] suggests that they are involved in the species-specific patterning of diatom biosilica. In this context, lysine ε-polyamine modifications may play an important role in regulating biosilica morphogenesis. Therefore, characterizing the consensus sites of these PTMs is important for understanding the link between biosilica-associated proteins and the biosilica-forming machinery.

Here we present an integrated analytical strategy for the systematic analysis of lysine polyamine modifications. The approach profiles lysine PTMs in biosilica hydrolysates before identifying lysine ε-polyamine sites in biosilica-associated proteins. It starts with exhaustive protein hydrolysis in 6 M HCl, followed by AQC-derivatization of polyamine-modified lysines and their identification by LC-MS/MS. In this way, we could distinguish structural isomers of polyamine moieties and quantify them with a limited set of internal standards (Sections 3.1.1–3.1.3).
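Quantification against internal standards usually rests on peak-area ratios. The sketch below assumes single-point calibration with a relative response factor (RRF) measured on a calibration mix, and an internal standard (for example, one of the synthetic polyamine derivatives of Section 5.1) spiked at a known concentration. The function names and all numbers are hypothetical illustrations, not the procedure or data of Sections 3.1.1–3.1.3.

```python
def relative_response_factor(area_cal: float, conc_cal: float,
                             area_is: float, conc_is: float) -> float:
    """RRF = (A_analyte / C_analyte) / (A_IS / C_IS), measured on a calibration mix."""
    return (area_cal / conc_cal) / (area_is / conc_is)

def quantify(area_analyte: float, area_is: float, conc_is: float, rrf: float) -> float:
    """Analyte concentration in the sample, in the same units as conc_is."""
    return (area_analyte / area_is) * conc_is / rrf

# Hypothetical peak areas and concentrations (µM), for illustration only.
rrf = relative_response_factor(area_cal=3.0e6, conc_cal=10.0, area_is=1.0e6, conc_is=5.0)
conc = quantify(area_analyte=3.0e6, area_is=1.5e6, conc_is=5.0, rrf=rrf)
print(f"RRF = {rrf:.2f}; analyte = {conc:.2f} µM")
```

When no authentic standard exists for a given modification, the RRF of a structurally related standard may be borrowed; this is an approximation whose error depends on how similar the ionization efficiencies are.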

Using this approach, we catalogued lysine polyamine modifications in proteins associated with silicified cell walls isolated from three closely related diatom species: Thalassiosira pseudonana, T. oceanica and Cyclotella cryptica (Section 3.2.1). High-resolution MS/MS analysis of these modifications revealed characteristic fragments (Table 3.2) that were subsequently used as reporter ions for modified peptides. Altogether, we identified 25 polyamine modifications (Section 3.2.3), which not only confirmed seven previously known PTMs but also revealed 18 novel ones, including three acid-resistant phosphoester-containing polyamines (collectively denoted as phosphopolyamines in Section 3.2.2). We also observed that the pattern of polyamine modifications reflects the phylogenetic proximity of the three species (Section 3.2.5): the two closely related diatoms T. pseudonana and C. cryptica share a conserved set of polyamine modifications, which differs substantially from that of the phylogenetically more distant T. oceanica.

The detected structures represent an unusual class of protein post-translational modification that appears to be unique to biomineralizing organisms. These modifications occur at lysine side chains, where propyleneimine (or aminopropyl) units are linearly linked to the ε-amine (see Fig. 4.1c–4.1e). The polyamine chains characterized in all three (centric) diatom species consist of 1–2 aminopropyl units, whereas longer ε-polyamine structures were previously detected in the pennate diatom C. fusiformis [64–66]. Lysine residues were detected in three methylation states (mono-, di- and trimethylation), and each N atom in the ε-polyamine structures can also be methylated. Additionally, the lysine residue can be converted to δ-hydroxylysine, whose hydroxyl group can be phosphorylated (3 of the 25 PTM structures). As a result, lysine ε-polyamine modifications introduce into biosilica-associated proteins both the positive charges of protonated amino groups and the negative charges of phosphate residues. The zwitterionic character of these molecules is likely to influence their chemical and biological properties; however, the mechanistic role of these PTMs remains elusive. Besides diatoms, silicified sponge spicules [91, 92] and calcifying coccolithophores [44] also contain similar polyamine structures, although phosphopolyamines seem to be diatom-specific. Diatoms therefore appear to possess a highly complex PTM machinery, whose site-specificity was investigated in this work by mapping polyamine modifications to protein sequences.

The profiled lysine modifications were localized in biosilica-associated proteins by GeLC-MS/MS using a multiple-protease digestion approach (Section 3.3.1) after selective removal of O-linked glycans and phosphate groups by HF treatment (Section 3.3.2). To identify polyamine-modified peptides, an iterative search strategy (Section 3.3.3) and deconvolution of raw peptide MS/MS spectra (Section 3.3.4) were employed, and polyamine-specific fragments were used to validate the PTM assignments (Section 3.3.5). We identified 150 polyamine-acceptor lysines, which can carry the five types of PTM marks shown in Fig. 4.1. Two of them are di- and trimethylation (Fig. 4.1a and 4.1b), while the others are ε-polyamines with either one (Fig. 4.1c) or two (Fig. 4.1d and 4.1e) aminopropyl units and different degrees of N-methylation. PTM 333 (Fig. 4.1e) contains a quaternary amino group and occurs only in T. pseudonana and C. cryptica, whereas PTM 232 (Fig. 4.1c) is specific to T. oceanica.
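Why several proteases help can be illustrated with an in-silico digest. This sketch assumes simplified, commonly cited cleavage rules (trypsin after K/R but not before P; Glu-C after E; Asp-N before D; chymotrypsin after F/Y/W but not before P), fully specific cleavage with no missed cleavages, a minimum peptide length of 7 residues, and no effect of lysine modifications on cleavage. The input is a stretch of the synthetic tpSil3 construct in Fig. 5.2; requires Python 3.9+.

```python
import re

# Simplified cleavage rules written as zero-width split points.
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",        # after K/R, not before P
    "glu-c": r"(?<=E)",                  # after E
    "asp-n": r"(?=D)",                   # before D
    "chymotrypsin": r"(?<=[FYW])(?!P)",  # after F/Y/W, not before P
}

def covered_positions(sequence: str, protease: str, min_len: int = 7) -> set[int]:
    """0-based residue positions covered by peptides of at least min_len residues."""
    covered, start = set(), 0
    for peptide in re.split(RULES[protease], sequence):
        if len(peptide) >= min_len:
            covered.update(range(start, start + len(peptide)))
        start += len(peptide)
    return covered

# Stretch of the tpSil3 construct from Fig. 5.2
segment = "SSKAEKQAIEAAVEEDVAGPAKAAKLFKPKASKAGSMPDEAGAK"
combined: set[int] = set()
for protease in RULES:
    cov = covered_positions(segment, protease)
    combined |= cov
    print(f"{protease:>12}: {len(cov)}/{len(segment)} residues")
print(f"{'combined':>12}: {len(combined)}/{len(segment)} residues")
```

In this lysine-rich stretch, trypsin alone leaves short, likely unobservable peptides, while the union of several proteases covers every residue.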


These modifications were mapped to 25 biosilica-associated proteins (summarized in Table A.2) from the three diatom species; these proteins share no sequence homology with each other and are not homologous to other known proteins. In this way, we have substantially extended the catalogue of lysine ε-polyamine sites in biosilica-associated proteins and provided a resource for future studies of the site-specificity and functional association of the PTM machinery in diatom biosilica.

(a) PTM 175: +28 K (ε-dimethylation)
(b) PTM 189: +42 K (ε-trimethylation)
(c) PTM 232: +85 K (ε-polyamine with one aminopropyl unit)
(d) PTM 289: +142 K (ε-polyamine with two aminopropyl units)
(e) PTM 333: +186 K (ε-polyamine with two aminopropyl units and a quaternary amine)

Figure 4.1 Structures of the mapped PTMs.
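The numeric PTM labels appear to correspond to the nominal m/z of the singly charged modified lysine. A small sketch to check this, assuming elemental compositions for the mass shifts that are inferred here from the nominal values and the structural descriptions (+28 = C2H4, +42 = C3H6, +85 = C5H11N, +142 = C8H18N2); PTM 333 is left out because its composition is not given in this section. As a cross-check, lysine + C8H18N2 reproduces the monoisotopic mass of the synthetic lysine ε-polyamine standard (288.2525, Fig. 5.1b).

```python
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276467  # mass of a proton

LYSINE = {"C": 6, "H": 14, "N": 2, "O": 2}  # free lysine, C6H14N2O2

# Assumed elemental compositions of the side-chain mass shifts (inferred, see text).
SHIFTS = {
    "PTM 175 (+28, dimethyl)": {"C": 2, "H": 4},
    "PTM 189 (+42, trimethyl)": {"C": 3, "H": 6},
    "PTM 232 (+85, one aminopropyl unit)": {"C": 5, "H": 11, "N": 1},
    "PTM 289 (+142, two aminopropyl units)": {"C": 8, "H": 18, "N": 2},
}

def mono_mass(composition: dict[str, int]) -> float:
    """Monoisotopic mass of an elemental composition."""
    return sum(MONO[element] * count for element, count in composition.items())

def modified_lysine_mz(shift: dict[str, int]) -> float:
    """m/z of the singly charged modified lysine: M(Lys) + M(shift) + proton.
    For the fixed-charge trimethyllysine cation this equals the cation mass."""
    return mono_mass(LYSINE) + mono_mass(shift) + PROTON

for label, shift in SHIFTS.items():
    print(f"{label}: shift {mono_mass(shift):.4f} Da, m/z {modified_lysine_mz(shift):.4f}")
```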

Since a primary goal of this study was to reveal consensus sequences for lysine polyamine modifications, we tested the conservation of the amino acids surrounding the PTM sites. Polyamine modifications occurred at several consensus motifs, although the full sequences of the modified proteins were not conserved. In total, we defined two consensus motifs for ε-polyamines and methylation common to T. pseudonana and C. cryptica, whereas the assignments in T. oceanica were inconclusive because of the small sample size (Section 3.3.6). Of the 25 polyamine-modified proteins, 21 contained multiple conserved KXXK repeats, and 88 % of the mapped PTMs resided in this motif. Methylation commonly occurs at the C-terminal lysine of KXXK, whereas ε-polyamine modifications preferentially reside at the N-terminal lysine. In addition, given the proximity of PTMs within KXXK, we hypothesized that crosstalk between different modifications may be an important mechanism of the biosilica PTM machinery. The association between ε-polyamines and methylation in KXXK repeats was statistically significant for T. pseudonana and C. cryptica but not for T. oceanica.
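Assigning a lysine to the N-terminal or C-terminal position of a KXXK motif is a simple string operation. A sketch, assuming 0-based positions and counting overlapping motifs, so that one lysine can be the C-terminal residue of one motif and the N-terminal residue of the next. The example is the CC_g22685_345 context from Fig. 3.23; requires Python 3.9+.

```python
import re

def kxxk_roles(sequence: str) -> dict[int, set[str]]:
    """Map each lysine position (0-based) inside any KXXK motif to its role(s)."""
    roles: dict[int, set[str]] = {}
    # A zero-width lookahead reports overlapping motifs (e.g. KAEKYSKAAK).
    for match in re.finditer(r"(?=K..K)", sequence):
        start = match.start()
        roles.setdefault(start, set()).add("N-terminal")
        roles.setdefault(start + 3, set()).add("C-terminal")
    return roles

context = "SAKVFSLHDAKAEKYSKAAKS"  # CC_g22685_345 context from Fig. 3.23
for pos, role in sorted(kxxk_roles(context).items()):
    print(pos, sorted(role))
```

Tallying these roles against the mapped PTM at each position gives the counts on which a positional preference (methylation at the C-terminal, ε-polyamines at the N-terminal lysine) can be tested.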

However, only 5 of the 25 detected lysine modifications could be mapped to protein sequences. In future work, we would therefore like to improve the identification of lysine modification sites with the all-ion fragmentation (AIF) technique, using the catalogued polyamine-specific fragments for peptide-independent identification of ε-polyamine PTMs. Furthermore, newly emerging alternative proteases [180, 197–199] can increase proteome coverage and improve PTM identification through the analysis of longer peptides, an approach referred to as 'middle-down' proteomics, which in turn enables the characterization of PTM proteoforms [200, 201]. Our results also open new perspectives for functional protein studies. To gain insight into the biosilica post-translational modification machinery, candidate polyamine biosynthetic enzymes should be investigated [99]. For this purpose, the initial substrate for the transfer of aminopropyl groups needs to be synthesized. Several polyamine synthases have already been cloned and tested for their possible activity [97, 98], while many more remain to be discovered and characterized. Finally, to provide direct mass spectrometric evidence for PTM crosstalk between polyamination and methylation, this crosstalk needs to be validated in vivo via site-directed mutagenesis.
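Peptide-independent detection with diagnostic fragments amounts to screening every fragment spectrum for the catalogued ions within a mass tolerance. A minimal sketch, assuming centroided spectra with sorted m/z lists and a ppm tolerance. The two ions used are generic lysine immonium-derived ions (m/z 101.1073 and 84.0808), chosen only as stand-ins; they are not the polyamine-specific fragments of Table 3.2, and the spectra are hypothetical. Requires Python 3.9+.

```python
from bisect import bisect_left

def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in ppm."""
    return (observed - theoretical) / theoretical * 1e6

def has_ion(mzs: list[float], target: float, tol_ppm: float) -> bool:
    """True if any peak in the sorted m/z list lies within tol_ppm of target."""
    window = target * tol_ppm * 1e-6
    i = bisect_left(mzs, target - window)
    return i < len(mzs) and mzs[i] <= target + window

def screen(spectra: dict[str, list[float]], reporters: dict[str, float],
           tol_ppm: float = 5.0) -> list[str]:
    """IDs of spectra that contain all reporter ions."""
    return [sid for sid, mzs in spectra.items()
            if all(has_ion(mzs, mz, tol_ppm) for mz in reporters.values())]

# Stand-in reporters (generic lysine ions, NOT the Table 3.2 values).
REPORTERS = {"Lys immonium": 101.1073, "Lys immonium - NH3": 84.0808}
spectra = {  # hypothetical centroided fragment spectra
    "scan_1": sorted([84.0806, 101.1075, 230.1500]),
    "scan_2": sorted([84.0900, 101.1074, 301.2000]),
}
print(screen(spectra, REPORTERS))
```

Here only scan_1 passes: its peaks lie within about 2 ppm of both ions, whereas the 84.0900 peak in scan_2 is roughly 110 ppm away.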

5 MATERIALS AND METHODS

Contents
5.1 Synthesis of polyamine standards
5.2 Isolation of biosilica-associated proteins
5.3 Expression of tpSil3 from synthetic gene
5.4 HCl hydrolysis
5.5 AQC-derivatization of amino acids and polyamines
5.6 LC-MS/MS analysis of AQC-derivatives
5.7 Amino acid measurement using UV-detection
5.8 Direct infusion MS/MS analysis
5.9 Acetylation of phosphopolyamines
5.10 ³¹P NMR measurements
5.11 Deglycosylation with TFMS
5.12 Treatment with HF·pyridine soluble complex
5.13 Anhydrous HF-treatment
5.14 Protein analysis by GeLC-MS/MS
5.15 Proteomics data processing


Table 5.1 Chemicals and reagents. Unless otherwise noted, reagents were purchased commercially from Sigma-Aldrich Co. (Milford, MA, USA) at the highest purity available.

(a) Chemicals and reagents
Pierce™ Amino Acid Standard H, 2.5 µmol/mL in 0.1 N HCl; Thermo Scientific Pierce (Waltham, MA, USA)
6 M HCl, sequencing grade
Acetonitrile (ACN), HPLC grade (freshly purchased)
Water (H2O)
Pierce BSA standard ampules, 2.0 mg/mL
AccQ·Tag™ Ultra Derivatization Kit; Waters Corporation (Milford, MA, USA)
Trifluoroacetic acid (TFA), for protein sequencing; Merck Millipore (Darmstadt, Germany)
Formic acid (FA), 98–100 %
Coomassie Brilliant Blue R 250, stain for electrophoresis; SERVA (Heidelberg, Germany)
'Stains All'; Sigma-Aldrich Co. (Schnelldorf, Germany)
β-casein (bovine), >98 %, SDS-PAGE
L-lysine, δ-hydroxy-L-lysine, ε-N-monomethyl-L-lysine, ε-N,N-dimethyl-L-lysine, ε-N,N,N-trimethyl-L-lysine, L-arginine, L-proline, spermidine; all >98 %
Acetic anhydride
Triethylamine (TEA)
Ammonium fluoride (NH4F)
HF·pyridine soluble complex (Olah's reagent; pyridine ~30 %, HF ~70 %)
GlycoProfile™ IV Chemical Deglycosylation Kit
Hydrogen fluoride (HF), anhydrous, 3.5; GHC Gerling (Hamburg, Germany)
Trypsin, MS grade; Promega (Madison, WI, USA)
Endoproteinase Asp-N, sequencing grade; Roche Diagnostics GmbH (Mannheim, Germany)
Proteinase K, PCR grade
Chymotrypsin, sequencing grade
Glu-C (V8 protease), MS grade
Ornithine δ-polyamine derivative and lysine ε-polyamine derivative, analytical standards; synthesized in the Armin Geyer lab (Philipps-Universität Marburg, Germany)

Table 5.1 Materials and instrumentation (continued).

(b) Materials
HyperSil Gold Kappa column, 0.5 mm i.d. × 150 mm, 3 µm; Thermo Fisher Scientific (Rockford, IL, USA)
Acclaim™ PepMap100 C18, 75 µm i.d. × 15 cm, 3 µm, 100 Å
Acclaim™ PepMap100 C18 nanoViper, 75 µm i.d. × 2 cm, 3 µm, 100 Å
OPTI-PAK 1 µL C18; Dichrom GmbH (Marl, Germany)
Acid-washed glass hydrolysis tubes, 5 ml; Wheaton (Millville, NJ, USA)
LoBind tubes, 1.5 mL; Eppendorf (Hamburg, Germany)
Digital readout ThermoMixer C; Eppendorf (Hamburg, Germany)
SDS-PAGE pre-cast gradient gels: Glycine (4–20 %), anamed Elektrophorese GmbH (Groß-Bieberau, Germany); Glycine (4–20 %) and Tricine (10–20 %), Bio-Rad Laboratories (Richmond, CA, USA)
Universal indicator paper; Merck (Darmstadt, Germany)

(c) Equipment and instrumentation
Mass spectrometers: LTQ Orbitrap™ Velos, Q Exactive™, Q Exactive™ HF; Thermo Fisher Scientific (Bremen, Germany)
HPLCs: Agilent 1200 LC system, Agilent Technologies (Santa Clara, CA, USA); Eksigent NanoLC™ 2D, Eksigent (Dublin, CA, USA)
Vacuum concentrator RVC 2-25 CDplus; Martin Christ GmbH (Osterode am Harz, Germany)

5.1 Synthesis of polyamine standards

Two internal standards, oligo-propylenediamine-substituted lysine and ornithine derivatives (Fig. 5.1a and 5.1b), were synthesized by Marina Abacilar in the laboratory of Armin Geyer (Philipps-Universität Marburg, Germany). Alkylation via the Mitsunobu reaction [202] was the key step for modifying the side chains of the amino acids (ornithine and lysine). A detailed synthesis scheme is provided in Fig. 5.1c. The isolated compounds were purified by RPLC and characterized by ¹H and ¹³C NMR, HPLC and HRMS. The analytical data for the synthetic standards are provided in Fig. 3.6 (Suppl. Material A.1).

(a) Ornithine δ-polyamine derivative (chemical formula: C13H30N4O2; monoisotopic mass: 274.2369)
(b) Lysine ε-polyamine derivative (chemical formula: C14H32N4O2; monoisotopic mass: 288.2525)
(c) Synthesis of derivatives (a) and (b): Dde-(L)Orn-OtBu (1, n = 3) and Dde-(L)Lys-OtBu (2, n = 4) → (i) → 3 and 4 → (ii) → 5 and 6 → (iii–vi) → 7 (n = 3) and 8 (n = 4). Reagents and conditions: (i) 3-bromo-1-propanol, K2CO3, TBAI, DMF, 60 °C, 12 h; (ii) PPh3, DIAD, N-(3-(dimethylamino)propyl)-2-nitrobenzenesulfonamide, dry THF, 3 d; (iii) 2 % hydrazine in DCM; (iv) CTC resin, DIPEA, 12 h; (v) 2-mercaptoethanol, DBU, 3 × 30 min; (vi) TFA/H2O/Et3SiH.

Figure 5.1 Chemical structures and synthesis of oligo-propylenediamine-substituted ε-lysine and δ-ornithine derivatives. The synthesized molecules are lysine or ornithine with two added aminopropyl units and a dimethylated N-terminus. (a) ornithine δ-polyamine derivative; (b) lysine ε-polyamine derivative; (c) synthesis of the deprotected lysine and ornithine derivatives (a) and (b).
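The monoisotopic masses given in Fig. 5.1a and 5.1b follow directly from the elemental formulas. A small sketch that reproduces them from standard monoisotopic atomic masses; the parser handles only flat formulas (element symbol plus optional count, no brackets or charges), which is an intentional simplification.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276467

def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic atomic masses for a flat formula such as 'C14H32N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total

for name, formula in [("ornithine derivative", "C13H30N4O2"),
                      ("lysine derivative", "C14H32N4O2")]:
    m = monoisotopic_mass(formula)
    print(f"{name}: M = {m:.4f}, [M+H]+ = {m + PROTON:.4f}")
```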

Purification by RPLC was performed on a Thermo Scientific Dionex UltiMate 3000 semi-preparative system comprising an HPG-3200BX pump, an ERC Series-300 solvent degasser, an MWD-3000 detector and an AFC-3000 fraction collector, with a Macherey-Nagel VP Nucleodur C18 Gravity column (5 µm, 125 × 2.1 mm). The eluents in both the preparative and the analytical system were A: H2O + 0.1 % TFA and B: MeCN + 0.085 % TFA. The synthesized compounds were then lyophilized with a Christ Alpha 2-4 LDplus freeze dryer.

Analytical HPLC chromatograms were recorded on a Thermo Scientific Dionex UltiMate 3000 system comprising an LPG-3400SD pump, a WPS-3000SL autosampler, a TCC-3000SD column compartment and a DAD-3000 detector. An ACE UltraCore 2.5 Super-C18 column (150 × 2.1 mm) was used as the stationary phase.

High-resolution ESI mass spectra of the synthesized compounds (shown in Fig. 3.6) were recorded in positive ion mode on a Q Exactive mass spectrometer. ¹H and ¹³C NMR spectroscopy (TOCSY, HSQC) was performed on a Bruker AV-300 or AV-500/HD-500 spectrometer. Chemical shifts are reported in ppm and referenced to the residual solvent peak (DMSO-d6, 2.50 ppm). Multiplicities are indicated by s (singlet), d (doublet), t (triplet), bs (broad singlet), m (multiplet) and pq (pseudo quartet). Coupling constants (J) are reported in hertz (Hz).

5.2 Isolation of biosilica-associated proteins

Thalassiosira pseudonana (strain CCMP1335), T. oceanica (strain CCMP1005) and Cyclotella cryptica (strain CCMP332) were grown in enriched artificial seawater (ESAW) medium according to the North East Pacific Culture Collection protocol (Canadian Center for the Culture of Microorganisms ESAW Recipe [203]) at 18 °C under constant light at 5000–10 000 lx.

Diatom cell walls were isolated by Christoph Heintze in the laboratory of Nils Kröger (B-CUBE, Dresden, Germany) as described previously [57]. Briefly, cells were boiled twice in 2 % SDS / 100 mM EDTA to remove intracellular components and membranes. Cell walls were pelleted by centrifugation (10 min, 3200×g), extracted with acetone and washed extensively with H2O. Milli-Q (Merck, Darmstadt, Germany) purified H2O (resistivity: 18.2 MΩ cm) was used throughout this procedure. The purified cell walls (biosilica) were lyophilized and stored at −20 °C until further use.

Ammonium fluoride-soluble material (AFSM) was extracted from purified cell walls with ammonium fluoride (NH4F) as described by Kröger et al. [66]. Purified cell walls were resuspended in 10 M ammonium fluoride, and the suspension was acidified to pH 4.5 by adding HCl. After 30 min at RT, the suspension was centrifuged (10 min, 3200×g), and the supernatant was dialyzed against 10 mM ammonium acetate (Spectra/Por dialysis tubing, 500 Da molecular weight cut-off). The dialysate was centrifuged for 15 min at 3200×g, and the supernatant was lyophilized and kept at −20 °C.

To extract ammonium fluoride-insoluble material, biosilica was isolated according to Scheffel et al. [71] and treated with 0.1 mg/ml chitinase from Streptomyces griseus (≥0.2 U/mg; Sigma-Aldrich Co., Schnelldorf, Germany) in chitinase buffer¹ (50 mM potassium phosphate pH 6.0, 0.05 % (w/v) sodium azide, 1 mM PMSF) at 37 °C in a shaker incubator for 2 d. The progress of chitin degradation was monitored by Calcofluor White staining as described previously [163]. The chitinase-treated biosilica was washed once with 1 % (w/v) SDS and then 5× with H2O by repeated centrifugation-resuspension cycles. The final pellet (i. e., chitin-free biosilica) was resuspended in H2O and freeze-dried. The dry material was resuspended in 150 ml of 10 M NH4F and adjusted to pH 4.5 by adding HCl. The suspension was incubated at room temperature for 30 min and centrifuged at 3200×g for 30 min. The pellet was washed with H2O by resuspension-centrifugation (3200×g, 30 min). Residual chitin was removed by a second chitinase treatment as described above, followed by washing once with 1 % (w/v) SDS and 5× with H2O. The resulting NH4F-insoluble organic matrix material (AFIM) was freeze-dried and stored at −20 °C until use.

Silaffin-3 from T. pseudonana (tpSil3), used in Sections 3.1.3 and 3.3.3, was isolated according to Poulsen and Kröger [67]. The AFSM dialysate was loaded onto a HighS cation exchange column (Bio-Rad Laboratories, Richmond, CA, USA) equilibrated in 50 mM ammonium acetate. After washing the column with 50 mM ammonium acetate and 0.5 M ammonia, LCPAs were eluted with 2 M NaCl in pH 10.0 buffer (100 mM ammonia, 50 mM ammonium acetate). Next, the flow-through from the HighS column and the 50 mM ammonium acetate wash were pooled, concentrated by lyophilization, and fractionated on a Superdex200 HiLoad 16/60 column (Amersham Biosciences, Little Chalfont, UK) with 500 mM NaCl and 50 mM ammonium acetate as running buffer at a flow rate of 1.0 ml/min. Fractions were analyzed by Tricine-SDS-PAGE [204] and staining with 'Stains All' [86, 87]. Fractions eluting between 45 and 60 min (containing tpSil1/2 and tpSil3) were combined, concentrated by ultrafiltration (molecular weight cut-off 10 kDa) and loaded onto a Mono Q HR-5/5 column (Amersham Biosciences, Little Chalfont, UK) equilibrated with 50 mM Tris-HCl, pH 6.4. Elution was performed at a flow rate of 0.5 ml/min by linearly increasing the NaCl concentration to 2 M within 1 h. Fractions containing tpSil3, which eluted between 22.5 and 28.5 min, were pooled, exhaustively dialyzed (molecular weight cut-off 7 kDa) against 10 mM ammonium acetate, and lyophilized. The dry residue was dissolved in water and stored frozen at −20 °C until use.

¹ Note: the chitinase solution was filtered through a polyethersulfone syringe filter with 0.2 µm pore size.

5.3 Expression of tpSil3 from synthetic gene

tpSil3 was expressed from a synthetic gene according to Kumar et al. [171]. The database sequence of tpSil3 (without signal peptide) was concatenated in silico into a single protein sequence, flanked by tryptic peptide sequences from PhosB (at the N-terminal side) and BSA (at the C-terminal side), and combined with two affinity tags (Twin-Strep-tag and His-tag) and a 3C cleavage site (the sequence is shown in Fig. 5.2).

MGSAWSHPQFEKGGGSGGGSGGSAWSHPQFEKLEVLFQGPAAAKVFADYEEYVKDFYELEPHKVAAAFPGDVDRGLAGVENV

TELKEGHGGDHSISMSMHSSKAEKQAIEAAVEEDVAGPAKAAKLFKPKASKAGSMPDEAGAKSAKMSMDTKSGKSEDAAAVD

AKASKESHMSISGDMSMAKSHKAEAEDVTEMSMAKAGKDEASTEDMCMPFAKSDKEMSVKSKQGKTEMSVADAKASKESSMP

SSKAAKIFKGKSGKSGSLSMLKSEKASSAHSLSMPKAEKVHSMSAHLVDEPQNLIKYLYEIARQTALVELLKLGEYGFQNAL

IVRDAFLGSFLYEYSRLVNELTEFAKGSGHHHHHH

Figure 5.2 Sequence design of tpSil3 expressed from a synthetic gene. The sequence of tpSil3 (without signal peptide) is concatenated in silico into a single sequence, flanked by peptide sequences from PhosB (at the N-terminal side) and bovine serum albumin (BSA) (at the C-terminal side), with two affinity tags (Twin-Strep-tag and His-tag) and a 3C cleavage site.

This synthetic gene was obtained from GeneArt (Regensburg, Germany) and subcloned into a pET expression vector, which was transformed into an E. coli strain auxotrophic for both arginine and lysine (ΔArgΔLys BL21 (DE3) T1 pRARE). Fresh transformants were inoculated into synthetic MDAG-135 medium [205] supplemented with antibiotics (100 µg/ml ampicillin and 15 µg/ml chloramphenicol) and incubated overnight at 37 °C on a shaking platform. The overnight starter culture was then diluted 1000× in MDAG-135 medium supplemented with the same antibiotics and incubated at 37 °C until OD600 = 0.5. Expression was induced with 0.2 mM IPTG. Four to six hours after induction, cells were pelleted, resuspended in 2× PBS, aliquoted, snap-frozen in liquid nitrogen and stored at −80 °C until analysis. Prior to analysis, frozen aliquots were thawed, and the cells were lysed in an equal volume of 2× Laemmli buffer by incubating at 80 °C for 15 min. The samples were spun down, and the supernatant was subjected to SDS-PAGE. Protein bands were visualized by Coomassie staining, and full-length expression of the synthetic gene was validated by in-gel digestion with trypsin or Asp-N and LC-MS/MS of the recovered peptides, as described in Section 5.14.

5.4 HCl hydrolysis

Biosilica extracts (AFSM and AFIM) and tpSil3 were hydrolyzed with 6 M HCl in vacuo for 17 h at 110 °C in acid-washed clear-glass tubes. The HCl hydrolysates were evaporated to dryness in a vacuum centrifuge at 40 °C and dissolved in water for further analysis.

5.5 AQC-derivatization of amino acids and polyamines

Acid hydrolysates and standards were derivatized with the AccQ·Tag Ultra Kit according to the manufacturer's protocol [206]. Briefly, 20 µL of either standard solution or biosilica hydrolysate was mixed with 140 µL of AccQ·Tag Ultra borate buffer (pH ~9.0) and 40 µL of AccQ·Tag Ultra reagent previously dissolved in 1.0 mL of AccQ·Tag Ultra reagent diluent (the concentration of reconstituted AQC is approximately 10 mM in acetonitrile). The solution was left for 1 min at RT and then allowed to react for 10 min at 55 °C to ensure complete derivatization and decomposition of the unreacted reagent. The derivatization mixture was diluted 10× and immediately subjected to LC-MS/MS analysis.
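Each AQC tag adds C10H6N2O (+170.0480 Da) to a reacting primary or secondary amine, and multiply tagged polyamines can appear at several charge states, which matters when building an inclusion list. A sketch, assuming that the lysine ε-polyamine standard (M = 288.2525, Fig. 5.1b) carries up to three reactive amines (the α-NH2 and two secondary amines; the terminal dimethylamine is tertiary and is assumed not to react) and that charge states 1+ to 3+ are worth listing; both choices are illustrative.

```python
AQC_TAG = 170.048013   # C10H6N2O added per derivatized amine (monoisotopic)
PROTON = 1.007276467

def aqc_mz(neutral_mass: float, n_tags: int, charge: int) -> float:
    """m/z of [M + n*AQC + z*H]^z+."""
    return (neutral_mass + n_tags * AQC_TAG + charge * PROTON) / charge

lys_polyamine = 288.2525  # lysine ε-polyamine standard, Fig. 5.1b
for n_tags in range(1, 4):
    for z in (1, 2, 3):
        mz = aqc_mz(lys_polyamine, n_tags, z)
        # Flag ions that fall outside the survey range used in Section 5.6.
        note = "" if 140 <= mz <= 700 else "  (outside m/z 140-700)"
        print(f"{n_tags} tag(s), {z}+: m/z {mz:.4f}{note}")
```

For the fully tagged species, the singly charged ion (about m/z 799.4) lies outside the m/z 140–700 survey window used in Section 5.6, so only the 2+ and 3+ ions would be observed there.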

5.6 LC-MS/MS analysis of AQC-derivatives

Liquid chromatographic analysis of AQC-derivatives was performed on an Agilent 1200 LC system equipped with a binary solvent manager, an autosampler, a column heater and a DAD (set at 280 nm), and interfaced to a Q Exactive mass spectrometer. The separation column was a HyperSil Gold Kappa column (0.5 mm i.d. × 150 mm) packed with 3 µm particles. An OPTI-PAK column (1 µL, C18) was used as the trap column.

Samples (10 µL) were injected and loaded onto the trap column with solvent A at 0.020 µL/min. After loading, the trap column was switched online to the separation column, and the mobile phase flow rate was maintained at 20 µL/min. Eluent A was 0.1 % FA in water, and eluent B was 0.1 % FA in acetonitrile. The column heater was set to 40 °C. AQC-derivatives were separated with the 60 min gradient given in Table 5.2.

Table 5.2 HPLC gradient used for the analysis of AQC-derivatives.

Time (min)  0   10  20  50  55  57  60
B (%)       0   0   10  95  95  0   0

A: 0.1 % FA; B: 100 % ACN in 0.1 % FA
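Table 5.2 defines the gradient by its nodes; between nodes the pump is assumed to ramp linearly, as in a standard linear gradient program. A sketch for reading off the programmed %B at any time point, for example to estimate the eluent composition at which a derivative elutes (system dwell volume is ignored).

```python
from bisect import bisect_right

TIME_MIN = [0, 10, 20, 50, 55, 57, 60]   # Table 5.2 nodes
B_PERCENT = [0, 0, 10, 95, 95, 0, 0]

def percent_b(t: float) -> float:
    """Programmed %B at time t (min), linear between the Table 5.2 nodes."""
    if not TIME_MIN[0] <= t <= TIME_MIN[-1]:
        raise ValueError("time outside the 60 min program")
    i = bisect_right(TIME_MIN, t) - 1
    if i == len(TIME_MIN) - 1:          # exactly at the final node
        return float(B_PERCENT[-1])
    t0, t1 = TIME_MIN[i], TIME_MIN[i + 1]
    b0, b1 = B_PERCENT[i], B_PERCENT[i + 1]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

for t in (5, 15, 35, 56):
    print(t, round(percent_b(t), 1))
```

For instance, at 35 min the program is halfway through the 20–50 min ramp, giving 10 + 85 × 0.5 = 52.5 % B.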

The LC was connected to the Q Exactive mass spectrometer, controlled by Xcalibur 4.0 software (Thermo Fisher Scientific). Survey scans were acquired over m/z 140–700 at a resolution of 70 000 FWHM at m/z 200, with a target value of 3 × 10⁶ ions and a maximal injection time of 100 ms. Each survey scan was followed by MS/MS fragmentation targeted to an inclusion list derived from Tables 3.1 and A.1. Precursors were isolated with a window of m/z 2 at 5 ppm. Spectra were acquired in one microscan with stepped normalized collision energies of 25, 30 and 35 %, a target value of 1 × 10⁵ ions and a maximal injection time of 100 ms. The resolution of HCD spectra was set to 70 000 at m/z 200, and the first mass was fixed at m/z 80. Three replicate LC-MS/MS runs were performed for each sample and saved as .RAW files (Thermo).
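Orbitrap resolving power is specified at a reference m/z and decreases approximately with the square root of m/z. A sketch of what the 70 000 (at m/z 200) setting implies across the m/z 140–700 survey range, assuming this idealized R ∝ (m/z)^(−1/2) scaling and FWHM = (m/z)/R.

```python
import math

def resolving_power(mz: float, r_ref: float = 70_000.0, mz_ref: float = 200.0) -> float:
    """Approximate Orbitrap resolving power at mz, assuming R scales as 1/sqrt(m/z)."""
    return r_ref * math.sqrt(mz_ref / mz)

def fwhm(mz: float) -> float:
    """Peak width at half maximum (m/z units) implied by the resolving power at mz."""
    return mz / resolving_power(mz)

for mz in (140, 200, 400, 700):
    print(f"m/z {mz}: R ~ {resolving_power(mz):,.0f}, FWHM ~ {fwhm(mz):.4f}")
```

Because FWHM = (m/z)/R and R falls only with the square root of m/z, peaks still broaden with increasing m/z (from about 0.0024 at m/z 140 to about 0.0053 at m/z 700), which is worth keeping in mind when separating near-isobaric polyamine derivatives.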

5.7 Amino acid measurement using UV-detection

Amino acid analysis (AAA) with single-wavelength UV detection was performed at the Functional Genomics Center Zürich (FGCZ, ETH Zürich, Switzerland) [207]. The tpSil3 sample was hydrolyzed with HCl and derivatized with AQC reagent as described above. Derivatives were separated at high resolution on UPLC columns (1.7 µm particles). Amino acid concentrations were determined using the MassTrak Amino Acid Analysis Solution (Waters Corporation, Milford, USA) with UV detection at 280 nm. The results were provided as .pdf and .txt files containing the chromatogram and a tabular summary of the integration results.

5.8 Direct infusion MS/MS analysis

Total biosilica hydrolysates and analytical standards were diluted with acetonitrile/water/FA (50/45/5, v/v/v). Dilution factors for the hydrolysates and stock solutions were selected individually for each experiment. Prior to analysis, samples were loaded into a 96-well plate (Eppendorf, Hamburg, Germany). Mass spectrometric analysis was carried out in positive ion mode using either a Q Exactive or a Q Exactive HF mass spectrometer. Both instruments were equipped with a TriVersa NanoMate robotic nanoflow ion source (Advion BioSciences, Ithaca, NY, USA) using chips with 4.1 µm diameter spraying nozzles, controlled by Chipsoft 8.3.3 software. The ionization voltage and gas back pressure were set to 2.00 kV and 0.80 psi, respectively. Under these settings, 8 µL of analyte sufficed for more than 50 min of electrospray. The temperature of the ion transfer capillary was 275 °C, and the S-lens level was 65 %. The samples were sprayed for 10 min. The FT MS mass resolution (R at m/z 400) was 140 000 (FWHM), the AGC target value was 3 × 10⁶ and the maximum injection time was 25 ms. One FT MS spectrum was acquired every 3.52 s, and the total acquisition time for all FT MS spectra was 10 min.
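The acquisition settings above can be cross-checked with simple arithmetic: the number of FT MS spectra per sample, and an upper bound on the average nanoESI flow rate implied by 8 µL lasting more than 50 min. A minimal sketch using only the values stated in this section:

```python
import math

acquisition_s = 10 * 60   # total FT MS acquisition time per sample (s)
cycle_s = 3.52            # time per FT MS spectrum (s)
n_spectra = math.floor(acquisition_s / cycle_s)

sample_nl = 8 * 1000      # 8 µL loaded, in nL
spray_min = 50            # the spray lasts more than 50 min ...
max_flow_nl_min = sample_nl / spray_min  # ... so the mean flow is below this value

print(f"{n_spectra} spectra per 10 min acquisition")
print(f"mean flow < {max_flow_nl_min:.0f} nL/min; < {10 * max_flow_nl_min / 1000:.1f} µL used in 10 min")
```

About 170 spectra are therefore averaged per sample, and a 10 min acquisition consumes less than 1.6 µL of the 8 µL loaded.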

5.9 Acetylation of phosphopolyamines

The hydrolyzed biosilica extract from T. pseudonana was derivatized with acetic anhydride. Briefly, the biosilica hydrolysate was evaporated to dryness and dissolved in 80 µl of 30 mM triethylamine, corresponding to a 200× dilution relative to the initial extract volume. The mixture was derivatized by adding 160 µl of acetic anhydride/isopropanol (1/7, v/v); the total volume can be changed as long as the proportions are maintained. The solutions were mixed thoroughly and incubated for at least 2 h at RT. The derivatization reagents were removed by vacuum centrifugation at 40 °C. The dried sample was reconstituted in 50 µl of 50 % acetonitrile containing 5 % FA for direct infusion MS/MS analysis.
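Acetic anhydride acetylates primary and secondary amines, each adding C2H2O (+42.0106 Da), so the mass difference between native and acetylated ions indicates the number of free amino groups, which helps when assigning phosphopolyamine structures. A sketch, assuming complete N-acetylation, no O-acetylation and a 5 mDa tolerance; the worked example uses the protonated lysine ε-polyamine standard (Fig. 5.1b), which would carry three acetyl groups if its α-NH2 and two secondary amines react.

```python
ACETYL = 42.010565  # C2H2O added per acetylated amine (monoisotopic)

def count_acetyl_groups(mz_native: float, mz_acetylated: float,
                        charge: int = 1, tol_da: float = 0.005) -> int:
    """Number of acetyl groups explaining the mass difference; raise if none fits."""
    delta = (mz_acetylated - mz_native) * charge  # convert m/z difference to Da
    n = round(delta / ACETYL)
    if n < 0 or abs(delta - n * ACETYL) > tol_da:
        raise ValueError(f"mass difference {delta:.4f} Da is not a multiple of acetyl")
    return n

native = 289.2598                  # [M+H]+ of the lysine ε-polyamine standard
acetylated = native + 3 * ACETYL   # expected ion after triple acetylation
print(count_acetyl_groups(native, acetylated))
```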

5.10 ³¹P NMR measurements

³¹P NMR was performed by Marcus Rauche in the laboratory of Eike Brunner (Technische Universität Dresden, Germany). All experiments were carried out at 300 K using an Ascend™ 500 spectrometer from Bruker Daltonik GmbH (Bremen, Germany). ³¹P NMR spectra were recorded at a resonance frequency of 202.45 MHz using a 5 mm BBO Prodigy cryoprobe (cooled with nitrogen to increase sensitivity). A pulse length of 11.63 µs at 61 W and a relaxation delay of 2 s were used. WALTZ-16 was used for ³¹P{¹H} decoupling. The samples were rotated at 20 Hz. ³¹P chemical shifts were referenced to H3PO4. The samples were dissolved in 600 µl of 99.9 % D2O (Sigma-Aldrich Co., Schnelldorf, Germany); after centrifugation, the supernatant was adjusted to pH ~5.0 with 0.1 M HCl (tested with universal indicator paper) and transferred to a 5 mm NMR tube.

5.11 Deglycosylation with TFMS

Deglycosylation with the GlycoProfile IV Kit was performed according to the manufacturer's instructions [208]. A dry pellet of tpSil3 was dissolved in 150 µl of trifluoromethanesulfonic acid (TFMS) and incubated for 30 min or 2 h at 0 °C. Subsequently, the solution was neutralized by dropwise addition of 60 % pyridine (in ethanol); a fine precipitate formed after 150 µl of the pyridine solution had been added. The precipitate was completely dissolved by adding 20 µl of water, and neutralization was quickly completed by adding a further 150 µl of the pyridine solution². Neutralization was monitored with 4 µl of Bromophenol Blue solution as an indicator dye until the pH reached ~6.0. The neutral solution was mixed with SDS-PAGE sample buffer and subjected to GeLC-MS/MS.

5.12 Treatment with HF·pyridine soluble complex

Chemical dephosphorylation/deglycosylation with anhydrous HF·pyridine soluble complex was performed as described previously [174]. A dry pellet of tpSil3 was dissolved in 50 µl of anhydrous HF·pyridine soluble complex and incubated at 0 °C for 30 min, 1 h, 2 h, or 3 h. At each time point, the reaction mixture was neutralized with SDS-PAGE sample buffer and subjected to GeLC-MS/MS.

² Note: the entire neutralization should be carried out quickly, keeping the reaction mixture cold at all stages to minimize protein degradation.


5.13 Anhydrous HF treatment

Biosilica extracts or tpSil3 were dried and dissolved in liquid HF [58, 59]. After 30 min at 0 °C, the HF was evaporated, and any remaining material was dissolved in water. Dephosphorylation/deglycosylation efficiency was evaluated by SDS-PAGE as a shift of the HF-treated proteins to lower apparent molecular mass. After HF treatment, samples were analyzed by GeLC-MS/MS.

5.14 Protein analysis by GeLC-MS/MS

In-gel digestion was performed according to Shevchenko et al. [110, 111]. To visualize protein lanes, gels were fixed, rinsed with water, and successively stained with ‘Stains All’ [86, 87] and Coomassie Brilliant Blue R 250. After destaining, entire gel lanes covering the mass range of 10–250 kDa were excised and cut into small pieces (~1 mm³). The gel pieces were transferred into 1.5 mL tubes and completely destained with acetonitrile/water. Destained gel pieces were reduced with 10 mM DTT (45 min at 56 °C) and alkylated with 55 mM IAA (30 min in the dark at RT). The reduced and alkylated gel pieces were washed with water/acetonitrile and then shrunk with acetonitrile. Ice-cold protease solution (50 ng µl⁻¹) was added to cover the shrunk gel pieces, which were then incubated on ice for 2 h. The swollen gel pieces were covered with 10 mM NH4HCO3 and incubated at 37 °C for 12–18 h. The cleavage specificities of the proteases³ used or discussed in this thesis are given in Table 5.3. The resulting peptides were extracted with water/acetonitrile/FA (49/50/1, v/v/v), dried in a vacuum centrifuge, and stored at −20 °C until use.
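The cleavage rules in Table 5.3 map naturally onto regular expressions. The sketch below encodes only the primary specificities of trypsin, Glu-C and Asp-N (the slower D cleavage of Glu-C and the weaker E cleavage of Asp-N are deliberately left out); the rule dictionary and function names are hypothetical and are not how Mascot implements enzymes internally.

```python
import re
from typing import Dict, List

# Simplified cleavage rules (hypothetical subset of Table 5.3). The END of each
# regex match marks a cleavage point; a zero-width lookahead cuts N-terminally.
CLEAVAGE_RULES: Dict[str, str] = {
    "trypsin": r"[KR](?!P)",  # [RK].<P>: after K/R unless the next residue is P
    "glu-c": r"E(?!P)",       # [E].<P>: after E; slower D cleavage ignored
    "asp-n": r"(?=D)",        # .[D]: before D; weaker E cleavage ignored
}


def cleavage_sites(sequence: str, enzyme: str) -> List[int]:
    """Return 0-based positions between residues where the enzyme cuts."""
    sites = {m.end() for m in re.finditer(CLEAVAGE_RULES[enzyme], sequence)}
    # A cut at the very start or end of the sequence produces no new peptide.
    return sorted(s for s in sites if 0 < s < len(sequence))


def digest(sequence: str, enzyme: str, missed_cleavages: int = 0) -> List[str]:
    """In-silico digest allowing up to `missed_cleavages` skipped sites."""
    bounds = [0] + cleavage_sites(sequence, enzyme) + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j walks over the next (missed_cleavages + 1) boundaries
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


print(digest("ALKPGGKSCRDE", "trypsin"))  # -> ['ALKPGGK', 'SCR', 'DE']
```

With `missed_cleavages=1` the same sequence also yields the longer peptides ALKPGGKSCR and SCRDE, mirroring the "Max missed cleavages" search setting.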

The resulting peptides were recovered in 5 %, and 2.6 µL were injected with an AS-2 autosampler into a direct-pumping nanoflow liquid chromatography system (Eksigent NanoLC 2D), which avoids the limitations imposed by flow splitting. The nanoLC was equipped with a 300 µm i.d. × 5 mm trap column and a 75 µm i.d. × 15 cm separation column packed with Acclaim PepMap100 C18 3 µm particles. Because multiple lysine PTMs increase the hydrophilicity of modified peptides and thus impair RPLC separation, the LC gradients were adjusted as follows: solvent A was 0.1 % FA in water; solvent B was 60 % acetonitrile in water containing 0.1 % FA. The samples were loaded for 8.5 min with solvent A at 2 µL/min. After loading, the trap column was switched in line with the separation column, and the flow rate was set to 300 nL/min. Peptides were separated using the 175-min program shown in Table 5.4.

3 Note: non-specific proteases, when allowed to act for a long time, can produce a large number of short peptides, decreasing reproducibility and complicating further analysis. Therefore, shorter incubation times (~4–6 h) were used for non-specific proteases (e.g., Proteinase K).

Table 5.3 Cleavage specificity of the proteases used (or discussed) in this thesis. For a review, see [109, 180, 199]. ‘[]’, cleavage activators; ‘〈〉’, cleavage preventors; ‘.’, cleavage point; ‘Ψ’, aliphatic, aromatic, or hydrophobic amino acids.

Protease | Optimal pH | Cleavage specificity | Rule used for Mascot searches
Trypsin | 7.5 | [RK].〈P〉 | C-terminal to an arginine or a lysine (if not followed by a proline)
Glu-C (V8) | 8.0 | [E].〈P〉, and slower at [D].〈P〉 | C-terminal to a glutamic acid and, more slowly, to an aspartic acid (if not followed by a proline)
Asp-N | 4.0–9.0 | .[D], and less specific at .[E] | N-terminal to an aspartic acid and (with lower specificity) a glutamic acid
Chymotrypsin | 7.8–8.0 | [FWY].〈P〉; [LMADE].〈P〉 at a slower rate | Semi-specific
Proteinase K | 8.0 | Ψ. | Non-specific
OmpT [197] | 6.0–6.5 | [KR].[KR] | Cleaves within dibasic combinations of Arg and Lys
LysargiNase [198] | 7.5 | .[RK] | N-terminal to an arginine or a lysine

Table 5.4 HPLC gradient used for the analysis of peptides.

Time (min) | 0 | 25 | 145 | 150 | 155 | 175
B (%) | 0 | 0 | 55 | 55 | 0 | 0

A: 0.1 % FA; B: 60 % ACN in 0.1 % FA
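Table 5.4 lists only breakpoints. Assuming linear ramps between them (the usual HPLC convention, but an assumption here), the %B at any time and the resulting acetonitrile content can be sketched as follows; since solvent B is 60 % acetonitrile, the 55 % B plateau corresponds to 33 % acetonitrile. Function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# (time in min, %B) breakpoints from Table 5.4
GRADIENT: List[Tuple[float, float]] = [
    (0, 0), (25, 0), (145, 55), (150, 55), (155, 0), (175, 0),
]
ACN_IN_B = 60.0  # solvent B contains 60 % acetonitrile


def percent_b(t_min: float) -> float:
    """Linearly interpolate %B at time t (assumes linear segments)."""
    times = [t for t, _ in GRADIENT]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time outside the 0-175 min program")
    i = bisect_right(times, t_min) - 1
    if i == len(GRADIENT) - 1:  # exactly at the last breakpoint
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_acetonitrile(t_min: float) -> float:
    """Effective acetonitrile content (%) delivered at time t."""
    return percent_b(t_min) * ACN_IN_B / 100.0


print(percent_b(85), percent_acetonitrile(145))  # mid-ramp and plateau values
```

Halfway through the 25–145 min ramp (t = 85 min) the pump delivers 27.5 % B.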

The nanoLC was coupled to an LTQ Orbitrap Velos hybrid mass spectrometer controlled by Xcalibur 4.0 software (Thermo Fisher Scientific). Each data-dependent acquisition (DDA) cycle consisted of one FT MS survey scan over m/z 350–1600, acquired at a target resolution of 60 000 FWHM in the Orbitrap analyzer, followed by 8 MS/MS spectra. The automated gain control (AGC) target for FT MS scans was set to 1 × 10⁶ ions with a maximum fill time of 500 ms; the precursor ion isolation width was 3 Da, and spectra were recorded in centroid mode. The fragmentation threshold was 4000 ion counts, and singly charged precursor ions were excluded. Four CID and four HCD fragment spectra were acquired in one microscan each at a normalized collision energy (nCE) of 35 % and a target value of 1 × 10⁴ ions (ion selection threshold 400 counts; precursor isolation width m/z 3). An activation parameter q = 0.25 and an activation time of 30 ms were applied. Fragmented precursors were dynamically excluded for 30 s. Two replicate LC-MS/MS runs were performed for each sample and saved as .RAW files (Thermo). The singly charged dodecamethylcyclohexasiloxane ion ([(Si(CH3)2O)6 + H]+; m/z 445.120025) was used as lock mass.

5.15 Proteomics data processing

Data processing was performed using Proteome Discoverer 2.1 (Thermo Fisher Scientific, Bremen, Germany). First, all MS2 spectra were processed with a custom-built deconvolution node developed by Vladimir Gorshkov [183] to produce deisotoped MS/MS spectra consisting only of singly charged fragments. Briefly, each isotopic cluster is converted into one singly charged peak whose m/z value is calculated according to formula (*), where (m/z)_{1+} is the mass-to-charge ratio of the deconvoluted singly charged peak, (m/z)_{z+} is that of the observed fragment, z is the charge state of the fragment, and m_{H+} is the mass of the proton:

$$(m/z)_{1+} = (m/z)_{z+} \times z - (z - 1) \times m_{\mathrm{H}^+} \qquad (*)$$
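Formula (*) follows from (m/z)_{z+} = (M + z·m_{H+})/z, so that M + m_{H+} = z·(m/z)_{z+} − (z − 1)·m_{H+}. A minimal sketch of this conversion (not the deconvolution node itself; the function name is hypothetical, and the proton mass is a rounded constant):

```python
PROTON_MASS = 1.007276467  # Da, proton mass (rounded)


def to_singly_charged(mz: float, z: int) -> float:
    """Convert an m/z observed at charge z to the equivalent 1+ m/z, formula (*)."""
    if z < 1:
        raise ValueError("charge must be a positive integer")
    return mz * z - (z - 1) * PROTON_MASS


# Worked example: a fragment of neutral mass 1000.0 Da observed at z = 3
mz_3plus = (1000.0 + 3 * PROTON_MASS) / 3
print(to_singly_charged(mz_3plus, 3))  # ~1001.0073, i.e. 1000.0 + one proton
```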

Next, fragment masses were grouped using the parameter Distance tolerance, set to 5 ppm⁴. Within each group, masses were averaged and intensities were summed. All multiply charged peaks were deconvoluted to the singly charged state, and all peaks that could not be assigned to any charge state according to the isotopic pattern were transferred to the deconvoluted spectra with charge state 1+. The parameter Isotope peak difference / N was set to one.
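The grouping step can be sketched as single-linkage merging of neighbouring peaks. This is an illustration under assumptions, not the node's code: the source states only that masses were averaged, so the unweighted mean used here is an assumption, and `group_peaks` is a hypothetical name.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def group_peaks(peaks: List[Peak], tol_ppm: float = 5.0) -> List[Peak]:
    """Merge neighbouring peaks closer than tol_ppm.

    Peaks are sorted by m/z and each peak is compared with its predecessor
    (a moving window). Within a group, m/z values are averaged (unweighted,
    an assumption) and intensities are summed.
    """
    groups: List[List[Peak]] = []
    for mz, inten in sorted(peaks):
        if groups:
            prev_mz = groups[-1][-1][0]
            if (mz - prev_mz) / prev_mz * 1e6 < tol_ppm:
                groups[-1].append((mz, inten))
                continue
        groups.append([(mz, inten)])
    return [
        (sum(m for m, _ in g) / len(g), sum(i for _, i in g))
        for g in groups
    ]


print(group_peaks([(500.0010, 5.0), (500.0, 10.0), (600.0, 1.0)]))
```

The two peaks near m/z 500 differ by 2 ppm and are merged into one peak of intensity 15, while the peak at m/z 600 stays separate.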

The Mascot 2.2.06 search engine (Matrix Science, London, UK) [209] was used for peptide identification against a custom-made database (80 096 sequence entries) containing protein sequences from three diatom species (T. pseudonana [13], T. oceanica [14], and C. cryptica [15]), concatenated with sequences of common laboratory contaminants (proteases, keratins, etc.). Deconvoluted fragment spectra were sorted into CID and HCD spectra, which were processed independently with the parameters listed in Table 5.5.

4 i.e., a moving window compares each pair of neighbouring masses; m/z values whose absolute difference is less than 5 ppm are considered to belong to the same m/z peak.


Table 5.5 Mascot search parameters.

Parameter | CID MS/MS | HCD MS/MS
Precursor tolerance | 10 ppm (both)
Fragment tolerance | 0.6 Da | 60 mmu
Max. missed cleavages | 3 (both)
Enzyme | See Table 5.3 (both)
Fixed modifications | Carbamidomethyl (C) (both)
Variable modifications | Methionine oxidation and 2 ε-polyamine PTMs (see Section 3.3.3) (both)
Precursor type (mass) | Monoisotopic (both)
Peptide charge | 2+ and 3+ (both)
Instrument | ESI-TRAP instrument settings | HCD instrument settings
1+ fragments | yes | yes
2+ fragments | if precursor 2+ or higher | yes
Immonium ions | no | yes
a-series | no | yes
b-series | yes | yes
y-series | yes | yes
y or y++ must be significant | no | no
Max. mass for internal fragments | 700 | 1500

Scaffold 4.8.7 (Proteome Software, Inc., Portland, OR, USA) was used to validate peptide-spectrum matches (PSMs) and protein identifications from Mascot searches [210, 211]. Proteins containing shared peptides were grouped according to the principle of parsimony. Peptide identifications were accepted if they could be established at greater than 99.0 % probability as specified by the Peptide Prophet algorithm [210]. Peptide and protein identities were accepted if they had a false discovery rate (FDR) ≤ 2 % based on the Scaffold Local FDR algorithm, with at least two unique peptides with a precursor ion mass accuracy ≤ 10 ppm. Protein identifications were accepted if they could be established at greater than 95.0 % probability [211] and were detected in at least one biological replicate. Proteomics data were deposited to the ProteomeXchange Consortium via the PRIDE [212] partner repository.

Sequence logos [196] for the identification of PTM consensus sites were produced with the TeXshade package [213]. Briefly, amino acid sequences were restricted to 20 amino acids (10 upstream and 10 downstream of the site) and aligned on the lysine PTM sites; details are described in Results and Discussion (Section 3.3.6).
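Preparing the aligned input amounts to cutting fixed-width windows around each modified lysine. The sketch below assumes 1-based site positions and keeps the lysine at the centre between the two 10-residue flanks (21 characters per window), padding with '-' at protein termini; it produces plain aligned strings, not TeXshade's input format, and the function name is hypothetical.

```python
from typing import List


def site_windows(sequence: str, sites: List[int], flank: int = 10,
                 pad: str = "-") -> List[str]:
    """Return sequence windows centred on modified lysines.

    `sites` are 1-based residue positions; each window holds `flank` residues
    on either side of the lysine, padded with `pad` at the protein termini.
    """
    windows = []
    for site in sites:
        idx = site - 1
        if not 0 <= idx < len(sequence) or sequence[idx] != "K":
            raise ValueError(f"position {site} is not a lysine")
        left = sequence[max(0, idx - flank):idx].rjust(flank, pad)
        right = sequence[idx + 1:idx + 1 + flank].ljust(flank, pad)
        windows.append(left + sequence[idx] + right)
    return windows


# N-terminus of tpSil3 (Figure A.4); K2 sits next to the start, so the left flank is padded
print(site_windows("MKTSAIALLAVLATTAATEPRRLRTLEG", [2]))
```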

Appendix A


Supplemental material

[Figure A.1 scheme: AQC, N-hydroxysuccinimide (NHS), 6-quinolinyl carbamic acid, 6-quinolinyl isocyanate, 6-aminoquinoline (AMQ), CO2, H2O, OH− and N,N′-bis(6-quinolinyl)urea; reaction steps (a)–(f)]

Figure A.1 Reactions of 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) that might occur in buffered aqueous solutions and/or during storage: (a) hydrolysis of AQC (catalyzed by acids or bases) yields N-hydroxysuccinimide (NHS) and 6-quinolinyl carbamic acid, which is unstable and spontaneously breaks down (b) to carbon dioxide and 6-aminoquinoline (AMQ); (c) alkaline hydrolysis of AQC eliminates NHS and gives 6-quinolinyl isocyanate; (d) acid- or base-catalyzed addition of water to the carbon–nitrogen double bond gives an N-substituted carbamic acid; (e) in the absence of a basic catalyst, the disubstituted urea N,N′-bis(6-quinolinyl)urea can be obtained by nucleophilic addition of AMQ; (f) the primary amine AMQ forms N,N′-bis(6-quinolinyl)urea by a nucleophilic substitution reaction [129].


Table A.1 Calculated number of QAC derivatization groups (N×QAC) for lysine ε-polyamine modifications from Fig. 3.1 (p. 39). Propylamine units (PA0, PA1, PA2); N-methyl groups (Me1–Me7); δ-hydroxylation of lysine (Hydroxy); phosphorylation of the side-chain hydroxyl (Phospho).

Backbone Me0 Me1 Me2 Me3 Me4 Me5 Me6 Me7 ...

Lys-PA0 2 2 — 1 — 1 — — — — — — — ...

Lys-PA1 3 3 2 2 — 2 1 1 — 1 — — — ...

Lys-PA1-PA2 4 4 3 3 2 3 2 2 1 2 1 1 1 ...

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

Hydroxy-Lys-PA0 2 2 — 1 — 1 — — — — — — — ...

Hydroxy-Lys-PA1 3 3 2 2 — 2 1 1 — 1 — — — ...

Hydroxy-Lys-PA1-PA2 4 4 3 3 2 3 2 2 1 2 1 1 1 ...

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

Phospho-Hydroxy-Lys-PA0 2 2 — 1 — 1 — — — — — — — ...

Phospho-Hydroxy-Lys-PA1 3 3 2 2 — 2 1 1 — 1 — — — ...

Phospho-Hydroxy-Lys-PA1-PA2 4 4 3 3 2 3 2 2 1 2 1 1 1 ...

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

[Figure A.2 plot: instrument response (a.u., log scale, 10^3–10^11) vs. amount loaded on-column (pmol, log scale, 0.001–100) for Arg, His, Ser, Gly, Asp, Glu, Thr, Ala, Pro, Val, Leu, Ile, Phe, Tyr and Met (each +QAC1) and Lys (+QAC2)]

Figure A.2 Calibration curves of QAC-derivatized amino acids. The dynamic range and linearity of the QAC derivatives were obtained from calibration curves built using solutions of a standard physiological amino acid mix (Pierce Amino Acid Standard H, Table 5.1). Y, instrument response in arbitrary abundance units; X, amount loaded on-column (pmol); both on logarithmic scales.
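On a log–log calibration plot, a response proportional to the amount loaded appears as a straight line of slope 1. The sketch below fits log10(response) against log10(amount) by least squares; it is one common way to judge linearity, not necessarily how Figure A.2 was evaluated, and the function name and example data are hypothetical.

```python
import math
from typing import List, Tuple


def loglog_fit(amount_pmol: List[float],
               response: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of log10(response) = slope * log10(amount) + intercept.

    Returns (slope, intercept, r_squared). A slope close to 1 indicates a
    response proportional to the amount loaded.
    """
    if len(amount_pmol) != len(response) or len(amount_pmol) < 2:
        raise ValueError("need at least two paired points")
    xs = [math.log10(a) for a in amount_pmol]
    ys = [math.log10(r) for r in response]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Made-up, perfectly proportional data: 1e5 a.u. per pmol
print(loglog_fit([0.01, 0.1, 1.0, 10.0], [1e3, 1e4, 1e5, 1e6]))
```

Points that fall below the fitted line at the high end would mark detector saturation, i.e., the upper limit of the dynamic range.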


[Figure A.3: amino acid analysis (AAA) and lysine PTM profile of silaffin-3 from T. pseudonana (tpSil3) by mass spectrometry (MS) and ultraviolet (UV) detection. Bar chart of the number of amino acid residues (experimental vs. theoretical) for Arg, His, Ser, Gly, Asx, Glx, Thr, Ala, Pro, Val, Leu, Ile, Phe, Tyr, Met and Lys, and for the mapped lysine PTMs 175, 261, 275, 289, 303, 319, 333, 399 and 413. Structures: (a) PTM 175 (3×QAC), (b) PTM 261 (3×QAC), (c) PTM 275 (4×QAC), (d) PTM 289 (3×QAC), (e) PTM 303a (2×QAC), (f) PTM 319 (3×QAC), (g) PTM 333 (2×QAC), (h) PTM 399 (3×QAC), (i) PTM 413 (2×QAC)]

Figure A.3 Number of amino acid residues. Comparison of the experimentally obtained amino acid content of silaffin-3 from T. pseudonana (tpSil3) with the theoretical one (cf. Fig. 3.4). Numbers on top of each bar represent calculated amino acid residues. Only 25 % of lysines were detected in unmodified form, whereas ~75 % of the total lysine content is modified with different ε-modifications, displayed in (a)–(i). Asx, aspartic acid or asparagine; Glx, glutamic acid or glutamine; QAC, derivatization group.


A.1 Analytical data for synthetic standards

(a) (S)-2-amino-5-((3-((3-(dimethylamino)propyl)amino)propyl)amino)pentanoic acid

1H-NMR (500 MHz, DMSO-d6): δ (ppm) = 1.61-1.89 (m, 4 H, γ H, β H2), 1.89-2.03 (m, 4 H, HNCH2CH2CH2NH), 2.79 (d, ¹J = 3.2 Hz, 6 H, N(CH3)2), 2.89-3.05 (m, 8 H, δ H2, CH2NH), 3.06-3.17 (m, 2 H, CH2N(CH3)2), 3.95 (m, 1 H, α H), 8.31 (bs, 3 H, α NH3+), 8.68 (bs, 2 H, 2-NH2+), 8.79 (bs, 2 H, 2-NH2+), 9.78 (bs, 1 H, 3-HN+(CH3)2) (300 K). 13C-NMR (75 MHz, DMSO-d6): δ (ppm) = 21.2 (HNCH2CH2CH2NH), 21.9 (C γ), 22.8 (HNCH2CH2CH2NH), 27.4 (C β), 42.6 (N(CH3)2), 44.4 (CH2NHCH2), 46.7 (C δ), 51.8 (C α), 54.1 (CH2N(CH3)2) (300 K).

(b) (S)-2-amino-6-((3-((3-(dimethylamino)propyl)amino)propyl)amino)hexanoic acid

1H-NMR (500 MHz, DMSO-d6): δ (ppm) = 1.32-1.42 (m, 1 H, γ H), 1.42-1.52 (m, 1 H, γ H), 1.55-1.66 (m, 2 H, δ H2), 1.68-1.87 (m, 2 H, β H2), 1.87-2.04 (m, 4 H, HNCH2CH2CH2NH), 2.80 (d, ¹J = 4.4 Hz, 6 H, N(CH3)2), 2.84-2.93 (m, 2 H, ε H2), 2.93-3.048 (m, 6 H, CH2NH), 3.06-3.18 (m, 2 H, CH2N(CH3)2), 3.98 (pq, ³J = 5.7 Hz, 1 H, α H), 8.26 (bs, 3 H, α NH3+), 8.65 (bs, 2 H, 2-NH2+), 8.81 (bs, 2 H, 2-NH2+), 9.78 (bs, 1 H, 3-HN+(CH3)2) (300 K). 13C-NMR (75 MHz, DMSO-d6): δ (ppm) = 21.8 (C γ), 21.2, 22.9 (HNCH2CH2CH2NH), 25.5 (C δ), 29.9 (C β), 42.6 (N(CH3)2), 44.3 (CH2NHCH2), 46.8 (C ε), 54.0 (CH2N(CH3)2), 52.2 (C α) (300 K).

Chemical shifts are reported in ppm and are referenced to the residual solvent peak (DMSO-d6 at 2.5 ppm). Multiplicities are indicated by s (singlet), d (doublet), t (triplet), bs (broad singlet), m (multiplet) and pq (pseudo quartet). Coupling constants (J) are reported in hertz (Hz).


(a) Q9SE35 Silaffin-1 from C. fusiformis (cfSil1)
MKLTAIFPLLFTAVGYCAAQSIADLAAANLSTEDSKSAQLISADSSDDASDSSVESVDAASSDVSGSSVESVDVSGSSLESVDVSGSSLESVDDSSEDSEEEELRILSSKKSGSYYSYGTKKSGSYSGYSTKKSASRRILSSKKSGSYSGYSTKKSGSRRILSSKKSGSYSGSKGSKRRILSSKKSGSYSGSKGSKRRNLSSKKSGSYSGSKGSKRRILSSKKSGSYSGSKGSKRRNLSSKKSGSYSGSKGSKRRILSGGLRGSM

(b) Q5Y2C2 Silaffin-1 from T. pseudonana (tpSil1)
MKVTTSIITLLFASCGAADVQRVLEDVTEPAVTTPAATPAPITPEPATPAPTICEGRNFYYDEETRKCSNEATGGIYGTLIDCCVAISGSVSCPYVDICNTLQPSPSPETNEPSAKPITAAPISSAPVSAAPVTSAPVAAPVETTSMTGPTTIVASIVSTNAPSLTNAPSSSLEAVVTRIPVETTNTASPTTTAASIVSTNAPSSSPEAVVTPRPTFRPSPEGTESNTSPASIASDVMFGPPKTSTPTSTPTSSSHPSSSEPTLSPSVSKEPTGYPTSSPSHSPTKSPSKSPSSSPTTSPSASPTETPTETPTESPTESPTESPTLSPTESPTLSPTESPSLSPTLSTTWSPTGYPTLAPSPSPSISSAPSVSSAPSSPPSISSAPSVSSAPSKNFGFLPGLTEMPTISPTEDHYFFGKSHKSHKSHKSKATKTLKVSKSGKSAKSSKSSGRRPLFGVSQLSEGIAVGYAKSSGRSSQQAVGSWMPVAAACILGALSFFLN

(c) Q5Y2C1 Silaffin-2 from T. pseudonana (tpSil2)
MKVTTSIITLLFASCGAADVQRVLEDVTEPAVTTPAATPAPITPEPATPAPTICEGRNFYRDDDTGKCSNEATGGIYGTLIECCVAISGSDSCPYVDICNTLQPSPSPETNEPSAKPITAAPISSAPVSAAPVTSAPVAAPVETTSMTGPTTIVASIVSTNAPSSTNAPSSSLEAVVTRIPVETTNTASPTTTAASIVSTNAPSSSPEAVVTPRPTFRPSPKGTESNTFPASIASDVMFDPARSEPTFTPTSSSQPSSSEPTLSPSVSKEPTRYPTSSPSHSPTKSPSKSPSSSPTTSPSASPTETPTETPTESPTELPTLSPTEFPSLSPTLSPTWSPTGYPTLAPSPSPSISSAPSVSSAPSSSPSISSAPSVSSAPSKNFGFLPGRNEMPTISPTEDHYFFGKSHKSHKSKATKTLKVSKSGKSSKSSKSSGSRPLFGVSQLSEGIAAGYAKSSGRSSQQAVGSWMPVAAACILGALSFFLN

(d) B8BRK6 silaffin-3 from T. pseudonana (tpSil3)
MKTSAIALLAVLATTAATEPRRLRTLEGHGGDHSISMSMHSSKAEKQAIEAAVEEDVAGPAKAAKLFKPKASKAGSMPDEAGAKSAKMSMDTKSGKSEDAAAVDAKASKESHMSISGDMSMAKSHKAEAEDVTEMSMAKAGKDEASTEDMCMPFAKSDKEMSVKSKQGKTEMSVADAKASKESSMPSSKAAKIFKGKSGKSGSLSMLKSEKASSAHSLSMPKAEKVHSMSA

(e) B8C0W5 silaffin-4 from T. pseudonana (tpSil4)
MKIIFPALAIIALVNGQQQVHRLRNDVIEHRVSSSASVATSTLFGRKGGRELSADRSEGSGGSGDEEAVDAKAEKTSTTGSAKAGKSAENEAATETSSKAAKLFKPKSSKGGASDASTEYESGASDASTEYESGASEAGAEVTAKAEKGSDDEGHDAKADKGTGSGKSGKSMSMHAKSGKGEAGSDMSVSSKAQMSYIHGSGDEGSDEATTSDASKATKVFKSSGKSGKGEAAGSSDMSVSSKPEKSEGSSEATTADASKATKVYKSDASTESKSAKHSASMPFGKSSKESDAKAHKGEMSVHQSKAFKGKSSKAMSVSSKAMSVSSKAASMSHYTHGYEKSIFG

(f) B8C322 CingulinW1 from T. pseudonana (tpCinW1)
MKIGYSLALLAVASASAQNTGLRGSDAEVELFNRKLSDWGDDGWNDDGWNDDGWGGSGSSSKSSKSGSSGSSGKSGKSGSSGKSGKSGSGDSWSDDGWSGSSGKSGKGDYGGSSGKSGKGGYGGHWVWEGSDDSTSWGSDDSYSSGKSGKGSKGSSKSSKGSGKSSKGSGKSSKGSDSSDDGEWGSGGWGSGGWGGGSSGKSGKGSYGGWASSDDGSWGGGSSGKSGKGSYGSSGKSGKGSYGGWAPSDDGWDGDGWYGGDSSGKSGKGSSGGSGKSGKGSYDGGWGSDDGTSWGSDDSYSSGKSGKGSKGSSKSSKGSSSKSSKGSSSKSGKGSGKSSKGSSDSSSSWDGHGGWSDSWGGDYSGKSGKGSSGKSGKGSSGGSWGSGSSGKSGKGSSGGSGDSSYGGWDGDSYREYGGF

(g) B8CDQ9 CingulinW2 from T. pseudonana (tpCinW2)
MKLALFLTIPTLIAAQQSSVRGVATTSSRQLDEWGDDAWGSSDSGSSGKSGKSGGSASSGDGWETDGWGGDYSSSKSGKSGSGKSGKGSSGPHGHWVYIEDDSSDGSGKSGKGSSSKGSKGSSKSSKGSSSDDSTDDSWDGGWGGHGGWNGDNSGKSGKGSYGSGKSGKGSSYPSSHWGPSHWGSDDDDSSSSKSSKGSSESSSKSSKGSSDSSSKSSKGSSSSEDEGHWEWEGGYGSGKSGKGSYSGSSGKSGKSGSGDSWVGDYGSSGKSGKGSYGGDSWGGNYNGWGGHYDVDVDDDDSSSSKSSKGSSKSSKGSSEDSSKSSKGSSSKSSKGSSSSEDEGHWVWEGSYGSGKSGKGSYSGSSGKSGKSGSGDEGWYSGW

(h) B8CEX1 CingulinW3 from T. pseudonana (tpCinW3)
MKAALILALAAGASAEITDQFERELGKSGKGSYGDWGGNYNGWGGNYWGDSSSDSSSKSSKGSSKSGKGSSKSGKGSSKSSKGSSKSSKGSSSSSDWSDDGWHWVSGWGSSYDGKSGKGSYGGDSGKSGKGSYDGGWGSYGSGKSGKGSYGGWSDGSGKSGKGSYGGWSDGSGKSGKGSYDGGWGSYGSGKSGKGSYGGWSDGSGKSGKGSYGGWSDGSDGGWGSSSEYEGWYSGHGGWGSDDDDSWGSSSSSKSSKGSSKSGKGSSKSSKGSSKSSKGSSKSSKGSSKSSKGSSKSSKGSSDGNWVWVSGWGSDDDHWGGGSGKSGKGSSGGGWSDDGWGAGSSSKSSKSGSGDDGWGGSDGHIVESNNNWVGSGDGGDSWGTDGWTNDGHDDKWSGDSWADDGHVSGSGKSGKSGSGGSGDGWGGSDGSSKSSKSGSGGSGDAWGGSDGSSKSGKSGSGGSGDAWGGSDGSSSKSGKSGSGGSGDSWGDDAWGGSDGSSSKSGKSGSGGSGDSWGDDAWGGSGGSSSKSGKSGSGGSSDSWGSSSKSGKSGSGGSSKSGKSGAGADGWEADGYEQDSAISKASTEMSFSTEASSSNRRRIVVALGAAAGGAVLLL

(i) B8CGN3 CingulinY1 from T. pseudonana (tpCinY1)
MKSIIALSTIALASAGTNKTLAPTPFPGRPTPIPTPVNTYIVTEQTPAPTPGDVITPAPTICEEKIFFFDGGMCTNMFEVADGSSYNTLIQCCNANFGSFAMCVYEDMCVDVKPTRRPTTRPPTDMSYNYGIVDCFGKSGKSGSGCGKSGKGSKSSKSGGGYGYGDNYVDDYTPSTNDYSHSTNDYTPSTNDYEYGYGHGSSGKSGKGSKSSKGGKSSKSSGKSSKSSGKSGKGSSSSGKSGKGSDGHYTGDGYRYDDDAYYRKLSEGQAGGLRRTRKMP

(j) B8CGQ5 CingulinY2 from T. pseudonana (tpCinY2)
MKLIIALSAITLASAGTNKTLPPTPFPGRPTPNPTMVNTIGTPGPSFIVTEQTPAPTPGDVLTPQPTPLPTLGGVPTTKMPTEMSYGYGYGDYGIVDCFGKSGKSGSGCGKSGKGSKSSGKSGKSGGGGGGGYGYGDNYADDYTPSTDDYEYGYGHGGSSGKSGKGSSGKSGKSSSKSSKGSGKSSKSSGKSSKSSGKSGKGGSRDDGHGYGGYGGYEGYGGYEGYQYGGDEYVRRNRRLGASHNNRI

(k) B8CGS1 CingulinY3 from T. pseudonana (tpCinY3)
MKFSASILLLTVATASAGTNKTLAPTPFPGRPTPPGAGTPFPTENTPAPSPAFGTKPPTPVSESVASLLIVSWFVLGSMWPLNGRMNVSLTVHTVDRWRADTTLKDGTAEQRDRCSSYEPPQYSYEPPTTGCSKAGKGGKSGSMDYLIDCIDLSSKSGKSGSGYGPSSSKGGKSGSSSAGYGDDYTATTDDYSAGADAGKSENYDEEASRDDGHYGASSKGGKSGSAGYGDEGYGSSAGSSKGGKSEADGYGDESYGDSGDSKAGKAEAGYGDDYGASAKSGKGSDGYGSSSKSGKAGSAKSGKGEGYHMFHDKSGKGGKGSSSGGEGYGYGYDEAHDYGYGRRTRGLRASQ

Figure A.4 Sequences of biosilica-associated proteins. Colour codes: KXXK, tetrapeptide motif; RXL, N-terminal processing site; N, asparagine residues (putative N-linked glycosylation sites); S/T, hydroxy amino acid residues (putative O-linked glycosylation and phosphorylation sites).
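The motifs highlighted in Figure A.4 can be located programmatically. A minimal sketch, assuming the motifs are written as regular expressions with '.' standing for any residue (X); a zero-width lookahead is used so that overlapping KXXK motifs (as in KXXKXXK repeats) are all reported. The function name is hypothetical.

```python
import re
from typing import List


def find_motif(sequence: str, pattern: str) -> List[int]:
    """Return 1-based start positions of a motif, including overlapping hits.

    `pattern` uses regex syntax where '.' stands for any residue (X), e.g.
    'K..K' for the KXXK tetrapeptide or 'R.L' for the RXL processing site.
    """
    # A zero-width lookahead lets consecutive motifs share residues.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", sequence)]


tpsil3_start = "MKTSAIALLAVLATTAATEPRRLRTLEG"  # N-terminus of tpSil3 (B8BRK6)
print(find_motif(tpsil3_start, "R.L"))   # -> [21, 24]
print(find_motif("GKSGKSAK", "K..K"))    # -> [2, 5]
```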


A.2 Extracted-ion chromatograms of QAC derivatives

[Figure A.5: XIC panels, relative abundance (0–100) vs. time (0–35 min), for PTM 319 (3×QAC), PTM 333 (2×QAC), PTM 347 (2×QAC), PTM 399 (3×QAC), PTM 413 (2×QAC) and PTM 427 (2×QAC)]

Figure A.5 Extracted-ion chromatograms (XICs) (3 ppm accuracy) of phosphopolyamine modifications detected in biosilica extracts from T. pseudonana and C. cryptica. For clarity of presentation, the structures are annotated with nominal m/z values of their underivatized molecular ions. Phosphate is highlighted in red; the QAC derivatization group is faded out.
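An XIC with "3 ppm accuracy" sums, scan by scan, the signal falling inside a ±3 ppm window around the theoretical m/z of the derivatized ion. The sketch below assumes centroided scans supplied as (retention time, peak list) pairs; it illustrates the idea and is not the software used for these figures. Names are hypothetical.

```python
from typing import List, Sequence, Tuple

Scan = Tuple[float, Sequence[Tuple[float, float]]]  # (rt_min, [(m/z, intensity)])


def extract_ion_chromatogram(scans: List[Scan], target_mz: float,
                             tol_ppm: float = 3.0) -> List[Tuple[float, float]]:
    """Sum, per scan, the intensity of all peaks within ±tol_ppm of target_mz."""
    tol_da = target_mz * tol_ppm * 1e-6  # ppm window converted to Da
    xic = []
    for rt, peaks in scans:
        total = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol_da)
        xic.append((rt, total))
    return xic


scans = [(1.0, [(500.0, 100.0), (500.01, 50.0)]), (2.0, [(500.0012, 30.0)])]
print(extract_ion_chromatogram(scans, 500.0))
```

At m/z 500 the ±3 ppm window is only ±0.0015 Da wide, so the peak at 500.01 (20 ppm away) is excluded while the one at 500.0012 (2.4 ppm) is kept.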


[Figure A.6: bar chart of lysine PTM content (T. pseudonana, mol. %, 0–30 %) before (–) and after (+) HF treatment, plotted against the nominal m/z of each PTM (number of QAC groups); groups: non-phosphorylated and phosphorylated; (b) ε-methylated, (c) ε-polyamines, (d) δ-hydroxy-polyamines, (e) phosphorylated; PTMs: 161 (2×QAC), 175 (1×QAC), 189 (1×QAC), 204 (3×QAC), 261 (4×QAC), 275 (4×QAC), 289 (3×QAC), 303 (2×QAC), 303 (3×QAC), 163 (2×QAC), 205 (1×QAC), 319 (3×QAC), 333 (2×QAC), 347 (1×QAC), 399 (3×QAC), 413 (2×QAC), 427 (1×QAC); n.d. Structures shown on top: PTM 319 (3×QAC), PTM 333 (2×QAC), PTM 347 (2×QAC), PTM 399 (3×QAC), PTM 413 (2×QAC), PTM 427 (2×QAC)]

Figure A.6 Structure and content of lysine post-translational modifications (PTMs) in hydrolysates of AFSM extracts from T. pseudonana before (–) and after (+) HF treatment. Error bars represent 2 replicates. Chemical structures of the detected phosphorylated modifications and their non-phosphorylated counterparts are depicted on top. Phosphorylated structures were completely converted to non-phosphorylated ones by HF treatment. PTMs are annotated with nominal m/z values of singly protonated molecular ions (with the respective number of QAC groups in brackets). See also Fig. 3.10 and Table 3.2.


[Figure A.7: XIC panels, relative abundance (%) vs. retention time (0–60 min), for m/z 147 (2×QAC), 161 (2×QAC), 175 (1×QAC), 189 (1×QAC), 205 (1×QAC), 232 (2×QAC), 275 (4×QAC), 275-orn internal standard (3×QAC), 289 (3×QAC), 303a (2×QAC), 303b (3×QAC), 317a (1×QAC), 317b (2×QAC), 319 (3×QAC), 331a (1×QAC), 331b (2×QAC), 333 (2×QAC), 347 (2×QAC), 399 (3×QAC), 413 (2×QAC) and 427 (2×QAC)]

Figure A.7 Extracted-ion chromatograms (XICs) (3 ppm accuracy) of QAC-derivatized ε-polyamine modifications detected in biosilica extracts from the three diatom species. QAC, derivatization group (faded out).


[HCD MS/MS spectrum: relative abundance (%) vs. m/z 80–300]

(a) Fragment spectrum of underivatized lysine modification 275-orn (m/z 275; 1+)

[HCD MS/MS spectrum: relative abundance (%) vs. m/z 100–500]

(b) Fragment spectrum of 3×QAC-derivatized lysine modification 275 (m/z 393; 2+)

Figure A.8 HCD tandem mass spectrometry (MS/MS) spectra of the synthetic ornithine derivative PTM 275-orn (internal standard). (a) Spectrum of the underivatized molecule (m/z 275.2442; 1+), nCE 30 %; (b) spectrum of the 3×QAC-derivatized molecule (m/z 393.1977; 2+), nCE 30 %. Fragment peaks are annotated with accurate mass, the corresponding calculated chemical composition (CHNOP) and the delta mass (in mmu).
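The fragment annotations (accurate mass, composition, delta in mmu) can be reproduced from standard monoisotopic masses. A minimal sketch, assuming singly or multiply charged cations whose charge comes from added protons, so that one electron mass per charge is subtracted from the neutral-atom sum; function names are hypothetical. Note that the deltas printed in the figures were evidently computed from unrounded peak centroids, so recomputing them from the four-decimal masses gives slightly different values (for example, about 0.17 mmu rather than 0.13 mmu for m/z 86.0966).

```python
import re
from typing import Dict

# Monoisotopic masses (Da) of the most abundant isotopes
MONO: Dict[str, float] = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.973761998,
}
ELECTRON = 0.00054857990946  # Da


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a formula such as 'C10H7ON2' (spaces allowed) into atom counts."""
    counts: Dict[str, int] = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula.replace(" ", "")):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def ion_mz(formula: str, charge: int = 1) -> float:
    """Theoretical m/z of a cation with the given elemental composition."""
    neutral = sum(MONO[el] * n for el, n in parse_formula(formula).items())
    return (neutral - charge * ELECTRON) / charge


def delta_mmu(observed_mz: float, formula: str, charge: int = 1) -> float:
    """Observed minus theoretical m/z in milli-mass units (1 mmu = 0.001 Da)."""
    return (observed_mz - ion_mz(formula, charge)) * 1000.0


print(round(ion_mz("C5H12N"), 4))         # -> 86.0964
print(round(ion_mz("C 10 H7 O N2"), 4))   # QAC reporter ion, ~171.0553
```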


[HCD MS/MS spectrum: relative abundance (%) vs. m/z 80–200]

(a) Fragment spectrum of underivatized lysine modification 161 (m/z 161; 1+)

[HCD MS/MS spectrum: relative abundance (%) vs. m/z 80–260]

(b) Fragment spectrum of 2×QAC-derivatized lysine modification 161 (m/z 251; 2+)

Figure A.9 HCD MS/MS spectra of lysine derivative 161. (a) Spectrum of the underivatized molecule (m/z 161; 1+), nCE 30 %; (b) spectrum of the 2×QAC-derivatized molecule, nCE 30 %. Fragment peaks are annotated with accurate mass, the corresponding calculated chemical composition (CHNOP) and the delta mass (in mmu).


[HCD MS/MS spectrum: relative abundance (%) vs. m/z 80–200]

(a) Fragment spectrum of underivatized lysine modification 175 (m/z 175; 1+)

[HCD MS/MS spectrum: relative abundance (%) vs. m/z 80–230]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 175 (m/z 251; 2+)

Figure A.10 HCD MS/MS spectra of lysine derivative 175. (a) Spectrum of the underivatized molecule (m/z 175; 1+), nCE 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE 30 %. Fragment peaks are annotated with accurate mass, the corresponding calculated chemical composition (CHNOP) and the delta mass (in mmu).


[HCD MS/MS spectrum: relative abundance (%) vs. m/z 80–200]

(a) Fragment spectrum of underivatized lysine modification 189 (m/z 189; 1+)

[HCD MS/MS spectrum: relative abundance (%) vs. m/z 80–210]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 189 (m/z 251; 2+)

Figure A.11 HCD MS/MS spectra of lysine derivative 189. (a) Spectrum of the underivatized molecule (m/z 189; 1+), nCE 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE 30 %. Fragment peaks are annotated with accurate mass, the corresponding calculated chemical composition (CHNOP) and the delta mass (in mmu).


[HCD MS/MS spectrum: relative abundance (%) vs. m/z 80–240]

(a) Fragment spectrum of underivatized lysine modification 232 (m/z 232.2020; 1+)

[HCD MS/MS spectrum: relative abundance (%) vs. m/z 100–600]

(b) Fragment spectrum of 2×QAC-derivatized lysine modification 232 (m/z 286.65; 2+)

Figure A.12 HCD MS/MS spectra of lysine derivative 232. (a) Spectrum of the underivatized molecule (m/z 232.2020; 1+), nCE 35 %; (b) spectrum of the 2×QAC-derivatized molecule (m/z 286.65; 2+), nCE 30 %. Fragment peaks are annotated with accurate mass, the corresponding calculated chemical composition (CHNOP) and the delta mass (in mmu).
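Precursor m/z values of derivatized molecules can be cross-checked from the underivatized [M+H]+ values. This sketch assumes that each QAC group adds C10H6N2O (170.0480 Da, the neutral composition behind the m/z 171.0553 reporter ion) and that all charges come from protons; the function name is hypothetical. With these assumptions, 2×QAC at 2+ reproduces m/z 286.65 for PTM 232, and 3×QAC at 2+ gives about m/z 393.198 for PTM 275-orn (cf. m/z 393.1977 in Figure A.8).

```python
PROTON = 1.00727646688      # Da
QAC_GROUP = 170.04801282    # Da, C10H6N2O added per derivatized amine (assumed)


def qac_precursor_mz(mh_underivatized: float, n_qac: int, charge: int) -> float:
    """m/z of an N×QAC-derivatized molecule from its underivatized [M+H]+ value."""
    neutral = mh_underivatized - PROTON + n_qac * QAC_GROUP
    return (neutral + charge * PROTON) / charge


print(round(qac_precursor_mz(232.2020, 2, 2), 2))   # -> 286.65
print(round(qac_precursor_mz(275.2442, 3, 2), 3))   # ~393.198
```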


[HCD MS/MS spectrum: relative abundance (%) vs. m/z 80–300]

(a) Fragment spectrum of underivatized lysine modification 275 (m/z 275; 1+)

[HCD MS/MS spectrum: relative abundance (%) vs. m/z 100–500]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 275 (m/z 251; 2+)

Figure A.13 HCD MS/MS spectra of lysine derivative 275. (a) Spectrum of the underivatized molecule (m/z 275; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).
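The (b) panels show QAC derivatives of the same molecules. Assuming that each QAC tag adds C10H6N2O (+170.0480 Da) to the molecule, which is consistent with the C10H7ON2 reporter fragment near m/z 171.055 present in the derivatized spectra, the expected precursor m/z of an n×QAC derivative observed at charge z can be estimated from the [M+H]+ value of the underivatized molecule: m/z = ([M+H]+ - m(H+) + n·170.0480 + z·m(H+)) / z. The mass increment and the function name below are assumptions made for illustration.

```python
# Proton mass and the mass assumed to be added per QAC tag (C10H6N2O), in Da.
PROTON = 1.007276466812
QAC_SHIFT = 170.0480128  # 10*12 + 6*1.00782503 + 2*14.00307400 + 15.99491462


def derivatized_mz(mh_underivatized, n_tags, charge):
    """Expected m/z of [M + n*QAC + z*H]^z+ given the observed [M+H]+ of M."""
    neutral = mh_underivatized - PROTON + n_tags * QAC_SHIFT
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    # Lysine derivative 232 with two tags at 2+ (cf. the precursor in Figure A.12b)
    print(f"{derivatized_mz(232.2020, n_tags=2, charge=2):.2f}")  # 286.65
    # Lysine derivative 289 with one tag at 1+
    print(f"{derivatized_mz(289.2597, n_tags=1, charge=1):.4f}")  # 459.3077
```

For derivative 232 the estimate reproduces the precursor m/z 286.65 (2+) stated for the 2×QAC derivative in Figure A.12.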

132 supplemental material

[Spectrum: relative abundance vs. m/z (80–320)]

(a) Fragment spectrum of underivatized lysine modification 289 (m/z 289; 1+)

[Spectrum: relative abundance vs. m/z (100–800)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 289 (m/z 251; 2+)

Figure A.14 HCD MS/MS spectra of lysine derivative 289. (a) Spectrum of the underivatized molecule (m/z 289; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (80–400)]

(a) Fragment spectrum of underivatized lysine modification 317a (m/z 317; 1+)

[Spectrum: relative abundance vs. m/z (100–500)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 317a (m/z 251; 2+)

Figure A.15 HCD MS/MS spectra of lysine derivative 317a. (a) Spectrum of the underivatized molecule (m/z 317; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (80–340)]

(a) Fragment spectrum of underivatized lysine modification 317b (m/z 317; 1+)

[Spectrum: relative abundance vs. m/z (100–500)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 317b (m/z 251; 2+)

Figure A.16 HCD MS/MS spectra of lysine derivative 317b. (a) Spectrum of the underivatized molecule (m/z 317; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (80–340)]

(a) Fragment spectrum of underivatized lysine modification 331a (m/z 331; 1+)

[Spectrum: relative abundance vs. m/z (80–340)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 331a (m/z 251; 2+)

Figure A.17 HCD MS/MS spectra of lysine derivative 331a. (a) Spectrum of the underivatized molecule (m/z 331; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (80–340)]

(a) Fragment spectrum of underivatized lysine modification 331b (m/z 331; 1+)

[Spectrum: relative abundance vs. m/z (80–340)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 331b (m/z 251; 2+)

Figure A.18 HCD MS/MS spectra of lysine derivative 331b. (a) Spectrum of the underivatized molecule (m/z 331; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (70–220)]

(a) Fragment spectrum of underivatized lysine modification 205 (m/z 205; 1+)

[Spectrum: relative abundance vs. m/z (80–300)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 205 (m/z 251; 2+)

Figure A.19 HCD MS/MS spectra of lysine derivative 205. (a) Spectrum of the underivatized molecule (m/z 205; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (80–340)]

(a) Fragment spectrum of underivatized lysine modification 319 (m/z 319; 1+)

[Spectrum: relative abundance vs. m/z (100–500)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 319 (m/z 251; 2+)

Figure A.20 HCD MS/MS spectra of lysine derivative 319. (a) Spectrum of the underivatized molecule (m/z 319; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (80–340)]

(a) Fragment spectrum of underivatized lysine modification 333 (m/z 333; 1+)

[Spectrum: relative abundance vs. m/z (100–500)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 333 (m/z 251; 2+)

Figure A.21 HCD MS/MS spectra of lysine derivative 333. (a) Spectrum of the underivatized molecule (m/z 333; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (80–400)]

(a) Fragment spectrum of underivatized lysine modification 347 (m/z 347; 1+)

[Spectrum: relative abundance vs. m/z (80–420)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 347 (m/z 251; 2+)

Figure A.22 HCD MS/MS spectra of lysine derivative 347. (a) Spectrum of the underivatized molecule (m/z 347; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (100–400)]

(a) Fragment spectrum of underivatized lysine modification 399 (m/z 399; 1+)

[Spectrum: relative abundance vs. m/z (100–600)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 399 (m/z 251; 2+)

Figure A.23 HCD MS/MS spectra of lysine derivative 399. (a) Spectrum of the underivatized molecule (m/z 399; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (100–500)]

(a) Fragment spectrum of underivatized lysine modification 413 (m/z 413; 1+)

[Spectrum: relative abundance vs. m/z (100–450)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 413 (m/z 251; 2+)

Figure A.24 HCD MS/MS spectra of lysine derivative 413. (a) Spectrum of the underivatized molecule (m/z 413; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).


[Spectrum: relative abundance vs. m/z (100–450)]

(a) Fragment spectrum of underivatized lysine modification 427 (m/z 427; 1+)

[Spectrum: relative abundance vs. m/z (100–450)]

(b) Fragment spectrum of 1×QAC-derivatized lysine modification 427 (m/z 251; 2+)

Figure A.25 HCD MS/MS spectra of lysine derivative 427. (a) Spectrum of the underivatized molecule (m/z 427; 1+), nCE set to 30 %; (b) spectrum of the 1×QAC-derivatized molecule, nCE set to 30 %. Fragment peaks are annotated with their accurate mass, the corresponding calculated elemental composition (CHNOP) and the delta mass (in mmu).

Table A.2 Sequences of identified proteins bearing lysine ε-polyamine PTMs. Legend for the color coding and abbreviations: COVERAGE, peptide coverage; KXXK and RXL, sequence motifs (where X stands for any amino acid); K, modified lysine residue; M, oxidized methionine residue; TP, T. pseudonana; CC, C. cryptica; TO, T. oceanica; AFSM, ammonium fluoride soluble material; AFIM, ammonium fluoride insoluble material.

#   Protein   Species   Fraction
1   B5YLH4    TP        AFSM
2   B5YLI3    TP        AFSM, AFIM
3   B5YLL4    TP        AFSM
4   B5YLX5    TP        AFSM, AFIM
5   B5YNQ3    TP        AFSM
6   B8BRK6    TP        AFSM
7   B8BSN6    TP        AFSM
8   B8BV44    TP        AFSM
9   B8BYI7    TP        AFSM
10  B8C0W5    TP        AFSM
11  B8C2P6    TP        AFSM
12  B8C2P7    TP        AFSM
13  B8C406    TP        AFSM
14  B8C8R2    TP        AFSM
15  B8C9R4    TP        AFSM
16  B8CBB3    TP        AFSM
17  B8CC24    TP        AFSM
18  B8CE63    TP        AFSM
19  B8CG95    TP        AFSM
20  B8CGN3    TP        AFIM
21  B8CGQ5    TP        AFIM
22  B8CGS1    TP        AFSM, AFIM
23  B8LBG8    TP        AFSM
24  B8LBU6    TP        AFSM
25  B8LDT2    TP        AFSM
26  g11469    CC        AFSM, AFIM
27  g11606    CC        AFIM
28  g13975    CC        AFSM, AFIM
29  g1484     CC        AFSM, AFIM
30  g15479    CC        AFSM, AFIM
31  g15720    CC        AFSM
32  g22685    CC        AFSM
33  g25187    CC        AFSM, AFIM
34  g2543     CC        AFIM
35  g3798     CC        AFIM
36  g3964     CC        AFIM
37  g749      CC        AFIM
38  g7979     CC        AFIM
39  g8502     CC        AFSM, AFIM
40  g9515     CC        AFSM, AFIM
41  K0R7E4    TO        AFSM
42  K0R8C7    TO        AFSM
43  K0RCW9    TO        AFIM
44  K0RHV4    TO        AFSM
45  K0RIC9    TO        AFIM
46  K0RN71    TO        AFSM
47  K0RU48    TO        AFSM, AFIM
48  K0RWT0    TO        AFSM
49  K0S1R3    TO        AFSM
50  K0S7V0    TO        AFSM
51  K0S9A6    TO        AFSM
52  K0SAX6    TO        AFSM
53  K0SQ58    TO        AFSM
54  K0SSD7    TO        AFSM
55  K0T322    TO        AFIM
56  K0T463    TO        AFIM
57  K0T7A1    TO        AFSM
58  K0TBU5    TO        AFSM
59  K0TCJ2    TO        AFSM
60  K0RNL4    TO        AFSM, AFIM
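The legend of Table A.2 highlights KXXK and RXL motifs, where X is any amino acid. As an illustration only, such motifs can be located in a one-letter protein sequence with regular expressions; zero-width lookahead patterns are used so that overlapping occurrences (for example both KXXK instances in KXXKXXK) are reported. The function name and the toy sequence are hypothetical and not taken from the identified proteins.

```python
import re

# X = any amino acid. A zero-width lookahead lets overlapping motifs be found.
MOTIF_PATTERNS = {
    "KXXK": re.compile(r"(?=(K..K))"),
    "RXL": re.compile(r"(?=(R.L))"),
}


def find_motifs(sequence):
    """Return {motif: [0-based start positions]} for a one-letter protein sequence."""
    sequence = sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(sequence)]
        for name, pattern in MOTIF_PATTERNS.items()
    }


if __name__ == "__main__":
    # Hypothetical toy sequence containing KXXKXXK and one RXL motif.
    print(find_motifs("SKAAKGGKRSLE"))  # {'KXXK': [1, 4], 'RXL': [8]}
```

Positions are 0-based; add 1 to obtain conventional residue numbering.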


TP            ..IK↑XXK..    ..KXXIIK↑..   Total
PAK↑          11            6             17
MeK↑          1             8             9
Total         12            14            26

(a) P = 0.0145

TP            ..KXXUnK↑..   ..KXXMeK↑..   Total
..UnK↑XXK..   3             2             5
..PAK↑XXK..   3             5             8
Total         6             7             13

(b) P = 0.5921

CC            ..IK↑XXK..    ..KXXIIK↑..   Total
PAK↑          11            0             11
MeK↑          0             24            24
Total         11            24            35

(c) P < 0.0001

CC            ..KXXUnK↑..   ..KXXMeK↑..   Total
..UnK↑XXK..   2             2             4
..PAK↑XXK..   0             12            12
Total         2             14            16

(d) P = 0.0499

TO            ..IK↑XXK..    ..KXXIIK↑..   Total
PAK↑          7             4             11
MeK↑          4             10            14
Total         11            14            25

(e) P = 0.1160

TO            ..KXXUnK↑..   ..KXXMeK↑..   Total
..UnK↑XXK..   0             2             2
..PAK↑XXK..   0             3             3
Total         0             5             5

(f) P = 1.0000

All           ..IK↑XXK..    ..KXXIIK↑..   Total
PAK↑          29            10            39
MeK↑          5             42            47
Total         34            52            86

(g) P < 0.0001

All           ..KXXUnK↑..   ..KXXMeK↑..   Total
..UnK↑XXK..   5             6             11
..PAK↑XXK..   3             20            23
Total         8             26            34

(h) P = 0.0789

Table A.3 Contingency tables for Fisher's exact test (data taken from Fig. 3.23). Tables A.3a, A.3c, A.3e and A.3g, non-random modification patterns in KXXK; Tables A.3b, A.3d and A.3h, analysis of the interaction between ε-polyamination and methylation (crosstalk). An association is considered statistically significant when the two-tailed P-value is less than 0.05. I, N-terminal lysine in KXXK; II, C-terminal lysine in KXXK; PA, ε-polyaminated lysine (PTMs 232, 289 or 333); Me, di- or trimethylated lysine (PTMs 175 or 189); Un, unmodified lysine.
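Each P-value in Table A.3 comes from a two-tailed Fisher's exact test on a 2×2 table. The sketch below (Python standard library only) uses one common two-sided definition: with the row and column totals fixed, it adds up the hypergeometric probabilities of all tables that are no more probable than the observed one, the convention used for example by R's fisher.test and SciPy's fisher_exact. The text does not state which software produced the P-values, so this is an illustration rather than the original analysis; the function name is hypothetical. Applied to Tables A.3a, A.3b, A.3e and A.3h, it returns 0.0145, 0.5921, 0.1160 and 0.0789.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that is
    no more probable than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    row2 = n - row1

    def weight(k):
        # Unnormalised probability of top-left = k; all tables share the
        # denominator comb(n, col1), so integers can be compared exactly.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    tail = sum(w for w in map(weight, range(k_min, k_max + 1)) if w <= observed)
    return tail / comb(n, col1)


if __name__ == "__main__":
    tables = {
        "A.3a (TP)": [[11, 6], [1, 8]],
        "A.3b (TP)": [[3, 2], [3, 5]],
        "A.3e (TO)": [[7, 4], [4, 10]],
        "A.3h (All)": [[5, 6], [3, 20]],
    }
    for label, t in tables.items():
        print(f"{label}: P = {fisher_exact_two_sided(t):.4f}")
```

Comparing integer weights instead of floating-point probabilities avoids the rounding ties that otherwise make the "no more probable than observed" test fragile.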

BIBLIOGRAPHY

[1] Frank E. Round, Richard M. Crawford, and David G. Mann. Diatoms: Biology and Morphology of the Genera. Cambridge University Press, 1990.

[2] C. B. Field. Primary Production of the Biosphere: Integrating Terrestrial and Oceanic Components. Science 281.5374 (1998), pp. 237–240.

[3] E. Virginia Armbrust. The life of diatoms in the world's oceans. Nature 459.7244 (2009), pp. 185–192.

[4] Shruti Malviya et al. Insights into global diatom distribution and diversity in the world’s ocean. Proceedings of the National Academy of Sciences 113.11 (2016), pp. E1516–E1525.

[5] Jaap S. Sinninghe Damsté et al. The Rise of the Rhizosolenid Diatoms. Science 304.5670 (2004), pp. 584–587.

[6] Maxime C. Bridoux, Vadim V. Annenkov, Richard G. Keil, and Anitra E. Ingalls. Widespread distribution and molecular diversity of diatom frustule bound aliphatic long chain polyamines (LCPAs) in marine sediments. Organic Geochemistry 48 (2012), pp. 9–20.

[7] Sunil Kumar Shukla and Rahul Mohan. The Contribution of Diatoms to Worldwide Crude Oil Deposits. Cellular Origin, Life in Extreme Habitats and Astrobiology. Springer Netherlands, 2012, pp. 355–382.

[8] Anne-Sophie Benoiston, Federico M. Ibarbalz, Lucie Bittner, Lionel Guidi, Oliver Jahn, Stephanie Dutkiewicz, and Chris Bowler. The evolution of diatoms and their biogeochemical functions. Philosophical Transactions of the Royal Society B: Biological Sciences 372.1728 (2017).

[9] Wiebe H.C.F. Kooistra and Linda K. Medlin. Evolution of the Diatoms (Bacillariophyta). Molecular Phylogenetics and Evolution 6.3 (1996), pp. 391–407.

[10] Wiebe H.C.F. Kooistra, Mario De Stefano, David G. Mann, and Linda K. Medlin. The Phylogeny of the Diatoms. Silicon Biomineralization. Springer Berlin Heidelberg, 2003, pp. 59–97.

[11] D. G. Mann and S. J. M. Droop. Biodiversity, biogeography and conservation of diatoms. Hydrobiologia 336.1 (1996), pp. 19–32.

[12] A. Montsant, K. Jabbari, U. Maheswari, and C. Bowler. Comparative genomics of the pennate diatom Phaeodactylum tricornutum. Plant Physiology 137.2 (2005), pp. 500–513.

[13] E. Virginia Armbrust et al. The Genome of the Diatom Thalassiosira Pseudonana: Ecology, Evolution, and Metabolism. Science 306.5693 (2004), pp. 79–86.

[14] Markus Lommer et al. Genome and low-iron response of an oceanic diatom adapted to chronic iron limitation. Genome Biology 13.7 (2012), R66.

[15] Jesse C. Traller et al. Genome and methylome of the oleaginous diatom Cyclotella cryptica reveal genetic flexibility toward a high lipid phenotype. Biotechnology for Biofuels 9.1 (2016), p. 258.

[16] Chris Bowler et al. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature 456 (2008), pp. 239–244.

[17] Andrew E. Allen, Assaf Vardi, and Chris Bowler. An ecological and evolutionary context for integrated nitrogen metabolism and related signaling pathways in marine diatoms. Current Opinion in Plant Biology 9.3 (2006), pp. 264–273.

[18] Andrew E. Allen et al. Evolution and metabolic significance of the urea cycle in photosynthetic diatoms. Nature 473 (2011), pp. 203–207.


[19] © Christina Brodie. Geometry and Pattern in Nature 1: Exploring the shapes of diatom frustules with Johan Gielis’ Superformula. 2004. URL: http://www.microscopy-uk.org.uk/mag/indexmag.html?http://www.microscopy-uk.org.uk/mag/artapr04/cbdiatom2.html (accessed on 06/16/2018).

[20] Christian E. Hamm, Rudolf Merkel, Olaf Springer, Piotr Jurkojc, Christian Maier, Kathrin Prechtel, and Victor Smetacek. Architecture and material properties of diatom shells provide effective mechanical protection. Nature 421.6925 (2003), pp. 841–843.

[21] Christian E. Hamm. The Evolution of Advanced Mechanical Defenses and Potential Technological Applications of Diatom Shells. Journal of Nanoscience and Nanotechnology 5.1 (2005), pp. 108–119.

[22] Zachary H. Aitken, Shi Luo, Stephanie N. Reynolds, Christian Thaulow, and Julia R. Greer. Microstructure provides insights into evolutionary design and resilience of Coscinodiscus sp. frustule. Proceedings of the National Academy of Sciences 113.8 (2016), pp. 2017–2022.

[23] John A. Raven. The Transport and Function of Silicon In Plants. Biological Reviews 58.2 (1983), pp. 179–207.

[24] Katharine R. Hendry, Alan O. Marron, Flora Vincent, Daniel J. Conley, Marion Gehlen, Federico M. Ibarbalz, Bernard Quéguiner, and Chris Bowler. Competition between Silicifiers and Non-silicifiers in the Past and Present Ocean and Its Evolutionary Impacts. Frontiers in Marine Science 5 (2018).

[25] Jane Bradbury. Nature's Nanotechnologists: Unveiling the Secrets of Diatoms. PLoS Biology 2.10 (2004), pp. 1512–1515.

[26] Michael Gross. The mysteries of the diatoms. Current Biology 22.15 (2012), pp. 581–585.

[27] E. Epstein. The anomaly of silicon in plant biology. Proceedings of the National Academy of Sciences 91.1 (1994), pp. 11–17.

[28] Emanuel Epstein. SILICON. Annual Review of Plant Physiology and Plant Molecular Biology 50.1 (1999), pp. 641–664.

[29] Richard Gordon and Ryan W. Drum. The Chemical Basis of Diatom Morphogenesis. Mechanical Engineering of the Cytoskeleton in Developmental Biology. Ed. by Richard Gordon. Vol. 150. International Review of Cytology Supplement C. Academic Press, 1994, pp. 243–372.

[30] Cheryl Wong Po Foo, Jia Huang, and David L. Kaplan. Lessons from seashells: silica mineralization via protein templating. Trends in Biotechnology 22.11 (2004), pp. 577–585.

[31] Manfred Sumper and Eike Brunner. Learning from Diatoms: Nature’s Tools for the Production of Nanostructured Silica. Advanced Functional Materials 16.1 (2006), pp. 17–26.

[32] Nils Kröger and Nicole Poulsen. Biochemistry and Molecular Genetics of Silica Biomineralization in Diatoms. Handbook of Biomineralization. Wiley-VCH Verlag GmbH, 2008. Chap. 3, pp. 43–58.

[33] Nils Kröger and Nicole Poulsen. Diatoms—From Cell Wall Biogenesis to Nanotechnology. Annual Review of Genetics 42.1 (2008), pp. 83–107.

[34] Nils Kröger and Kenneth H. Sandhage. From Diatom Biomolecules to Bioinspired Syntheses of Silica- and Titania-Based Materials. MRS Bulletin 35.2 (2010), pp. 122–126.

[35] Richard Gordon, Dusan Losic, Mary Ann Tiffany, Stephen S. Nagy, and Frithjof A. S. Sterrenburg. The Glass Menagerie: diatoms for novel applications in nanotechnology. Trends in Biotechnology 27.2 (XXXX), pp. 116–127.

[36] Gloria Bueno, Oscar Deniz, Anibal Pedraza, Jesús Ruiz-Santaquiteria, Jesús Salido, Gabriel Cristóbal, María Borrego-Ramos, and Saúl Blanco. Automated Diatom Classification (Part A): Handcrafted Feature Approaches. Applied Sciences 7.12 (2017), p. 753.

[37] Ernst Heinrich Philipp August Haeckel. Kunstformen der Natur. Leipzig und Wien: Verlag des Bibliographischen Instituts, 1904.

[38] Victor A. Chepurnov, David G. Mann, Koen Sabbe, and Wim Vyverman. Experimental Studies on Sexual Reproduction in Diatoms. International Review of Cytology. Elsevier, 2004, pp. 91–154.


[39] P. Treguer, D. M. Nelson, A. J. Van Bennekom, D. J. DeMaster, A. Leynaert, and B. Queguiner. The Silica Balance in the World Ocean: A Reestimate. Science 268.5209 (1995), pp. 375–379.

[40] Mark Hildebrand, Benjamin E. Volcani, Walter Gassmann, and Julian I. Schroeder. A gene family of silicon transporters. Nature 385 (1997), pp. 688–689.

[41] Alan O. Marron, Sarah Ratcliffe, Glen L. Wheeler, Raymond E. Goldstein, Nicole King, Fabrice Not, Colomban de Vargas, and Daniel J. Richter. The Evolution of Silicon Transport in Eukaryotes. Molecular Biology and Evolution 33.12 (2016), pp. 3226–3248.

[42] Colleen A. Durkin, Julie A. Koester, Sara J. Bender, and E. Virginia Armbrust. The evolution of silicon transporters in diatoms. Journal of Phycology 52.5 (2016), pp. 716–731.

[43] Michael J. Knight, Laura Senior, Bethany Nancolas, Sarah Ratcliffe, and Paul Curnow. Direct evidence of the molecular basis for biological silicon transport. Nature Communications 7 (2016), pp. 1–11.

[44] Grazyna M. Durak, Alison R. Taylor, Charlotte E. Walker, Ian Probert, Colomban de Vargas, Stephane Audic, Declan Schroeder, Colin Brownlee, and Glen L. Wheeler. A role for diatom-like silicon transporters in calcifying coccolithophores. Nature Communications 7 (2016), p. 10543.

[45] Tracy L. Simpson and Benjamin E. Volcani. Silicon and Siliceous Structures in Biological Systems. Springer New York, 1981.

[46] Daniel Otzen. The Role of Proteins in Biosilicification. Scientifica 2012.867562 (2012), p. 22.

[47] Carolin C. Lechner and Christian F. W. Becker. A sequence-function analysis of the silica precipitating silaffin R5 peptide. Journal of Peptide Science 20.2 (2014), pp. 152–158.

[48] H. Ehrlich and A. Witkowski. Biomineralization in Diatoms: The Organic Templates. Biologically-Inspired Systems. Springer Netherlands, 2015, pp. 39–58.

[49] Mark Hildebrand and Sarah J.L. Lerch. Diatom silica biomineralization: Parallel development of approaches and understanding. Seminars in Cell & Developmental Biology 46.Supplement C (2015), pp. 27–35.

[50] Mark Hildebrand, Sarah J. L. Lerch, and Roshan P. Shrestha. Understanding Diatom Cell Wall Silicification—Moving Forward. Frontiers in Marine Science 5 (2018).

[51] Tadashi Nakajima and Benjamin E. Volcani. 3,4-Dihydroxyproline: A New Amino Acid in Diatom Cell Walls. Science 164.3886 (1969), pp. 1400–1401.

[52] Tadashi Nakajima and Benjamin E. Volcani. ε-N-trimethyl-L-δ-hydroxylysine phosphate and its nonphosphorylated compound in diatom cell walls. Biochemical and Biophysical Research Communications 39.1 (1970), pp. 28–33.

[53] Nils Kröger, Christian Bergsdorf, and Manfred Sumper. A new calcium binding glycoprotein family constitutes a major diatom cell wall component. The EMBO Journal 13.19 (1994), pp. 4676–4683.

[54] Nils Kröger, Christian Bergsdorf, and Manfred Sumper. Frustulins: Domain Conservation in a Protein Family Associated with Diatom Cell Walls. European Journal of Biochemistry 239.2 (1996), pp. 259–264.

[55] Nils Kröger and Richard Wetherbee. Pleuralins are Involved in Theca Differentiation in the Diatom Cylindrotheca fusiformis. Protist 151.3 (2000), pp. 263–273.

[56] Willem H. van de Poll, Engel G. Vrieling, and Winfried W. C. Gieskes. Location and Expression of Frustulins in the Pennate Diatoms Cylindrotheca fusiformis, Navicula pelliculosa, and Navicula salinarum (Bacillariophyceae). Journal of Phycology 35.5 (1999), pp. 1044–1053.

[57] Nils Kröger, Rainer Deutzmann, Christian Bergsdorf, and Manfred Sumper. Species-specific Polyamines from Diatoms Control Silica Morphology. Proceedings of the National Academy of Sciences 97.26 (2000), pp. 14133–14138.

[58] Andrew J. Mort and Derek T.A. Lamport. Anhydrous hydrogen fluoride deglycosylates glycoproteins. Analytical Biochemistry 82.2 (1977), pp. 289–309.

[59] A. J. Mort, P. Komalavilas, G. L. Rorrer, and D. T. A. Lamport. Anhydrous Hydrogen Fluoride and Cell-Wall Analysis. Plant Fibers. Ed. by Hans-Ferdinand Linskens and John F. Jackson. Berlin, Heidelberg: Springer Berlin Heidelberg, 1989, pp. 37–69.

[60] Manfred Sumper, Eike Brunner, and Gerhard Lehmann. Biomineralization in diatoms: Characterization of novel polyamines associated with silica. FEBS Letters 579.17 (2005), pp. 3765–3769.

[61] Manfred Sumper and Gerhard Lehmann. Silica Pattern Formation in Diatoms: Species-Specific Polyamine Biosynthesis. ChemBioChem 7.9 (2006), pp. 1419–1427.

[62] Manfred Sumper and Nils Kröger. Silica formation in diatoms: the function of long-chain polyamines and silaffins. J. Mater. Chem. 14.14 (2004), pp. 2059–2065.

[63] Manfred Sumper. Biomimetic patterning of silica by long-chain polyamines. Angewandte Chemie International Edition 43.17 (2004), pp. 2251–2254.

[64] Nils Kröger, Rainer Deutzmann, and Manfred Sumper. Polycationic Peptides from Diatom Biosilica That Direct Silica Nanosphere Formation. Science 286.5442 (1999), pp. 1129–1132.

[65] Nils Kröger, Rainer Deutzmann, and Manfred Sumper. Silica-precipitating Peptides from Diatoms: the chemical structure of silaffin-1A from Cylindrotheca fusiformis. Journal of Biological Chemistry 276.28 (2001), pp. 26066–26070.

[66] Nils Kröger, Sonja Lorenz, Eike Brunner, and Manfred Sumper. Self-Assembly of Highly Phosphorylated Silaffins and Their Function in Biosilica Morphogenesis. Science 298.5593 (2002), pp. 584–586.

[67] Nicole Poulsen and Nils Kröger. Silica Morphogenesis by Alternative Processing of Silaffins in the Diatom Thalassiosira pseudonana. Journal of Biological Chemistry 279.41 (2004), pp. 42993–42999.

[68] Manfred Sumper, Robert Hett, Gerhard Lehmann, and Stephan Wenzl. A Code for Lysine Modifications of a Silica Biomineralizing Silaffin Protein. Angewandte Chemie 119.44 (2007), pp. 8557–8560.

[69] Manfred Sumper and Eike Brunner. Silica Biomineralisation in Diatoms: The Model Organism Thalassiosira pseudonana. ChemBioChem 9.8 (2008), pp. 1187–1194.

[70] H. Nielsen, J. Engelbrecht, S. Brunak, and G. von Heijne. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Engineering Design and Selection 10.1 (1997), pp. 1–6.

[71] André Scheffel, Nicole Poulsen, Samuel Shian, and Nils Kröger. Nanopatterned protein microrings from a diatom that direct silica morphogenesis. Proceedings of the National Academy of Sciences 108.8 (2011), pp. 3175–3180.

[72] Stephan Wenzl, Robert Hett, Patrick Richthammer, and Manfred Sumper. Silacidins: Highly Acidic Phosphopeptides from Diatom Shells Assist in Silica Precipitation In Vitro. Angewandte Chemie International Edition 47.9 (2008), pp. 1729–1732.

[73] Patrick Richthammer, Mandy Börmel, Eike Brunner, and Karl-Heinz van Pée. Biomineralization in Diatoms: The Role of Silacidins. ChemBioChem 12.9 (2011), pp. 1362–1366.

[74] Amy R Kirkham et al. A role for the cell-wall protein silacidin in cell size of the diatom Thalassiosira pseudonana. The ISME Journal 11.11 (2017), pp. 2452–2464.

[75] Christian Zerfaß, Garry W. Buchko, Wendy J. Shaw, Stephan Hobe, and Harald Paulsen. Secondary structure and dynamics study of the intrinsically disordered silica-mineralizing peptide P5 S3 during silicic acid condensation and silica decondensation. Proteins: Structure, Function, and Bioinformatics 85.11 (2017), pp. 2111–2126.

[76] Manfred Sumper. A Phase Separation Model for the Nanopatterning of Diatom Biosilica. Science 295.5564 (2002), pp. 2430–2433.

[77] Nils Kröger and Manfred Sumper. The Biochemistry of Silica Formation in Diatoms. Ed. by Edmund Bäuerlein. 2nd edition. Wiley-VCH, Weinheim, 2004. Chap. 9, pp. 137–158.

[78] Manfred Sumper, Sonja Lorenz, and Eike Brunner. Biomimetic Control of Size in the Polyamine-Directed Formation of Silica Nanospheres. Angewandte Chemie International Edition 42.42 (2003), pp. 5192–5195.

[79] Ruedi Aebersold and Matthias Mann. Mass spectrometry-based proteomics. Nature 422.6928 (2003), pp. 198–207.

[80] Matthias Mann and Ole N. Jensen. Proteomic analysis of post-translational modifications. Nature Biotechnology 21.3 (2003), pp. 255–261.

[81] Ole N. Jensen. Interpreting the protein language using proteomics. Nature Reviews Molecular Cell Biology 7 (2006), pp. 391–403.

[82] Yingming Zhao and Ole N. Jensen. Modification-specific proteomics: Strategies for characterization of post-translational modifications using enrichment techniques. PROTEOMICS 9.20 (2009), pp. 4632–4641.

[83] Nicole Poulsen, Manfred Sumper, and Nils Kröger. Biosilica formation in diatoms: Characterization of native silaffin-2 and its role in silica morphogenesis. Proceedings of the National Academy of Sciences 100.21 (2003), pp. 12075–12080.

[84] Albert S.B. Edge, Connie R. Faltynek, Liselotte Hof, Leo E. Reichert, and Peter Weber. Deglycosylation of glycoproteins by trifluoromethanesulfonic acid. Analytical Biochemistry 118.1 (1981), pp. 131–137.

[85] Albert S. B. Edge. Deglycosylation of glycoproteins with trifluoromethanesulphonic acid: elucidation of molecular structure and function. Biochemical Journal 376.2 (2003), pp. 339–350.

[86] Kevin P. Campbell, David H. MacLennan, and Annelise O. Jorgensen. Staining of the Ca2+-binding proteins, calsequestrin, calmodulin, troponin C, and S-100, with the cationic carbocyanine dye “Stains-all.” Journal of Biological Chemistry 258.18 (1983), pp. 11267–73.

[87] Jody M. Myers, Arthur Veis, Boris Sabsay, and A.P. Wheeler. A Method for Enhancing the Sensitivity and Stability of Stains-All for Phosphoproteins Separated in Sodium Dodecyl Sulfate-Polyacrylamide Gels. Analytical Biochemistry 240.2 (1996), pp. 300–302.

[88] Vonda Sheppard, Nicole Poulsen, and Nils Kröger. Characterization of an Endoplasmic Reticulum-associated Silaffin Kinase from the Diatom Thalassiosira pseudonana. Journal of Biological Chemistry 285.2 (2010), pp. 1166–1176.

[89] John R. Griffiths and Richard D. Unwin, eds. Analysis of Protein Post-Translational Modifications by Mass Spectrometry. John Wiley & Sons, Inc., 2016.

[90] Alexander Kotzsch, Damian Pawolski, Alexander Milentyev, Anna Shevchenko, André Scheffel, Nicole Poulsen, Andrej Shevchenko, and Nils Kröger. Biochemical Composition and Assembly of Biosilica-associated Insoluble Organic Matrices from the Diatom Thalassiosira pseudonana. Journal of Biological Chemistry 291.10 (2016), pp. 4982–4997.

[91] Satoko Matsunaga, Ryuichi Sakai, Mitsuru Jimbo, and Hisao Kamiya. Long-Chain Polyamines (LCPAs) from Marine Sponge: Possible Implication in Spicule Formation. ChemBioChem 8.14 (2007), pp. 1729–1735.

[92] Satoko Matsunaga, Mitsuru Jimbo, Martin B. Gill, L. Leanne Lash-Van Wyhe, Michio Murata, Ken'ichi Nonomura, Geoffrey T. Swanson, and Ryuichi Sakai. Isolation, Amino Acid Sequence and Biological Activities of Novel Long-Chain Polyamine-Associated Peptide Toxins from the Sponge Axinyssa aculeata. ChemBioChem 12.14 (2011), pp. 2191–2200.

[93] Myung Hee Park. The Post-Translational Synthesis of a Polyamine-Derived Amino Acid, Hypusine, in the Eukaryotic Translation Initiation Factor 5A (eIF5A). The Journal of Biochemistry 139.2 (2006), pp. 161–169.

[94] E. C. Wolff, K. R. Kang, Y. S. Kim, and M. H. Park. Posttranslational synthesis of hypusine: evolutionary progression and specificity of the hypusine modification. Amino Acids 33.2 (2007), pp. 341–350.

[95] Anthony E. Pegg and Robert A. Casero Jr., eds. Polyamines. Humana Press, 2011.

[96] Tomonobu Kusano and Hideyuki Suzuki, eds. Polyamines. Springer Japan, 2015.

[97] Jürgen M. Knott, Piero Römer, and Manfred Sumper. Putative spermine synthases from Thalassiosira pseudonana and Arabidopsis thaliana synthesize thermospermine rather than spermine. FEBS Letters 581.16 (2007), pp. 3081–3086.

[98] Piero Römer, A. Faltermeier, V. Mertins, T. Gedrange, R. Mai, and P. Proff. Investigations about N-aminopropyl transferases probably involved in biomineralization. J. Physiol. Pharmacol. 59 Suppl 5 (2008), pp. 27–37.

[99] Anthony J. Michael. Molecular machines encoded by bacterially-derived multi-domain gene fusions that potentially synthesize, N-methylate and transfer long chain polyamines in diatoms. FEBS Letters 585.17 (2011), pp. 2627–2634.

[100] Paul Lasko. Tudor Domain. Current Biology 20.16 (XXXX), R666–R667.

[101] Carolin C. Lechner and Christian F. W. Becker. Silaffins in Silica Biomineralization and Biomimetic Silica Precipitation. Marine Drugs 13.8 (2015), pp. 5297–5333.

[102] Stephan Wenzl, Rainer Deutzmann, Robert Hett, Eduard Hochmuth, and Manfred Sumper. Quaternary Ammonium Groups in Silica-Associated Proteins. Angewandte Chemie International Edition 43.44 (2004), pp. 5933–5936.

[103] Luciano G. Frigeri, Timothy R. Radabaugh, Paul A. Haynes, and Mark Hildebrand. Identification of Proteins from a Cell Wall Fraction of the Diatom Thalassiosira pseudonana: Insights into Silica Structure Formation. Molecular & Cellular Proteomics 5.1 (2006), pp. 182–193.

[104] Thomas Mock et al. Whole-genome expression profiling of the marine diatom Thalassiosira pseudonana identifies genes involved in silicon bioprocesses. Proceedings of the National Academy of Sciences 105.5 (2008), pp. 1579–1584.

[105] Ziyad Tariq Muhseen, Qian Xiong, Zhuo Chen, and Feng Ge. Proteomics studies on stress responses in diatoms. PROTEOMICS 15.23-24 (2015), pp. 3943–3953.

[106] Tore Brembu, Matilde Skogen Chauton, Per Winge, Atle M. Bones, and Olav Vadstein. Dynamic responses to silicon in Thalasiossira pseudonana - Identification, characterisation and classification of signature genes and their corresponding protein motifs. Scientific Reports 7.1 (2017), p. 4865.

[107] Johan Stenflo, Per Fernlund, William Egan, and Peter Roepstorff. Vitamin K Dependent Modifications of Glutamic Acid Residues in Prothrombin. Proceedings of the National Academy of Sciences 71.7 (1974), pp. 2730–2733.

[108] Annie Moradian, Anastasia Kalli, Michael J. Sweredoski, and Sonja Hess. The top-down, middle-down, and bottom-up mass spectrometry approaches for characterization of histone variants and their post-translational modifications. PROTEOMICS 14.4-5 (2013), pp. 489–497.

[109] Yaoyang Zhang, Bryan R. Fonslow, Bing Shan, Moon-Chang Baek, and John R. Yates. Protein Analysis by Shotgun/Bottom-up Proteomics. Chemical Reviews 113.4 (2013), pp. 2343–2394.

[110] Andrej Shevchenko, Matthias Wilm, Ole Vorm, and Matthias Mann. Mass Spectrometric Sequencing of Proteins from Silver-Stained Polyacrylamide Gels. Analytical Chemistry 68.5 (1996), pp. 850–858.

[111] Andrej Shevchenko, Henrik Tomas, Jan Havliš, Jesper V. Olsen, and Matthias Mann. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protocols 1.6 (2007), pp. 2856–2860.

[112] B. T. Chait. Mass Spectrometry: Bottom-Up or Top-Down? Science 314.5796 (2006), pp. 65–66.

[113] Michael Fountoulakis and Hans-Werner Lahm. Hydrolysis and amino acid composition analysis of proteins. Journal of Chromatography A 826.2 (1998), pp. 109–134.

[114] Shane M. Rutherfurd and G. Sarwar Gilani. Amino Acid Analysis. Current Protocols in Protein Science. John Wiley & Sons, Inc., 2001.

[115] Merja R. Häkkinen, Tuomo A. Keinänen, Jouko Vepsäläinen, Alex R. Khomutov, Leena Alhonen, Juhani Jänne, and Seppo Auriola. Analysis of underivatized polyamines by reversed phase liquid chromatography with electrospray tandem mass spectrometry. Journal of Pharmaceutical and Biomedical Analysis 45.4 (2007), pp. 625–634.

[116] Gottfried J. Feistner. Profiling of basic amino acids and polyamines in microbial culture supernatants by electrospray mass spectrometry. Biological Mass Spectrometry 23.12 (1994), pp. 784–792.

[117] P. Fürst, L. Pollack, T.A. Graser, H. Godel, and P. Stehle. Appraisal of four pre-column derivatization methods for the high-performance liquid chromatographic determination of free amino acids in biological materials. Journal of Chromatography A 499 (1990), pp. 557–569.

[118] Durk Fekkes. State-of-the-art of high-performance liquid chromatographic analysis of amino acids in physiological samples. Journal of Chromatography B: Biomedical Sciences and Applications 682.1 (1996), pp. 3–22.

[119] G. McClung and W. T. Frankenberger. Comparison of Reverse-Phase High-Performance Liquid Chromatographic Methods for Precolumn-Derivatized Amino Acids. Journal of Liquid Chromatography 11.3 (1988), pp. 613–646.

[120] Karin Gartenmann and Sunil Kochhar. Short-Chain Peptide Analysis by High-Performance Liquid Chromatography Coupled to Electrospray Ionization Mass Spectrometer after Derivatization with 9-Fluorenylmethyl Chloroformate. Journal of Agricultural and Food Chemistry 47.12 (1999), pp. 5068–5071.

[121] Hans M.H. van Eijk, Dennis R. Rooyakkers, Peter B. Soeters, and Nicolaas E.P. Deutz. Determination of Amino Acid Isotope Enrichment Using Liquid Chromatography–Mass Spectrometry. Analytical Biochemistry 271.1 (1999), pp. 8–17.

[122] Steven A. Cohen. Amino Acid Analysis Using Precolumn Derivatization with 6-Aminoquinolyl-N-Hydroxysuccinimidyl Carbamate. Amino Acid Analysis Protocols. Ed. by Catherine Cooper, Nicolle Packer, and Keith Williams. Totowa, NJ: Humana Press, 2000, pp. 39–47.

[123] Y Mengerink, D Kutlán, F Tóth, A Csámpai, and I Molnár-Perl. Advances in the evaluation of the stability and characteristics of the amino acid and amine derivatives obtained with the o-phthaldialdehyde/3-mercaptopropionic acid and o-phthaldialdehyde/N-acetyl-l-cysteine reagents. Journal of Chromatography A 949.1-2 (2002), pp. 99–124.

[124] Roland J.W. Meesters, Robert R. Wolfe, and Nicolaas E.P. Deutz. Application of liquid chromatography-tandem mass spectrometry (LC–MS/MS) for the analysis of stable isotope enrichments of phenylalanine and tyrosine. Journal of Chromatography B 877.1-2 (2009), pp. 43–49.

[125] © IUPAC. RECOMMENDATIONS. 2018. url: https://iupac.org/what-we-do/recommendations/ (accessed on 02/13/2018).

[126] Thomas Weiss, Günther Bernhardt, Armin Buschauer, Karl-Walter Jauch, and Hubert Zirngibl. High-Resolution Reversed-Phase High-Performance Liquid Chromatography Analysis of Polyamines and Their Monoacetyl Conjugates by Fluorescence Detection after Derivatization with N-Hydroxysuccinimidyl 6-Quinolinyl Carbamate. Analytical Biochemistry 247.2 (1997), pp. 294–304.

[127] Steven A. Cohen and Dennis P. Michaud. Synthesis of a Fluorescent Derivatizing Reagent, 6-Aminoquinolyl-N-Hydroxysuccinimidyl Carbamate, and Its Application for the Analysis of Hydrolysate Amino Acids via High-Performance Liquid Chromatography. Analytical Biochemistry 211.2 (1993), pp. 279–287.

[128] Ji Liu Hong. Determination of amino acids by precolumn derivatization with 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate and high performance liquid chromatography with ultraviolet detection. Journal of Chromatography A 670.1-2 (1994), pp. 59–66.

[129] Thomas S. Weiss. HPLC of Biogenic Amines as 6-Aminoquinolyl-N-hydroxysuccinimidyl Derivatives. Journal of Chromatography Library 70 (2005), pp. 502–523.

[130] Jenny M. Armenta, Diego F. Cortes, John M. Pisciotta, Joel L. Shuman, Kenneth Blakeslee, Dominique Rasoloson, Oluwatosin Ogunbiyi, David J. Sullivan, and Vladimir Shulaev. Sensitive and Rapid Method for Amino Acid Quantitation in Malaria Biological Samples Using AccQ•Tag Ultra Performance Liquid Chromatography-Electrospray Ionization-MS/MS with Multiple Reaction Monitoring. Analytical Chemistry 82.2 (2010), pp. 548–558.

[131] Carolina Salazar, Jenny M. Armenta, and Vladimir Shulaev. An UPLC-ESI-MS/MS Assay Using 6-Aminoquinolyl-N-Hydroxysuccinimidyl Carbamate Derivatization for Targeted Amino Acid Analysis: Application to Screening of Arabidopsis thaliana Mutants. Metabolites 2.3 (2012), pp. 398–428.

[132] Ran Liu, Kaishun Bi, Ying Jia, Qian Wang, Ran Yin, and Qing Li. Determination of polyamines in human plasma by high-performance liquid chromatography coupled with Q-TOF mass spectrometry. Journal of Mass Spectrometry 47.10 (2012), pp. 1341–1346.

[133] Christoph Magnes, Alexander Fauland, Edgar Gander, Sophie Narath, Maria Ratzer, Tobias Eisenberg, Frank Madeo, Thomas Pieber, and Frank Sinner. Polyamines in biological samples: Rapid and robust quantification by solid-phase extraction online-coupled to liquid chromatography–tandem mass spectrometry. Journal of Chromatography A 1331 (2014), pp. 44–51.

[134] Hidehiro Nakamura, Sachise Karakawa, Akiko Watanabe, Yasuko Kawamata, Tomomi Kuwahara, Kazutaka Shimbo, and Ryosei Sakai. Measurement of ¹⁵N enrichment of glutamine and urea cycle amino acids derivatized with 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate using liquid chromatography–tandem quadrupole mass spectrometry. Analytical Biochemistry 476 (2015), pp. 67–77.

[135] JB Fenn, M Mann, CK Meng, SF Wong, and CM Whitehouse. Electrospray ionization for mass spectrometry of large biomolecules. Science 246.4926 (1989), pp. 64–71.

[136] Jesper V. Olsen et al. A Dual Pressure Linear Ion Trap Orbitrap Instrument with Very High Sequencing Speed. Molecular & Cellular Proteomics 8.12 (2009), pp. 2759–2769.

[137] Jae C. Schwartz, Michael W. Senko, and John E. P. Syka. A two-dimensional quadrupole ion trap mass spectrometer. Journal of the American Society for Mass Spectrometry 13.6 (2002), pp. 659–669.

[138] Chien-Wen Hung, Andreas Schlosser, Junhua Wei, and Wolf D. Lehmann. Collision-induced reporter fragmentations for identification of covalently modified peptides. Analytical and Bioanalytical Chemistry 389.4 (2007), pp. 1003–1016.

[139] P. Roepstorff and J. Fohlman. Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biological Mass Spectrometry 11.11 (1984), pp. 601–601.

[140] © Matrix Science. Peptide fragmentation. 2016. url: http://www.matrixscience.com/help/fragmentation_help.html (accessed on 04/30/2018).

[141] Hanno Steen and Matthias Mann. The ABC’s (and XYZ’s) of peptide sequencing. Nature Reviews Molecular Cell Biology 5 (2004), 699 EP.

[142] Andreas Schlosser and Wolf D. Lehmann. Five-membered ring formation in unimolecular reactions of peptides: a key structural element controlling low-energy collision-induced dissociation of peptides. Journal of Mass Spectrometry 35.12 (2000), pp. 1382–1390.

[143] Eric S Witze, William M Old, Katheryn A Resing, and Natalie G Ahn. Mapping protein post-translational modifications with mass spectrometry. Nature Methods 4.10 (2007), pp. 798–806.

[144] Erik Ahrné, Markus Müller, and Frederique Lisacek. Unrestricted identification of modified proteins using MS/MS. PROTEOMICS 10.4 (2010), pp. 671–686.

[145] Rovshan G Sadygov, Daniel Cociorva, and John R Yates. Large-scale database searching using tandem mass spectra: Looking up the answer in the back of the book. Nature Methods 1.3 (2004), pp. 195–202.

[146] Alexey I. Nesvizhskii. Protein Identification by Tandem Mass Spectrometry and Sequence Database Searching. Mass Spectrometry Data Analysis in Proteomics. Ed. by Rune Matthiesen. Totowa, NJ: Humana Press, 2007, pp. 87–119.

[147] Jens Allmer. Algorithms for the de novo sequencing of peptides from tandem mass spectra. Expert Review of Proteomics 8.5 (2011), pp. 645–657.

[148] Matthias Mann, Chin Kai Meng, and John B. Fenn. Interpreting mass spectra of multiply charged ions. Analytical Chemistry 61.15 (1989), pp. 1702–1708.

[149] Marc Gentzel, Thomas Köcher, Saravanan Ponnusamy, and Matthias Wilm. Preprocessing of tandem mass spectrometric data to support automatic protein identification. PROTEOMICS 3.8 (2003), pp. 1597–1610.

[150] Nedim Mujezinovic, Günther Raidl, James R. A. Hutchins, Jan-Michael Peters, Karl Mechtler, and Frank Eisenhaber. Cleaning of raw peptide MS/MS spectra: Improved protein identification following deconvolution of multiply charged peaks, isotope clusters, and removal of background noise. PROTEOMICS 6.19 (2006), pp. 5117–5131.

[151] Ingvar Eidhammer, Kristian Flikka, Lennart Martens, and Svein-Ole Mikalsen. Tandem MS or MS/MS Analysis. Computational Methods for Mass Spectrometry Proteomics. John Wiley & Sons, Ltd, 2007. Chap. 8, pp. 119–140.

[152] © Matrix Science. Modifications. 2016. url: http://www.matrixscience.com/help/pt_mods_help.html (accessed on 02/07/2018).

[153] Thomas Burger. Gentle Introduction to the Statistical Foundations of False Discovery Rate in Quantitative Proteomics. Journal of Proteome Research 17.1 (2017), pp. 12–22.

[154] Mikhail M. Savitski, Simone Lemeer, Markus Boesche, Manja Lang, Toby Mathieson, Marcus Bantscheff, and Bernhard Kuster. Confident Phosphorylation Site Localization Using the Mascot Delta Score. Molecular & Cellular Proteomics 10.2 (2010), p. M110.003830.

[155] Jesper V. Olsen, Blagoy Blagoev, Florian Gnad, Boris Macek, Chanchal Kumar, Peter Mortensen, and Matthias Mann. Global, In Vivo, and Site-Specific Phosphorylation Dynamics in Signaling Networks. Cell 127.3 (2006), pp. 635–648.

[156] Andrew J. Alverson, Bánk Beszteri, Matthew L. Julius, and Edward C. Theriot. The model marine diatom Thalassiosira pseudonana likely descended from a freshwater ancestor in the genus Cyclotella. BMC Evolutionary Biology 11.1 (2011), p. 125.

[157] Andrew J. Alverson, Robert K. Jansen, and Edward C. Theriot. Bridging the Rubicon: Phylogenetic analysis reveals repeated colonizations of marine and fresh waters by thalassiosiroid diatoms. Molecular Phylogenetics and Evolution 45.1 (2007), pp. 193–210.

[158] William M. McGee and Scott A. McLuckey. The ornithine effect in peptide cation dissociation. Journal of Mass Spectrometry 48.7 (2013), pp. 856–861.

[159] Kangling Zhang, Peter M. Yau, Bhaskar Chandrasekhar, Ron New, Richard Kondrat, Brian S. Imai, and Morton E. Bradbury. Differentiation between peptides containing acetylated or trimethylated lysines by mass spectrometry: An application for determining lysine 9 acetylation and methylation of histone H3. PROTEOMICS 4.1 (2004), pp. 1–10.

[160] Timothy A. Couttas, Mark J. Raftery, Giulia Bernardini, and Marc R. Wilkins. Immonium Ion Scanning for the Discovery of Post-Translational Modifications and Its Application to Histones. Journal of Proteome Research 7.7 (2008), pp. 2632–2641.

[161] Morten B. Trelle and Ole N. Jensen. Utility of Immonium Ions for Assignment of epsilon-N-Acetyllysine-Containing Peptides by Tandem Mass Spectrometry. Analytical Chemistry 80.9 (2008), pp. 3422–3430.

[162] Olaf Kühl, ed. Phosphorus-31 NMR Spectroscopy: A Concise Introduction for the Synthetic Organic and Organometallic Chemist. Springer Berlin Heidelberg, 2009.

[163] Eike Brunner, Patrick Richthammer, Hermann Ehrlich, Silvia Paasch, Paul Simon, Susanne Ueberlein, and Karl-Heinz van Pée. Chitin-Based Organic Networks: An Integral Part of Cell Wall Biosilica in the Diatom Thalassiosira pseudonana. Angewandte Chemie International Edition 48.51 (2009), pp. 9724–9727.

[164] Benoit Tesson and Mark Hildebrand. Characterization and Localization of Insoluble Organic Matrices Associated with Diatom Cell Walls: Insight into Their Roles during Cell Wall Formation. PLOS ONE 8.4 (2013), pp. 1–13.

[165] Aubrey K. Davis, Mark Hildebrand, and Brian Palenik. A Stress-Induced Protein Associated With The Girdle Band Region Of The Diatom Thalassiosira Pseudonana (Bacillariophyta). Journal of Phycology 41.3 (2005), pp. 577–589.

[166] Alexander Kotzsch, Philip Gröger, Damian Pawolski, Paul H. H. Bomans, Nico A. J. M. Sommerdijk, Michael Schlierf, and Nils Kröger. Silicanin-1 is a conserved diatom membrane protein involved in silica biomineralization. BMC Biology 15.1 (2017), p. 65.

[167] A. J. Michael. Biosynthesis of polyamines and polyamine-containing molecules. Biochemical Journal 473.15 (2016), pp. 2315–2329.

[168] Danielle L. Swaney, Craig D. Wenger, and Joshua J. Coon. Value of Using Multiple Proteases for Large-Scale Mass Spectrometry-Based Proteomics. Journal of Proteome Research 9.3 (2010), pp. 1323–1329.

[169] M. J. MacCoss et al. Shotgun identification of protein modifications from protein complexes and lens tissue. Proceedings of the National Academy of Sciences 99.12 (2002), pp. 7900–7905.

[170] Tao Xu, Catherine C L Wong, Anna Kashina, and John R Yates. Identification of N-terminally arginylated proteins and peptides by mass spectrometry. Nature Protocols 4.3 (2009), pp. 325–332.

[171] Mukesh Kumar, Shai R. Joseph, Martina Augsburg, Aliona Bogdanova, David Drechsel, Nadine L. Vastenhouw, Frank Buchholz, Marc Gentzel, and Andrej Shevchenko. MS Western, a Method of Multiplexed Absolute Protein Quantification is a Practical Alternative to Western Blotting. Molecular & Cellular Proteomics 17.2 (2017), pp. 384–396.

[172] © Abcam. Protein dephosphorylation protocol. 2018. url: http://www.abcam.com/protocols/protein-dephosphorylation-protocol (accessed on 02/16/2018).

[173] Carol A. Olson, Richard Krueger, and Nancy B. Schwartz. Deglycosylation of chondroitin sulfate proteoglycan by hydrogen fluoride in pyridine. Analytical Biochemistry 146.1 (1985), pp. 232–237.

[174] Hiroki Kuyama, Chikako Toda, Makoto Watanabe, Koichi Tanaka, and Osamu Nishimura. An efficient chemical method for dephosphorylation of phosphopeptides. Rapid Communications in Mass Spectrometry 17.13 (2003), pp. 1493–1496.

[175] Eileen M. Woo, David Fenyo, Benjamin H. Kwok, Hironori Funabiki, and Brian T. Chait. Efficient Identification of Phosphorylation by Mass Spectrometric Phosphopeptide Fingerprinting. Analytical Chemistry 80.7 (2008), pp. 2419–2425.

[176] Bin Ma and Richard Johnson. De Novo Sequencing and Homology Searching. Molecular & Cellular Proteomics 11.2 (2012).

[177] Matthias Mann and Matthias Wilm. Error-Tolerant Identification of Peptides in Sequence Databases by Peptide Sequence Tags. Analytical Chemistry 66.24 (1994), pp. 4390–4399.

[178] © Matrix Science. Error tolerant search. 2016. url: http://www.matrixscience.com/help/error_tolerant_help.html (accessed on 02/06/2018).

[179] Xin Huang et al. ISPTM: An Iterative Search Algorithm for Systematic Identification of Post-translational Modifications from Complex Proteome Mixtures. Journal of Proteome Research 12.9 (2013), pp. 3831–3842.

[180] Liana Tsiatsiani and Albert J. R. Heck. Proteomics beyond trypsin. FEBS Journal 282.14 (2015), pp. 2612–2626.

[181] Jesper V. Olsen, Shao-En Ong, and Matthias Mann. Trypsin Cleaves Exclusively C-terminal to Arginine and Lysine Residues. Molecular & Cellular Proteomics 3.6 (2004), pp. 608–614.

[182] Xue Jun Tang, Pierre Thibault, and Robert K. Boyd. Fragmentation reactions of multiply-protonated peptides and implications for sequencing by tandem mass spectrometry with low-energy collision-induced dissociation. Analytical Chemistry 65.20 (1993), pp. 2824–2834.

[183] Vladimir Gorshkov, Thiago Verano-Braga, and Frank Kjeldsen. SuperQuant: A Data Processing Approach to Increase Quantitative Proteome Coverage. Analytical Chemistry 87.12 (2015), pp. 6319–6327.

[184] Ralph Wieneke, Anja Bernecker, Radostan Riedel, Manfred Sumper, Claudia Steinem, and Armin Geyer. Silica precipitation with synthetic silaffin peptides. Org. Biomol. Chem. 9.15 (2011), pp. 5482–5486.

[185] Nicole Poulsen, André Scheffel, Vonda C. Sheppard, Patrick M. Chesley, and Nils Kröger. Pentalysine Clusters Mediate Silica Targeting of Silaffins in Thalassiosira pseudonana. Journal of Biological Chemistry 288.28 (2013), pp. 20100–20109.

[186] Tony Hunter. The Age of Crosstalk: Phosphorylation, Ubiquitination, and Beyond. Molecular Cell 28.5 (2007), pp. 730–738.

[187] Regev Schweiger and Michal Linial. Cooperativity within proximal phosphorylation sites is revealed from large-scale proteomics data. Biology Direct 5.1 (2010), p. 6.

[188] Pablo Minguez, Luca Parca, Francesca Diella, Daniel R Mende, Runjun Kumar, Manuela Helmer-Citterich, Anne-Claude Gavin, Vera van Noort, and Peer Bork. Deciphering a global network of functionally associated post-translational modifications. Molecular Systems Biology 8 (2012).

[189] Pedro Beltrao, Véronique Albanèse, Lillian R. Kenner, Danielle L. Swaney, Alma Burlingame, Judit Villén, Wendell A. Lim, James S. Fraser, Judith Frydman, and Nevan J. Krogan. Systematic Functional Prioritization of Protein Posttranslational Modifications. Cell 150.2 (2012), pp. 413–425.

[190] Pablo Minguez, Ivica Letunic, Luca Parca, and Peer Bork. PTMcode: a database of known and predicted functional associations between post-translational modifications in proteins. Nucleic Acids Research 41.D1 (2012), pp. D306–D311.

[191] Mao Peng, Arjen Scholten, Albert J. R. Heck, and Bas van Breukelen. Identification of Enriched PTM Crosstalk Motifs from Large-Scale Experimental Data Sets. Journal of Proteome Research 13.1 (2013), pp. 249–259.

[192] A. Saskia Venne, Laxmikanth Kollipara, and René P. Zahedi. The next level of complexity: Crosstalk of posttranslational modifications. PROTEOMICS 14.4-5 (2014), pp. 513–524.

[193] Veit Schwämmle, Claudia-Maria Aspalter, Simone Sidoli, and Ole N. Jensen. Large Scale Analysis of Co-existing Post-translational Modifications in Histone Tails Reveals Global Fine Structure of Cross-talk. Molecular & Cellular Proteomics 13.7 (2014), pp. 1855–1865.

[194] Yuanhua Huang, Bosen Xu, Xueya Zhou, Ying Li, Ming Lu, Rui Jiang, and Tingting Li. Systematic Characterization and Prediction of Post-Translational Modification Cross-Talk. Molecular & Cellular Proteomics 14.3 (2015), pp. 761–770.

[195] Veit Schwämmle, Simone Sidoli, Chrystian Ruminowicz, Xudong Wu, Chung-Fan Lee, Kristian Helin, and Ole N. Jensen. Systems Level Analysis of Histone H3 Post-translational Modifications (PTMs) Reveals Features of PTM Crosstalk in Chromatin Regulation. Molecular & Cellular Proteomics 15.8 (2016), pp. 2715–2729.

[196] Thomas D. Schneider and R. Michael Stephens. Sequence logos: a new way to display consensus sequences. Nucleic Acids Research 18.20 (1990), pp. 6097–6100.

[197] Cong Wu, John C. Tran, Leonid Zamdborg, Kenneth R. Durbin, Mingxi Li, Dorothy R. Ahlf, Bryan P. Early, Paul M. Thomas, Jonathan V. Sweedler, and Neil L. Kelleher. A protease for 'middle-down' proteomics. Nature Methods 9.8 (2012), pp. 822–824.

[198] Pitter F. Huesgen, Philipp F. Lange, Lindsay D. Rogers, Nestor Solis, Ulrich Eckhard, Oded Kleifeld, Theodoros Goulas, F. Xavier Gomis-Rüth, and Christopher M. Overall. LysargiNase mirrors trypsin for protein C-terminal and methylation-site identification. Nature Methods 12 (2014), pp. 55–58.

[199] Piero Giansanti, Liana Tsiatsiani, Teck Yew Low, and Albert J. R. Heck. Six alternative proteases for mass spectrometry–based proteomics beyond trypsin. Nature Protocols 11.5 (2016), pp. 993–1006.

[200] Lloyd M. Smith and Neil L. Kelleher. Proteoform: a single term describing protein complexity. Nature Methods 10.3 (2013), pp. 186–187.

[201] Lloyd M. Smith and Neil L. Kelleher. Proteoforms as the next proteomics currency. Science 359.6380 (2018), pp. 1106–1107.

[202] Oyo Mitsunobu and Masaaki Yamada. Preparation of Esters of Carboxylic and Phosphoric Acid via Quaternary Phosphonium Salts. Bulletin of the Chemical Society of Japan 40.10 (1967), pp. 2380–2382.

[203] Canadian Centre for the Culture of Microorganisms. HESNW/ESAW Recipe. 2018. url: http://cccm.botany.ubc.ca/resources/marine-media-receipes/hesnwesaw-recipe/ (accessed on 04/05/2018).

[204] Hermann Schägger. Tricine-SDS-PAGE. Nature Protocols 1.1 (2006), pp. 16–22.

[205] F. William Studier. Protein production by auto-induction in high-density shaking cultures. Protein Expression and Purification 41.1 (2005), pp. 207–234.

[206] Waters Corporation. AccQ·Fluor Reagent Kit care and use manual. 2008. url: http://www.waters.com/waters/download.htm?lid=10069610&id=10069609&fileName=wat0052881&fileUrl=%2fwebassets%2fcms%2fsupport%2fdocs%2fwat0052881.pdf (accessed on 04/12/2018).

[207] ©Eidgenössische Technische Hochschule Functional Genomics Center Zürich. Amino Acid Analysis. 2018. url: http://www.fgcz.ch/omics_areas/prot/applications/protein-quantitation/amino-acid-analysis.html (accessed on 02/14/2018).

[208] Sigma-Aldrich Co. GlycoProfile™ IV chemical deglycosylation kit. 2004. url: https://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/Bulletin/pp0510bul.pdf (accessed on 04/12/2018).

[209] David N. Perkins, Darryl J. C. Pappin, David M. Creasy, and John S. Cottrell. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20.18 (1999), pp. 3551–3567.

[210] Andrew Keller, Alexey I. Nesvizhskii, Eugene Kolker, and Ruedi Aebersold. Empirical Statistical Model To Estimate the Accuracy of Peptide Identifications Made by MS/MS and Database Search. Analytical Chemistry 74.20 (2002), pp. 5383–5392.

[211] Alexey I. Nesvizhskii, Andrew Keller, Eugene Kolker, and Ruedi Aebersold. A Statistical Model for Identifying Proteins by Tandem Mass Spectrometry. Analytical Chemistry 75.17 (2003), pp. 4646–4658.

[212] Juan Antonio Vizcaíno et al. The Proteomics Identifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Research 41.D1 (2012), pp. D1063–D1069.

[213] E. Beitz. TeXshade: shading and labeling of multiple sequence alignments using LaTeX2e. Bioinformatics 16.2 (2000), pp. 135–139.

ACKNOWLEDGMENTS

I wish to thank, first and foremost, my supervisor Andrej Shevchenko for giving me the opportunity to work on this fantastic project and for guiding me through it. I consider it an honour to have worked with the members of the Shevchenko Lab, who created the best environment to do science. Indeed, this thesis is the result of many people's hard work and collaboration, and I acknowledge their tireless contributions to the body of work for my doctoral thesis, as further summarized here.

I am indebted to many of my colleagues for their technical support, especially Marc Gentzel at the Biotechnology Center (BIOTEC, Dresden) for hands-on help with techniques and fruitful discussions. I would also like to thank Oskar Knittelfelder (Shevchenko Lab) and Alastair Skeffington at the MPI of Molecular Plant Physiology (Potsdam) for critically reading the manuscript of this dissertation and improving its language.

This project could never have advanced so far without the invaluable contributions of our collaborators. I would therefore like to express my gratitude to Nils Kröger, Nicole Poulsen, Alexander Kotzsch, Christoph Heintze, and Damian Pawolski at the B CUBE Center for Molecular Bioengineering (Dresden). I also thank Eike Brunner and Marcus Rauche at TU Dresden for help with NMR analysis, and all members of the Diatom Forschergruppe 2038 (nanomee.de).

In particular, I would like to thank my thesis advisory committee members Bernard Hoflack and Gaia Pigino for their regular feedback on the progress of my research.

I am indebted to my friends Daria Ezeriņa at the VIB-VUB Center for Structural Biology (Brussels) and Maxim Fomin at the MPI for Biophysical Chemistry (Göttingen) for their professional assistance and moral support.

Last but not least, I thank my parents, my sister, and my friends for everything they have ever done for me.

The project was supported by the research unit FOR 2038 ‘Nanopatterned Organic Matrices in Biological Silica Mineralization’, funded by the Deutsche Forschungsgemeinschaft (DFG).


PUBLICATIONS

The following papers originated from this thesis work:

• Alexander Kotzsch, Damian Pawolski, Alexander Milentyev, Anna Shevchenko, André Scheffel, Nicole Poulsen, Andrej Shevchenko, Nils Kröger. Biochemical Composition and Assembly of Biosilica-associated Insoluble Organic Matrices from the Diatom Thalassiosira pseudonana. J Biol Chem. 2015 Dec.

• Alexander Milentyev, Christoph Heintze, Maryna Abacilar, Marc Gentzel, Nicole Poulsen, Marcus Rauche, Eike Brunner, Armin Geyer, Nils Kröger, Andrej Shevchenko. Biosilicome-wide profiling of lysine modifications reveals compositional similarity in three diatom species. Mol Cell Proteomics (in preparation).


DECLARATION / ERKLÄRUNG

Declaration according to § 5.5 of the doctorate regulations

I herewith declare that I have produced this paper without the prohibited assistance of third parties and without making use of aids other than those specified; notions taken over directly or indirectly from other sources have been identified as such. This paper has not previously been presented in identical or similar form to any other German or foreign examination board.

The thesis work was conducted from January 6, 2014 to January 6, 2018 under the supervision of Dr. Andrej Shevchenko at the Max Planck Institute of Molecular Cell Biology and Genetics.

I declare that I have not undertaken any previous unsuccessful doctorate proceedings.

I declare that I recognize the doctorate regulations of the Faculty of Science of Dresden University of Technology.

Dresden, July 1, 2018

Erklärung entsprechend § 5.5 der Promotionsordnung

Hiermit versichere ich, dass ich die vorliegende Arbeit ohne unzulässige Hilfe Dritter und ohne Benutzung anderer als der angegebenen Hilfsmittel angefertigt habe; die aus fremden Quellen direkt oder indirekt übernommenen Gedanken sind als solche kenntlich gemacht. Die Arbeit wurde bisher weder im Inland noch im Ausland in gleicher oder ähnlicher Form einer anderen Prüfungsbehörde vorgelegt.

Die Dissertation wurde im Zeitraum vom 6. Januar 2014 bis 6. Januar 2018 verfasst und von Dr. Andrej Shevchenko am Max-Planck-Institut für Molekulare Zellbiologie und Genetik betreut.

Meine Person betreffend erkläre ich hiermit, dass keine früheren erfolglosen Promotionsverfahren stattgefunden haben.

Ich erkenne die Promotionsordnung der Fakultät für Mathematik und Naturwissenschaften der Technischen Universität Dresden an.

Dresden, den 1. Juli 2018

Alexander Milentyev
