A NONREDUNDANT LIST DEVELOPED BY COMBINATION OF FOUR SEPARATE SOURCES*□ S
-
Upload
idontknowyet -
Category
Documents
-
view
1 -
download
0
Transcript of A NONREDUNDANT LIST DEVELOPED BY COMBINATION OF FOUR SEPARATE SOURCES*□ S
The Human Plasma ProteomeA NONREDUNDANT LIST DEVELOPED BY COMBINATION OF FOUR SEPARATE SOURCES*□S
N. Leigh Anderson‡§, Malu Polanski‡, Rembert Pieper¶�, Tina Gatlin¶�,Radhakrishna S. Tirumalai**, Thomas P. Conrads**, Timothy D. Veenstra**,Joshua N. Adkins‡‡, Joel G. Pounds‡‡, Richard Fagan§§, and Anna Lobley§§
We have merged four different views of the human plasmaproteome, based on different methodologies, into a singlenonredundant list of 1175 distinct gene products. Themethodologies used were 1) literature search for proteinsreported to occur in plasma or serum; 2) multidimensionalchromatography of proteins followed by two-dimensionalelectrophoresis and mass spectroscopy (MS) identifica-tion of resolved proteins; 3) tryptic digestion and multidi-mensional chromatography of peptides followed by MSidentification; and 4) tryptic digestion and multidimen-sional chromatography of peptides from low-molecular-mass plasma components followed by MS identification. Of1,175 nonredundant gene products, 195 were included inmore than one of the four input datasets. Only 46 appearedin all four. Predictions of signal sequence and transmem-brane domain occurrence, as well as Genome Ontologyannotation assignments, allowed characterization of thenonredundant list and comparison of the data sources. The“nonproteomic” literature (468 input proteins) is stronglybiased toward signal sequence-containing extracellularproteins, while the three proteomics methods showed amuch higher representation of cellular proteins, includingnuclear, cytoplasmic, and kinesin complex proteins. Cyto-kines and protein hormones were almost completely absentfrom the proteomics data (presumably due to low abun-dance), while categories like DNA-binding proteins werealmost entirely absent from the literature data (perhapsunexpected and therefore not sought). Most major catego-ries of proteins in the human proteome are represented inplasma, with the distribution at successively deeper layersshifting from mostly extracellular to a distribution more likethe whole (primarily cellular) proteome. The resulting non-redundant list confirms the presence of a number of inter-esting candidate marker proteins in plasma and serum.Molecular & Cellular Proteomics 3:311–326, 2004.
The human plasma proteome is likely to contain most, if notall, human proteins, as well as proteins derived from someviruses, bacteria, and fungi. Many of the human proteins,introduced by low-level tissue leakage, ought to be present atvery low concentrations (��pg/ml), while others, such as al-bumin, are present in very large amounts (��mg/ml). Numer-ous post-translationally modified forms of each protein arelikely to be present, along with literally millions of distinctclonal immunoglobulin (Ig)1 sequences. This complexity andenormous dynamic range make plasma the most difficultspecimen to be dealt with by proteomics (1).
At the same time, plasma is the most generally informativeproteome from a medical viewpoint. Almost all cells in thebody communicate with plasma directly or through extracel-lular or cerebrospinal fluids, and many release at least part oftheir contents into plasma upon damage or death. Somemedical conditions, such as myocardial infarction, are offi-cially defined based on the increase of a specific protein in theplasma (e.g. cardiac troponin-T), and it is difficult to argueconvincingly that there is any disease state that does notproduce some specific pattern of protein change in the body’sworking fluid. This immense diagnostic potential has spurreda rapid acceleration in the search for protein disease markersby a wide variety of proteomics strategies.
Current methods of proteomics are only beginning to cat-alog the contents of plasma. Two-dimensional electrophore-sis was able to resolve 40 distinct plasma proteins in 1976 (2),but, because of the dynamic range problem, this number hadonly grown to 60 in 1992 (3) and is substantially unchangedtoday, a quarter century later. It is now clear that more thantwo dimensions of conventional resolution are required toprogress beyond this point. Recently, several truly multidi-mensional survey efforts have been mounted, with the resultthat the number of distinct proteins detected has increaseddramatically. Additional dimensions of separation can be in-troduced at any of three levels: a) separation of intact pro-teins, either by specific binding (e.g. subtraction of definedhigh-abundance proteins) or continuous resolution (e.g. elec-trophoresis or chromatography); b) separation of peptides
From ‡The Plasma Proteome Institute, Washington DC 20009-3450; ¶Large Scale Biology Corporation, Proteomics Division, Ger-mantown, MD 20876; **Laboratory of Proteomics and AnalyticalTechnologies, SAIC-Frederick Inc., National Cancer Institute, Fred-erick, MD 21702-1201; ‡‡Biological Sciences Department, PacificNorthwest National Laboratory, Richland, WA 99352; and §§Inphar-matica Ltd., London, W1T 2NU, United Kingdom
Received, November 29, 2003, and in revised form, January 9,2004
Published, MCP Papers in Press, January 12, 2004, DOI10.1074/mcp.M300127-MCP200
1 The abbreviations used are: Ig, immunoglobulin; MS, mass spec-trometry; GO, Genome Ontology; 2DE, two-dimensional electro-phoresis; NR, nonredundant; TM, transmembrane; LC, liquid chroma-tography; MS/MS, tandem MS; IT, ion trap.
Research
© 2004 by The American Society for Biochemistry and Molecular Biology, Inc. Molecular & Cellular Proteomics 3.4 311This paper is available on line at http://www.mcponline.org
derived from plasma proteins, either by specific binding (e.g.capture by anti-peptide antibodies) or continuous resolution(e.g. chromatography); and c) separation of peptides, andparticularly their fragments, by mass spectrometry (MS).Many possible combinations of these dimensions can beimplemented, the only limitations being the effort, cost, andtime of analyzing many fractions or runs instead of one.
In this article, we have compared and combined data fromthree different multi-dimensional strategies with data from afourth, classical source (the protein biochemistry and clinicalchemistry literature) to provide a meta-level overview of boththe contents and the rate of discovery of new components inplasma. The three experimental datasets are derived from 1)whole protein separation by a three-dimensional process (im-munosubtraction/ion exchange/size exclusion) followed bytwo-dimensional electrophoresis (2DE) followed by MS iden-tification of resolved spots (4); 2) Ig subtraction followed bytrypsin digestion followed by two-dimensional liquid chroma-tography (LC) (ion exchange/reversed phase) followed by tan-dem MS (MS/MS) (5); and 3) molecular mass fractionation,followed by trypsin digestion followed by two-dimensional LC(cation exchange/reversed phase) followed by MS/MS (6).These three experimental approaches have two features incommon (the removal of most Igs, by specific subtraction orsize, and the use of MS for molecular identification) but oth-erwise they span the gamut of proteomics discovery ap-proaches: separation at the protein level, separation at thetryptic peptide level, and a hybrid.
Combining experimental data with literature search resultson proteins detected in plasma (representing a large body ofaccumulated “nonproteomics” data) should provide a broadperspective on plasma contents. Because the same proteinsdetected by various methods can be referred to by differentnames or accession numbers, we have used a sequence-based approach to eliminate redundancy and cluster all oc-currences of the same protein. The resulting list makes itpossible to examine the overlap between the various ap-proaches and to see whether they are biased toward partic-ular classes of proteins. In addition, a pooled nonredundantlist should provide a relatively unbiased survey of the kinds ofproteins present in plasma, which could have important diag-nostic implications. Finally, a large list of proteins actuallyobserved in plasma paves the way for top-down, targetedproteomics approaches to the discovery of disease markers:the development of accurate high-throughput specific assaysfor selected candidates from this list, as a supplement to the useof single methods for marker discovery in small sample sets. Inthe longer term, proteins with strong, mechanistic disease rela-tionships may be viable therapeutic candidates as well.
DATA SOURCES AND METHODS
Lit: Literature Search
Manual Medline searches were performed searching for titles orabstracts containing human plasma or serum proteins, excluding
articles on membranes, stimulation, drug, and dose. A total of 468entries were collected, of which 458 had a human sequence acces-sion number in one or more of the major databases.
2DEMS: Separation of Serum Proteins(LC3/2-DE) � MS/MS Identification
Intact proteins were fractionated by chromatography and 2DE andidentified by MS, generating the dataset described by Pieper et al. (7).Briefly, human blood sera were obtained in equal volumes from twohealthy male donors (ages 40 and 80). Albumin, haptoglobin, trans-ferrin, transthyretin, �-1-anti trypsin, �-1-acid glycoprotein, he-mopexin, and �-2-macroglobulin were removed by immunoaffinitychromatography. The immunoaffinity-subtracted serum concentratewas fractionated further by sequential anion exchange and size ex-clusion chromatography. The resulting 66 samples were individuallysubjected to 2DE. All visible Coomassie Blue R250 spots were cutout, destained, reduced, alkylated, and digested with trypsin. Allextracted peptides were analyzed by matrix-assisted laser desorp-tion/ionization time-of-flight (MALDI-TOF) on a Bruker Biflex orAutoflex mass spectrometer (Bruker, Billerica, MA) and searchedagainst Swiss-Prot. Those samples that did not give positive identi-fication by MALDI-TOF where subjected to LC-MS/MS analysis by iontrap (IT) MS (Thermo Finnegan LCQ, Woburn, MA) and searchedagainst the National Center for Biotechnology Information (NCBI)database using SEQUEST.
LCMS1: Separation of Peptide Digests of SerumMinus Ig (LC2) � MS/MS Identification
A published dataset prepared by Adkins et al., (5) was used. Briefly,human blood serum was obtained from a healthy anonymous femaledonor. Igs were depleted by affinity adsorption chromatography usingprotein A/G. The resulting Ig-depleted plasma was digested withtrypsin and separated by strong cation exchange on a polysulfoethylA column followed by reverse-phase separation on a capillary C18column. The capillary column was interfaced to an IT-MS (ThermoFinnigan LCQ Deca XP) using electrospray ionization. The IT-MS wasconfigured to perform MS/MS scans on the three most intense pre-cursor masses from a single MS scan. All samples were measuredover a mass/charge (m/z) range of 400–2,000, with fractions contain-ing high complexity being measured with segmented m/z ranges.Tandem mass spectra were analyzed by SEQUEST as describedusing the NCBI May 2002 database.
LCMS2: Separation of Peptide Digests of Low-Molecular-MassSerum Proteins (LC2) � MS/MS Identification
The fourth dataset is that described by Tirumalai et al. (6), focusedon the lower-molecular-mass plasma proteome. Briefly standard hu-man serum was purchased from the National Institute of Standardsand Technology. High-molecular-mass proteins were removed in thepresence of acetonitrile using Centriplus centrifugal filters with amolecular mass cutoff of 30 kDa. The low-molecular-mass filtrate wasreduced, alkylated, and digested with trypsin. The digested samplewas fractionated by strong cation exchange chromatography on apolysulfoethyl A column. Reversed-phase LC was subsequently per-formed on 300A Jupiter C-18 column coupled on line to an IT-MS(Thermo Finnegan LCQ Deca XP). Each full MS scan was followed bythree MS/MS scans where the three most abundant peptide molec-ular ions were selected. MS/MS spectra were searched against the ahuman protein database using SEQUEST.
Bioinformatics
Sequence Clustering—The Blastp protein comparison algorithm (8,9) was used to query the sequence of each protein identified against
Nonredundant Human Plasma Proteome
312 Molecular & Cellular Proteomics 3.4
a database containing the aggregate sequences of all proteins iden-tified by any method. Sequences sharing greater than 95% identityover an aligned region were grouped into “unique sequence clusters.”Sequences were unmasked, and the minimum alignment length con-sidered was 15 aa. This similarity-based approach was sufficient togroup identical sequences, sequence fragments, and splice variants.Annotation in the nonredundant table was reported for the “bestannotated” protein in the cluster set.
Signal Peptide Prediction—Signal peptides were predicted usingthe commercially available SignalP version 2.0 neural net and hiddenMarkov model (HMM) algorithms (10) and sigmask (11) signal mask-ing program developed as part of Inpharmatica’s Biopendium (12)protein annotation database. Each sequence received a score of �1for a statistically significant positive signal peptide prediction fromany of the three algorithms. The scores 0, 1, 2, and 3 for a particularsequence were then converted to qualitative terms “no,” “possiblesignal,” “signal,” or “signal confident,” respectively.
Transmembrane Prediction—Transmembrane (TM) regions werepredicted using the commercial version of TMHMM version 2.0 algo-rithm (13). The total number of TM helices predicted per sequencewas reported for each protein sequence. When a predicted TM regionoverlapped a predicted signal sequence (as it did in 40 cases inH_Plasma_NR_v2), this was interpreted as a signal sequence only.
Structural and Sequence-based Domain Annotation—Sequenceswere scanned against a library of BioPendium and iPSI-BLAST (9,11)-like protein profiles constructed from SCOP (14), PFAM (15),PRINTS (16), and PROSITE (17) domain families. Hits to these profileswere reported at a statistical e-value cut-off of 1e-5. This cut-off waschosen to maximize profile coverage and minimize the occurrence offalse positives. Sequences were not masked for low complexity orcoiled coils prior to profile scanning.
Gene Ontology (GO) Term Annotation—NCBI GI number acces-sions for the sequences were matched to their SPTR (18) equivalentsbased on sequences sharing �95% sequence identity over 90% ofthe query sequence length. GO (19) component, process, and func-tion terms were then extracted from text-based annotation files avail-able for download from the GO database ftp site: ftp.geneontology.org/pub/go/gene-associations/gene_association.goa_human. Forgraphical reporting, a series of GO terms in each category wereextracted by text searching of relevant keywords (indicated by thecategory names on plots) through all the assigned GO definitions. AGO component summary for the whole human proteome was pre-pared by applying the same approach to the complete GO humandatabase referred to above.
Database Assembly—The nonredundant (NR) plasma databasewas assembled as a series of tables in a PostgreSQL relationaldatabase and queried to derive summary statistics for tables andfigures shown here.
RESULTS
Number of Distinct Proteins Detected in Plasma, and theNature of Nonredundancy
Four sets of accession numbers for proteins occurring inplasma (468 from Lit, 319 from 2DEMS, 607 [reported as 490nonredundant accessions] from LCMS1, and 341 fromLCMS2) were combined to yield 1,735 total initial accessions(Table I). A total of 55 of the input accessions referred tononhuman sequences, and these were not considered furtherin the present analysis. A very conservative method of select-ing distinct proteins was used in order to avoid countingsequence variants, splice variants, or cleavage products ofone gene product as different: any sequences that shared a
region larger than 15 aa with greater than 95% sequenceidentity were assigned to the same cluster and reported as asingle entry in the nonredundant set. Fig. 1 shows one resultof applying these criteria, in this case resulting in the assign-ment of 10 initial accessions to a single cluster for haptoglo-bin, a major plasma protein found in all four initial datasetsand whose three separate subunit types are derived from asingle translation product. This case also highlights the gen-eral observation that not all datasets used the same primaryaccession database (NCBI GI, Swiss-Prot, or RefSeq as ex-amples). The largest cluster (109 “redundant” entries) is ac-counted for by Igs, where all the Ig heavy and light chains ofall types were clustered together as one entry arbitrarily cho-sen as S40354 (an Ig � chain sequence). Thus 6.2% of theinput accessions were Igs, despite the fact that each of theexperimental methods included steps to remove thesemolecules.
This approach is more conservative (fewer distinct proteinsreported) than the methods used in some of the input datasources, which accounts for the decrease in each set whenintra-set redundancy is removed (1,509 human accessionsremain). When inter-set redundancies are removed (makingthe full list nonredundant by the criteria described above), atotal of 1,175 distinct proteins remain. The entire nonredun-dant set, here abbreviated H_Plasma_NR_v2 (H_Plas-
FIG. 1. Diagram of redundant accessions for haptoglobin. Vari-ous database accessions related to haptoglobin are shown in therectangles, pentagon, and hexagons. The triangles represent theidentifications made in the input datasets. The dark filled trianglerepresents the accession number upon which all other accessionnumbers were collapsed to give a single nonredundant accession.
TABLE IProtein redundancy within and between datasets
The numbers accession numbers contributed by each data sourceand remaining after specific filter operations are shown.
Lit LCMS1 LCMS2 2DEMS Total
Beginning accessions 468 607 341 319 1735Minus nonhuman 458 580 330 312 1680Minus intrasource redundancy
and nonhuman accessions433 475 318 283 1509
Unique to source in NR 284 334 221 141 980Total combined NR list – – – – 1175
Nonredundant Human Plasma Proteome
Molecular & Cellular Proteomics 3.4 313
ma_NR_v1 being Table I of Ref. 1), is provided as a supple-mental data table. Of these, a total of 980 occur in only onesource. Because so many entries occur only once, and giventhe non-zero frequency of false MS identifications, independ-ent confirmation will be required to validate most of this list astrue plasma components.
Protein Coverage by Data Source
Of the 1,175 nonredundant human proteins in H_Plas-ma_NR_v2, 195 entries, or 17%, were present in more thanone dataset (set H_Plasma_195: Fig. 2 and Table II). Only 46(4%) were found in all four sets of accessions (Total_sources � 4, shown in bold type in Table II). Of these only one(inter-� trypsin inhibitor heavy chain H1) is predicted to haveeven a single transmembrane domain, and only one (the he-moglobin � chain presumably released from red cell lysis) ispredicted not to have a signal sequence. These characteris-tics (presence of signal sequence and absence of transmem-brane domains) are those expected for major plasma proteinssecreted by organs such as the liver.
An additional 47 proteins (4%) were found in three of thefour datasets (Table II). Of these 47 proteins, only three werefound in all three experimental datasets but not the literaturedataset. These three proteins were pigment epithelium-de-rived factor; a nonmuscle myosin heavy chain; and secretory,extracellular matrix protein 1. Pigment epithelium-derived fac-tor and secretory, extracellular matrix protein 1 are both se-creted proteins (Swiss-Prot annotations), but upon furthersearching only pigment epithelium-derived factor has beenreported as plasma associated (20). Nonmuscle myosin is anintracellular protein (21) possibly derived from platelets. Theremaining 43 proteins seen in three datasets were all docu-mented plasma proteins.
A further 102 proteins (9%) were found in two datasets(Table II). Of these, 43 proteins were found in two experimen-tal datasets but not in the literature dataset. These include a
number of proteins that would not typically be thought of aslikely plasma components, including a chloride channel and acopper-transporting ATPase (with 10 and 7 predicted trans-membrane domains, respectively), an oxygen-regulated pro-tein, three hypothetical proteins, and a group of likely nuclearproteins including mismatch repair protein, mitotic kinesin-like protein 1, and centromere protein F.
The remaining 980 proteins (83% of NR) were found in onlyone of the four input datasets. Of these, 696 proteins (71%)were found only in the experimental sets, with LCMS1,LCMS2, and Lit having similar large percentages of source-unique proteins (70, 69, and 66%, respectively, versus 50%for 2DEMS).
Characterization of the Plasma Proteome ViaAnnotation Statistics
Predicted Signal Sequences—The signal sequence predic-tion algorithm used yielded four levels of likelihood: “no”(strong probability of no signal sequence), “possible signal,”“signal,” and “signal confident” (strong probability of a signalsequence) in order of increasing likelihood of a signal se-quence (i.e. the number of algorithms out of the three usedthat predict a signal sequence). The procedure does not dis-tinguish between the signal sequences of secreted and mem-brane-bound proteins (including e.g. plasma, Golgi, and mi-tochondrial membranes), and thus does not directly predictfinal protein location. Most of the 1,175 H_Plasma_NR_v2nonredundant sequences (83%) yielded a strong positive ornegative prediction (i.e. good agreement between the threeprediction algorithms used), with these two results occurringin about a 2:3 ratio overall. Approximately 49% of H_Plas-ma_NR_v2 had no evidence of a signal sequence, while onlyabout 25% of the H_Plasma_195 lacked such evidence; con-versely only 34% of H_Plasma_NR_v2 gave a “signal confi-dent” while 54% of H_Plasma_195 gave this signal. Compar-ing the four data sources over H_Plasma_195 (Fig. 3A), entriesfrom all four sources were likely to have signal sequences: theLit set had the highest bias toward “signal confident” proteins(indicated by a ratio of “signal confident” to “no” signal of5.05), while 2DEMS showed less bias (ratio of 3.44) andLCMS1 and LCMS2 showed a higher representation of “no”signal predictions (ratios of 2.03 and 1.96, respectively). Com-paring the four sources across all the 1,175 H_Plasma_NR_v2proteins (Fig. 3B), these ratios were reduced, reflecting agreater preponderance of “no” signal proteins, but the relativedifferences between the sources remained, yielding ratios forLit, 2DEMS, LCMS1, and LCMS2 of 3.85, 0.91, 0.49, and0.44, respectively.
Predicted Transmembrane Segments—The number of pre-dicted TM segments (0–21) is shown for each of the fourdatasets (Fig. 4) in comparison with the distribution summa-rizing The Human Membrane Protein Library (HMPL, cbs.umn.edu/human/info/hs.dat; Fig. 4B, line). The HMPL con-
FIG. 2. Diagram of proteins found in multiple datasets. All over-laps are shown (2-way, 3-way, and 4-way) for all four input datasets:Lit (dotted line), 2DEMS (dashed line), LCMS1 (solid line), and LCMS2(double solid line). Numbers represent the number of shared acces-sions in the respective overlapping areas.
Nonredundant Human Plasma Proteome
314 Molecular & Cellular Proteomics 3.4
TAB
LEII
Pla
sma
pro
tein
sd
etec
ted
inat
leas
ttw
od
atas
ets
The
tab
lep
rese
nts
ano
nred
und
ant
list
of19
5p
rote
ins
foun
din
atle
ast
two
ofth
efo
urin
put
dat
aso
urce
s,al
pha
bet
ized
by
pro
tein
des
crip
tion
(con
tain
ing
nam
ean
dsy
nony
ms)
.E
ntrie
sfo
und
inal
lfou
rd
ata
sour
ces
are
show
nin
bol
d.
Lit,
2DE
MS
,LC
MS
1,an
dLC
MS
2co
lum
nsgi
veth
enu
mb
erof
acce
ssio
nsin
each
orig
inal
dat
ase
tth
atw
ere
assi
gned
toea
chN
Ren
try.
Thes
ear
esu
mm
edac
ross
the
dat
aso
urce
sto
yiel
dTo
tal_
acce
ssio
ns.T
otal
_sou
rces
sum
mar
izes
the
num
ber
ofso
urce
s(h
ere
rang
ing
from
2to
4)in
whi
chth
een
try
was
foun
d.S
igna
lpro
vid
eson
eof
four
pos
sib
leva
lues
resu
lting
from
the
sign
alse
que
nce
pre
dic
tion
pro
ced
ure.
TMgi
ves
the
pre
dic
ted
num
ber
oftr
ansm
emb
rane
segm
ents
inth
ep
rote
in.
Acc
essi
onLi
t2D
EM
SLC
MS
1LC
MS
2To
tal_
acce
ssio
nsTo
tal_
sour
ces
Sig
nal
TMD
escr
iptio
n
P10
809
10
11
33
No
060
-kD
ahe
atsh
ock
pro
tein
,m
itoch
ond
rialp
recu
rsor
(Hsp
60)
(60-
kDa
chap
eron
in)
(CP
N60
)(H
eat
shoc
kp
rote
in60
)(H
SP
-60)
(mito
chon
dria
lmat
rixp
rote
inP
1)(P
60ly
mp
hocy
tep
rote
in)
(huc
ha60
)A
AB
2704
50
01
12
2P
ossi
ble
sign
al0
70-k
Da
per
oxis
omal
mem
bra
nce
pro
tein
hom
olog
(inte
rnal
frag
men
t)P
0257
02
40
28
3N
o0
Act
in,
cyto
pla
smic
1(�
-act
in)
Q15
848
11
00
22
Sig
nalc
onfid
ent
0A
dip
onec
tinp
recu
rsor
(30-
kDa
adip
ocyt
eco
mp
lem
ent-
rela
ted
pro
tein
)(A
CR
P30
)(a
dip
ose
mos
tab
und
ant
gene
tran
scrip
t1)
(ap
m-1
)(g
elat
in-b
ind
ing
pro
tein
)N
P_0
0112
40
11
02
2S
igna
lcon
fiden
t0
Afa
min
pre
curs
or;
�-a
lbum
in(H
omo
sap
iens
)P
0276
31
11
03
3S
igna
lcon
fiden
t0
�-1
-aci
dgl
ycop
rote
in1
pre
curs
or(A
GP
1)(o
roso
muc
oid
1)(O
MD
1)P
0101
11
12
04
3S
igna
lcon
fiden
t0
�-1
-ant
ichy
mot
ryp
sin
pre
curs
or(A
CT)
P01
009
11
11
44
Sig
nalc
onf
iden
t0
�-1
-ant
itry
psi
np
recu
rso
r(�
-1p
rote
ase
inhi
bit
or)
(�-1
-ant
ipro
tein
ase)
(PR
O06
84/
PR
O22
09)
P04
217
11
10
33
No
0�
-1B
-gly
cop
rote
inp
recu
rsor
(�-1
-Bgl
ycop
rote
in)
P08
697
11
11
44
Sig
nal
0�
-2-a
ntip
lasm
inp
recu
rso
r(�
-2-p
lasm
inin
hib
ito
r)(�
-2-P
I)(�
-2-A
P)
P02
765
11
11
44
Sig
nalc
onf
iden
t0
�-2
-HS
-gly
cop
rote
inp
recu
rso
r(F
etui
n-A
)(�
-2-Z
-glo
bul
in)
(Ba-
�-2
-gly
cop
rote
in)
(PR
O27
43)
P01
023
11
21
54
Sig
nalc
onf
iden
t0
�-2
-mac
rog
lob
ulin
pre
curs
or
(�-2
-M)
P02
760
11
10
33
Sig
nalc
onfid
ent
0A
MB
Pp
rote
inp
recu
rsor
[con
tain
s�
-1-m
icro
glob
ulin
(pro
tein
HC
)(c
omp
lex-
form
ing
glyc
opro
tein
hete
roge
neou
sin
char
ge)
(�-1
mic
rogl
ycop
rote
in);
inte
r-�
-try
psi
nin
hib
itor
light
chai
n(IT
I-LC
)(b
ikun
in)
(HI-
30)]
P01
019
11
11
44
Sig
nal
0A
ngio
tens
ino
gen
pre
curs
or
[co
ntai
nsan
gio
tens
inI
(Ang
I);an
gio
tens
inII
(Ang
II);
ang
iote
nsin
III(A
ngIII
)(D
es-A
sp[1
]-an
gio
tens
inII)
]P
0100
81
11
14
4S
igna
l0
Ant
ithr
om
bin
-III
pre
curs
or
(AT
III)
(PR
O03
09)
P02
647
11
22
64
Sig
nalc
onf
iden
t0
Ap
olip
op
rote
inA
-Ip
recu
rso
r(A
po
-AI)
P02
652
11
11
44
Sig
nalc
onf
iden
t0
Ap
olip
op
rote
inA
-II
pre
curs
or
(Ap
o-A
II)(a
po
a-II)
P06
727
11
12
54
Sig
nalc
onf
iden
t0
Ap
olip
op
rote
inA
-IV
pre
curs
or
(Ap
o-A
IV)
P04
114
11
20
43
Sig
nalc
onfid
ent
0A
pol
ipop
rote
inB
-100
pre
curs
or(A
po
B-1
00)
[con
tain
s:ap
olip
opro
tein
B-4
8(A
po
B-4
8)]
P02
655
11
11
44
Sig
nalc
onf
iden
t0
Ap
olip
op
rote
inC
-II
pre
curs
or
(Ap
o-C
II)P
0265
61
11
14
4S
igna
lco
nfid
ent
0A
po
lipo
pro
tein
C-I
IIp
recu
rso
r(A
po
-CIII
)P
0509
01
11
14
4S
igna
lco
nfid
ent
0A
po
lipo
pro
tein
Dp
recu
rso
r(A
po
-D)
(ap
od
)P
0264
91
32
17
4S
igna
lco
nfid
ent
0A
po
lipo
pro
tein
Ep
recu
rso
r(A
po
-E)
Q13
790
11
11
44
Sig
nalc
onf
iden
t0
Ap
olip
op
rote
inF
pre
curs
or
(Ap
o-F
)O
1479
11
11
14
4S
igna
l0
Ap
olip
op
rote
inL1
pre
curs
or
(ap
olip
op
rote
inL-
I)(a
po
lipo
pro
tein
L)(a
po
l-I)
(Ap
o-L
)(a
po
l)P
0851
91
01
02
2S
igna
l0
Ap
olip
opro
tein
(a)
pre
curs
or(E
C3.
4.21
.�)
(Ap
o(a)
)(L
p(a
))P
0657
60
11
02
2P
ossi
ble
sign
al0
ATP
synt
hase
�ch
ain,
mito
chon
dria
lpre
curs
or(E
C3.
6.3.
14)
P01
160
10
10
22
Sig
nalc
onfid
ent
0A
tria
lnat
riure
ticfa
ctor
pre
curs
or(A
NF)
(atr
ialn
atriu
retic
pep
tide)
(AN
P)
(pre
pro
natr
iod
ilatin
)[c
onta
ins:
card
iod
ilatin
-rel
ated
pep
tide
(CD
P)]
P02
749
11
11
44
Sig
nalc
onf
iden
t0
�-2
-gly
cop
rote
inI
pre
curs
or
(ap
olip
op
rote
inH
)(A
po
-H)
(B2G
PI)
(�(2
)GP
I)(a
ctiv
ated
pro
tein
C-b
ind
ing
pro
tein
)(A
PC
inhi
bit
or)
Nonredundant Human Plasma Proteome
Molecular & Cellular Proteomics 3.4 315
TAB
LEII—
cont
inue
d
Acc
essi
onLi
t2D
EM
SLC
MS
1LC
MS
2To
tal_
acce
ssio
nsTo
tal_
sour
ces
Sig
nal
TMD
escr
iptio
n
P01
884
10
01
22
Sig
nalc
onfid
ent
0�
-2-m
icro
glob
ulin
pre
curs
orI3
9467
00
11
22
No
0B
ullo
usp
emp
higo
idan
tigen
,hu
man
(frag
men
t)P
0400
31
11
03
3S
igna
l0
C4b
-bin
din
gp
rote
in�
chai
np
recu
rsor
(c4b
p)
(pro
line-
rich
pro
tein
)(P
RP
)P
2085
11
10
02
2S
igna
lcon
fiden
t0
C4b
-bin
din
gp
rote
in�
chai
np
recu
rsor
P05
109
11
00
22
No
0C
algr
anul
inA
(Mig
ratio
nin
hib
itory
fact
or-r
elat
edp
rote
in8)
(MR
P-8
)(c
ystic
fibro
sis
antig
en)
(CFA
G)
(P8)
(leuk
ocyt
eL1
com
ple
xlig
htch
ain)
(S10
0ca
lciu
m-b
ind
ing
pro
tein
A8)
(cal
pro
tect
inL1
Lsu
bun
it)N
P_0
0172
90
11
02
2N
o0
Car
bon
ican
hyd
rase
I;ca
rbon
icd
ehyd
rata
se(H
omo
sap
iens
)P
2279
21
11
03
3N
o0
Car
box
ypep
tidas
eN
83-k
Da
chai
n(c
arb
oxyp
eptid
ase
Nre
gula
tory
sub
unit)
(frag
men
t)P
1516
91
10
02
2S
igna
l0
Car
box
ypep
tidas
eN
cata
lytic
chai
np
recu
rsor
(EC
3.4.
17.3
)(a
rgin
ine
carb
oxyp
eptid
ase)
(kin
ase
1)(s
erum
carb
oxyp
eptid
ase
N)
(SC
PN
)(a
nap
hyla
toxi
nin
activ
ator
)(p
lasm
aca
rbox
ypep
tidas
eB
)P
0733
91
10
02
2S
igna
lcon
fiden
t0
Cat
hep
sin
Dp
recu
rsor
(EC
3.4.
23.5
)P
0771
11
10
02
2S
igna
lcon
fiden
t0
Cat
hep
sin
Lp
recu
rsor
(EC
3.4.
22.1
5)(m
ajor
excr
eted
pro
tein
)(M
EP
)P
2577
40
10
12
2S
igna
lcon
fiden
t0
Cat
hep
sin
Sp
recu
rsor
(EC
3.4.
22.2
7)N
P_0
0518
50
01
12
2N
o0
CC
AA
T/en
hanc
erb
ind
ing
pro
tein
�,
inte
rleuk
in6-
dep
end
ent
O43
866
11
00
22
Sig
nalc
onfid
ent
0C
D5
antig
en-l
ike
pre
curs
or(S
P-�
)(C
T-2)
(igm
-ass
ocia
ted
pep
tide)
NP
_005
187
00
21
32
No
0C
entr
omer
ep
rote
inF
(350
/400
kd,
mito
sin)
;m
itosi
n;ce
ntro
mer
eP
0045
01
11
14
4S
igna
lco
nfid
ent
0C
erul
op
lasm
inp
recu
rso
r(E
C1.
16.3
.1)
(fer
roxi
das
e)N
P_0
0642
10
01
12
2N
o0
Cha
per
onin
cont
aini
ngTC
P1,
sub
unit
4(�
);ch
aper
onin
NP
_004
061
00
11
22
No
10C
hlor
ide
chan
nelK
a;ch
lorid
ech
anne
l,ki
dne
y,A
;hc
lc-K
a(H
omo
sap
iens
)P
0627
61
10
02
2S
igna
lcon
fiden
t1
Cho
lines
tera
sep
recu
rsor
(EC
3.1.
1.8)
(acy
lcho
line
acyl
hyd
rola
se)
(cho
line
este
rase
II)(b
utyr
ylch
olin
ees
tera
se)
(pse
udoc
holin
este
rase
)P
1090
91
11
14
4S
igna
lco
nfid
ent
0C
lust
erin
pre
curs
or
(co
mp
lem
ent-
asso
ciat
edp
rote
inS
P-4
0,40
)(c
om
ple
men
tcy
toly
sis
inhi
bit
or)
(CLI
)(N
A1
and
NA
2)(a
po
lipo
pro
tein
J)(A
po
-J)
(TR
PM
-2)
P00
740
11
01
33
Sig
nal
0C
oagu
latio
nfa
ctor
IXp
recu
rsor
(EC
3.4.
21.2
2)(C
hris
tmas
fact
or)
P12
259
10
11
33
Sig
nalc
onfid
ent
0C
oagu
latio
nfa
ctor
Vp
recu
rsor
(act
ivat
edp
rote
inC
cofa
ctor
)P
0045
11
01
02
2S
igna
l0
Coa
gula
tion
fact
orV
IIIp
recu
rsor
(pro
coag
ulan
tco
mp
onen
t)(a
ntih
emop
hilic
fact
or)
(AH
F)P
0074
21
10
02
2S
igna
lcon
fiden
t0
Coa
gula
tion
fact
orX
pre
curs
or(E
C3.
4.21
.6)
(Stu
art
fact
or)
P00
748
11
11
44
Sig
nalc
onf
iden
t0
Co
agul
atio
nfa
cto
rX
IIp
recu
rso
r(E
C3.
4.21
.38)
(Hag
eman
fact
or)
(HA
F)P
0048
81
10
13
3N
o0
Coa
gula
tion
fact
orX
IIIA
chai
np
recu
rsor
(EC
2.3.
2.13
)(p
rote
in-g
luta
min
e�-g
luta
myl
tran
sfer
ase
Ach
ain)
(tran
sglu
tam
inas
eA
chai
n)P
0516
01
11
03
3S
igna
lcon
fiden
t0
Coa
gula
tion
fact
orX
IIIB
chai
np
recu
rsor
(pro
tein
-glu
tam
ine
�-g
luta
myl
tran
sfer
ase
Bch
ain)
(tran
sglu
tam
inas
eB
chai
n)(fi
brin
stab
ilizi
ngfa
ctor
Bsu
bun
it)P
0246
21
01
13
3S
igna
l0
Col
lage
n�
1(IV
)ch
ain
pre
curs
orP
0073
61
11
14
4S
igna
lco
nfid
ent
0C
om
ple
men
tC
1rco
mp
one
ntp
recu
rso
rP
0987
11
11
14
4S
igna
lco
nfid
ent
0C
om
ple
men
tC
1sco
mp
one
ntp
recu
rso
r(E
C3.
4.21
.42)
(C1
este
rase
)P
0668
11
11
03
3S
igna
lcon
fiden
t0
Com
ple
men
tC
2p
recu
rsor
(EC
3.4.
21.4
3)(C
3/C
5co
nver
tase
)P
0102
41
11
14
4S
igna
lco
nfid
ent
0C
om
ple
men
tC
3p
recu
rso
r(c
ont
ains
C3a
anap
hyla
toxi
n)P
0102
81
12
15
4S
igna
l0
Co
mp
lem
ent
C4
pre
curs
or
(co
ntai
nsC
4Aan
aphy
lato
xin)
P01
031
11
10
33
Sig
nalc
onfid
ent
0C
omp
lem
ent
C5
pre
curs
or(c
onta
ins
C5a
anap
hyla
toxi
n)P
1367
11
11
14
4S
igna
l0
Co
mp
lem
ent
com
po
nent
C6
pre
curs
or
P10
643
11
10
33
Sig
nalc
onfid
ent
0C
omp
lem
ent
com
pon
ent
C7
pre
curs
orP
0735
71
11
14
4S
igna
lco
nfid
ent
0C
om
ple
men
tco
mp
one
ntC
8�
chai
np
recu
rso
rP
0735
81
11
03
3S
igna
lcon
fiden
t0
Com
ple
men
tco
mp
onen
tC
8�
chai
np
recu
rsor
P07
360
11
10
33
Sig
nalc
onfid
ent
0C
omp
lem
ent
com
pon
ent
C8
�ch
ain
pre
curs
or
Nonredundant Human Plasma Proteome
316 Molecular & Cellular Proteomics 3.4
TAB
LEII—
cont
inue
d
Acc
essi
onLi
t2D
EM
SLC
MS
1LC
MS
2To
tal_
acce
ssio
nsTo
tal_
sour
ces
Sig
nal
TMD
escr
iptio
n
P02
748
11
11
44
Sig
nalc
onf
iden
t0
Co
mp
lem
ent
com
po
nent
C9
pre
curs
or
P00
751
11
11
44
Sig
nal
0C
om
ple
men
tfa
cto
rB
pre
curs
or
(EC
3.4.
21.4
7)(C
3/C
5co
nver
tase
)(p
rop
erd
infa
cto
rB
)(g
lyci
ne-r
ich
�g
lyco
pro
tein
)(G
BG
)(P
BF2
)P
0860
31
12
04
3S
igna
lcon
fiden
t0
Com
ple
men
tfa
ctor
Hp
recu
rsor
(Hfa
ctor
1)A
4045
50
11
02
2N
o0
Com
ple
men
tfa
ctor
H-r
elat
edp
rote
in(c
lone
H36
–1)
pre
curs
or,
hum
an(fr
agm
ent)
P05
156
13
10
53
Sig
nal
0C
omp
lem
ent
fact
orI
pre
curs
or(E
C3.
4.21
.45)
(C3B
/C4B
inac
tivat
or)
P48
740
11
00
22
Sig
nalc
onfid
ent
0C
omp
lem
ent-
activ
atin
gco
mp
onen
tof
Ra-
reac
tive
fact
orp
recu
rsor
(EC
3.4.
21.�
)(R
a-re
activ
efa
ctor
serin
ep
rote
ase
p10
0)(r
arf)
(man
nan-
bin
din
gle
ctin
serin
ep
rote
ase
1)(m
anno
se-b
ind
ing
pro
tein
asso
ciat
edse
rine
pro
teas
e)(M
AS
P-1
)Q
0465
60
11
02
2N
o7
Cop
per
-tra
nsp
ortin
gA
TPas
e1
(EC
3.6.
3.4)
(cop
per
pum
p1)
(Men
kes
dis
ease
-ass
ocia
ted
pro
tein
)P
0818
51
11
03
3S
igna
l0
Cor
ticos
tero
id-b
ind
ing
glob
ulin
pre
curs
or(C
BG
)(tr
ansc
ortin
)P
0274
11
10
02
2S
igna
lcon
fiden
t0
C-r
eact
ive
pro
tein
pre
curs
orP
0673
21
10
02
2N
o0
Cre
atin
eki
nase
,M
chai
n(E
C2.
7.3.
2)(M
-CK
)P
4990
21
01
02
2N
o0
Cyt
osol
icp
urin
e5�
-nuc
leot
idas
e(E
C3.
1.3.
5)1D
SN
01
20
32
No
0D
60S
N-t
erm
inal
lob
ehu
man
lact
ofer
rinP
0917
21
10
02
2S
igna
lcon
fiden
t0
Dop
amin
e�
-mon
ooxy
gena
sep
recu
rsor
(EC
1.14
.17.
1)(d
opam
ine
�-h
ydro
xyla
se)
(DB
H)
JC25
210
01
12
2P
ossi
ble
sign
al1
End
othe
linco
nver
ting
enzy
me
(EC
3.4.
24.�
)1,
umb
ilica
lvei
nen
dot
helia
lcel
lfor
m,
hum
anN
P_0
0441
60
11
13
3S
igna
lcon
fiden
t0
Ext
race
llula
rm
atrix
pro
tein
1is
ofor
m1
pre
curs
or;
secr
etor
yP
0267
11
01
13
3S
igna
lcon
fiden
t0
Fib
rinog
en�
/�-E
chai
np
recu
rsor
(con
tain
sfib
rinop
eptid
eA
)P
0267
51
10
13
3S
igna
l0
Fib
rinog
en�
chai
np
recu
rsor
(con
tain
sfib
rinop
eptid
eB
)P
0275
11
13
16
4S
igna
lco
nfid
ent
0Fi
bro
nect
inp
recu
rso
r(F
N)
(co
ld-i
nso
lub
leg
lob
ulin
)(C
IG)
P23
142
11
01
33
Sig
nalc
onfid
ent
0Fi
bul
in-1
pre
curs
orO
7563
61
10
02
2S
igna
lcon
fiden
t0
Fico
lin3
pre
curs
or(c
olla
gen/
fibrin
ogen
dom
ain-
cont
aini
ngp
rote
in3)
(col
lage
n/fib
rinog
end
omai
n-co
ntai
ning
lect
in3
P35
)(H
akat
aan
tigen
)P
0122
51
10
02
2S
igna
l0
Folli
trop
in�
chai
np
recu
rsor
(folli
cle-
stim
ulat
ing
horm
one
�su
bun
it)(F
SH
-�)
(FS
H-B
)P
0910
41
10
02
2N
o0
Gam
ma
enol
ase
(EC
4.2.
1.11
)(2
-pho
spho
-D-g
lyce
rate
hyd
roly
ase)
(neu
rale
nola
se)
(NS
E)
(eno
lase
2)P
0639
61
11
14
4S
igna
lco
nfid
ent
0G
elso
linp
recu
rso
r,p
lasm
a(a
ctin
-dep
oly
mer
izin
gfa
cto
r)(A
DF)
(Bre
vin)
(AG
EL)
P14
136
11
00
22
No
0G
lialf
ibril
lary
acid
icp
rote
in,
astr
ocyt
e(G
FAP
)Q
0460
91
11
03
3P
ossi
ble
sign
al1
Glu
tam
ate
carb
oxyp
eptid
ase
II(E
C3.
4.17
.21)
A47
531
01
10
22
Pos
sib
lesi
gnal
1G
luta
myl
amin
opep
tidas
e(E
C3.
4.11
.7),
hum
anN
P_0
0149
40
11
02
2S
igna
l0
Gly
cosy
lpho
spha
tidyl
inos
itols
pec
ific
pho
spho
lipas
eD
1is
ofor
m1
AA
B34
872
00
52
72
No
0G
P12
0,IH
RP
�IT
Ihe
avy
chai
n-re
late
dp
rote
in(in
tern
alfr
agm
ent)
JW00
570
11
02
2N
o0
Gra
vin,
hum
anP
0073
73
33
110
4S
igna
lco
nfid
ent
0H
apto
glo
bin
-1p
recu
rso
rP
0192
21
00
12
2N
o0
Hem
oglo
bin
�ch
ain
P02
023
11
11
44
No
0H
emo
glo
bin
�ch
ain
P02
790
11
10
33
Sig
nalc
onfid
ent
0H
emop
exin
pre
curs
or(�
-1B
-gly
cop
rote
in)
P05
546
11
11
44
Sig
nalc
onf
iden
t0
Hep
arin
cofa
cto
rII
pre
curs
or
(HC
-II)
(pro
teas
ein
hib
ito
rle
user
pin
2)(H
LS2)
Q04
756
01
01
22
Sig
nalc
onfid
ent
0H
epat
ocyt
egr
owth
fact
orac
tivat
orp
recu
rsor
(EC
3.4.
21.�
)(H
GF
activ
ator
)(H
GFA
)Q
1452
01
01
02
2S
igna
l0
HG
Fac
tivat
orlik
ep
rote
in(h
yalu
rona
nb
ind
ing
pro
tein
2)P
0419
61
11
14
4S
igna
lco
nfid
ent
0H
isti
din
e-ri
chg
lyco
pro
tein
pre
curs
or
(his
tid
ine-
pro
line
rich
gly
cop
rote
in)
(HP
RG
)P
3115
10
11
02
2N
o0
Hum
anp
soria
sin
(s10
0a7)
T147
380
01
12
2P
ossi
ble
sign
al0
Hyp
othe
tical
pro
tein
dkf
zp56
4a24
16.1
,hu
man
(frag
men
t)
Nonredundant Human Plasma Proteome
Molecular & Cellular Proteomics 3.4 317
TAB
LEII—
cont
inue
d
Acc
essi
onLi
t2D
EM
SLC
MS
1LC
MS
2To
tal_
acce
ssio
nsTo
tal_
sour
ces
Sig
nal
TMD
escr
iptio
n
T087
720
01
12
2N
o0
Hyp
othe
tical
pro
tein
dkf
zp58
6m12
1.1,
hum
an(fr
agm
ent)
T000
630
01
12
2N
o0
Hyp
othe
tical
pro
tein
KIA
A04
37,
hum
an(fr
agm
ent)
S40
354
1114
786
109
4S
igna
l0
Imm
uno
glo
bul
in�
chai
n,hu
man
P01
591
11
10
33
Pos
sib
lesi
gnal
0Im
mun
oglo
bul
inJ
chai
nP
0847
61
01
02
2S
igna
lcon
fiden
t0
Inhi
bin
�A
chai
np
recu
rsor
(act
ivin
�-A
chai
n)(e
ryth
roid
diff
eren
tiatio
np
rote
in)
(ED
F)P
1793
61
00
12
2S
igna
lcon
fiden
t0
Insu
lin-l
ike
grow
thfa
ctor
bin
din
gp
rote
in3
pre
curs
or(IG
FBP
-3)
(IBP
-3)
(IGF-
bin
din
gp
rote
in3)
P24
593
10
11
33
Sig
nalc
onfid
ent
0In
sulin
-lik
egr
owth
fact
orb
ind
ing
pro
tein
5p
recu
rsor
(IGFB
P-5
)(IB
P-5
)(IG
F-b
ind
ing
pro
tein
5)P
3585
81
11
03
3S
igna
lcon
fiden
t0
Insu
lin-l
ike
grow
thfa
ctor
bin
din
gp
rote
inco
mp
lex
acid
lab
ilech
ain
pre
curs
or(A
LS)
P01
343
10
01
22
Sig
nal
0In
sulin
-lik
egr
owth
fact
orIA
pre
curs
or(IG
F-IA
)(s
omat
omed
inC
)P
1982
71
12
15
4S
igna
lco
nfid
ent
1In
ter-
�-t
ryp
sin
inhi
bit
or
heav
ych
ain
H1
pre
curs
or
(ITI
heav
ych
ain
H1)
(Inte
r-�
-in
hib
ito
rhe
avy
chai
n1)
(Inte
r-�
-try
psi
nin
hib
ito
rco
mp
lex
com
po
nent
III)
(ser
um-
der
ived
hyal
uro
nan-
asso
ciat
edp
rote
in)
(SH
AP
)P
1982
31
11
14
4S
igna
lco
nfid
ent
0In
ter-
�-t
ryp
sin
inhi
bit
or
heav
ych
ain
H2
pre
curs
or
(ITI
heav
ych
ain
H2)
(inte
r-�
-in
hib
ito
rhe
avy
chai
n2)
(inte
r-�
-try
psi
nin
hib
ito
rco
mp
lex
com
po
nent
II)(s
erum
-d
eriv
edhy
alur
ona
n-as
soci
ated
pro
tein
)(S
HA
P)
Q06
033
11
11
44
Sig
nal
0In
ter-
�-t
ryp
sin
inhi
bit
or
heav
ych
ain
H3
pre
curs
or
(ITI
heav
ych
ain
H3)
(Inte
r-�
-in
hib
ito
rhe
avy
chai
n3)
(ser
um-d
eriv
edhy
alur
ona
n-as
soci
ated
pro
tein
)(S
HA
P)
Q14
624
12
11
54
Sig
nal
0In
ter-
�-t
ryp
sin
inhi
bit
or
heav
ych
ain
H4
pre
curs
or
A33
481
10
10
22
No
0In
terf
eron
-ind
uced
vira
lres
ista
nce
pro
tein
mxa
,hu
man
P29
459
10
10
22
Sig
nalc
onfid
ent
0In
terle
ukin
-12
�ch
ain
pre
curs
or(IL
-12A
)(c
ytot
oxic
lym
pho
cyte
mat
urat
ion
fact
or35
-kD
asu
bun
it)(C
LMF
p35
)(N
Kce
llst
imul
ator
yfa
ctor
chai
n1)
(NK
SF1
)P
4093
31
00
12
2S
igna
l0
Inte
rleuk
in-1
5p
recu
rsor
(IL-1
5)P
0523
11
10
02
2S
igna
lcon
fiden
t0
Inte
rleuk
in-6
pre
curs
or(IL
-6)
(B-c
ells
timul
ator
yfa
ctor
2)(B
SF-
2)(in
terf
eron
�-2
)(h
ybrid
oma
grow
thfa
ctor
)P
C11
020
02
13
2N
o0
Ker
atin
10,
typ
eI,
cyto
skel
etal
(clo
neH
K51
),hu
man
(frag
men
t)N
P_0
0898
50
11
02
2N
o0
Kin
esin
fam
ilym
emb
er3A
;ki
nesi
nfa
mily
pro
tein
3A(H
omo
sap
iens
)P
0104
21
14
28
4S
igna
lco
nfid
ent
0K
inin
og
enp
recu
rso
r(�
-2-t
hio
lpro
tein
ase
inhi
bit
or)
(co
ntai
nsb
rad
ykin
in)
P02
750
11
10
33
Sig
nal
0Le
ucin
e-ric
h�
-2-g
lyco
pro
tein
pre
curs
or(L
RG
)P
1842
81
10
02
2S
igna
lcon
fiden
t0
Lip
opol
ysac
char
ide-
bin
din
gp
rote
inp
recu
rsor
(LB
P)
P07
195
11
00
22
No
0L-
lact
ate
deh
ydro
gena
seB
chai
n(E
C1.
1.1.
27)
(LD
H-B
)(L
DH
hear
tsu
bun
it)(L
DH
-H)
P51
884
01
10
22
Sig
nalc
onfid
ent
0Lu
mic
anp
recu
rsor
(ker
atan
sulfa
tep
rote
ogly
can
lum
ican
)(K
SP
Glu
mic
an)
NP
_005
920
10
10
22
Sig
nalc
onfid
ent
1M
elan
oma-
asso
ciat
edan
tigen
p97
isof
orm
1,p
recu
rsor
P10
636
10
10
22
No
0M
icro
tub
ule-
asso
ciat
edp
rote
in�
(neu
rofib
rilla
ryta
ngle
pro
tein
)(P
aire
dhe
lical
filam
ent-
�)(P
HF-
�)I3
7550
00
11
22
No
0M
ism
atch
rep
air
pro
tein
MS
H2,
hum
anQ
0224
10
01
12
2N
o0
Mito
ticki
nesi
n-lik
ep
rote
in-1
(kin
esin
-lik
ep
rote
in5)
P08
571
11
00
22
Sig
nalc
onfid
ent
0M
onoc
yte
diff
eren
tiatio
nan
tigen
CD
14p
recu
rsor
(mye
loid
cell-
spec
ific
leuc
ine-
rich
glyc
opro
tein
)P
3557
90
11
13
3N
o0
Myo
sin
heav
ych
ain,
nonm
uscl
ety
pe
A(c
ellu
lar
myo
sin
heav
ych
ain,
typ
eA
)(n
onm
uscl
em
yosi
nhe
avy
chai
n-A
)(N
MM
HC
-A)
P12
882
01
01
22
No
0M
yosi
nhe
avy
chai
n,sk
elet
alm
uscl
e,ad
ult
1(m
yosi
nhe
avy
chai
niix
/d)
(myh
c-iix
/d)
NP
_006
380
01
10
22
Sig
nalc
onfid
ent
1O
xyge
nre
gula
ted
pro
tein
pre
curs
or;
oxyg
enre
gula
ted
pro
tein
P01
270
11
00
22
Sig
nal
0P
arat
hyro
idho
rmon
ep
recu
rsor
(Par
athy
rin)
(PTH
)(P
arat
horm
one)
Nonredundant Human Plasma Proteome
318 Molecular & Cellular Proteomics 3.4
TAB
LEII—
cont
inue
d
Acc
essi
onLi
t2D
EM
SLC
MS
1LC
MS
2To
tal_
acce
ssio
nsTo
tal_
sour
ces
Sig
nal
TMD
escr
iptio
n
NP
_006
784
00
11
22
Pos
sib
lesi
gnal
0P
erox
ired
oxin
3;an
tioxi
dan
tp
rote
in1;
thio
red
oxin
-dep
end
ent
Q15
648
10
01
22
No
0P
erox
isom
ep
rolif
erat
or-a
ctiv
ated
rece
pto
rb
ind
ing
pro
tein
(PB
P)
(PP
AR
bin
din
gp
rote
in)
(thyr
oid
horm
one
rece
pto
r-as
soci
ated
pro
tein
com
ple
xco
mp
onen
tTR
AP
220)
(thyr
oid
rece
pto
rin
tera
ctin
gp
rote
in2)
(TR
IP2)
(p53
regu
lato
ryp
rote
inR
B18
A)
P04
180
11
00
22
Sig
nalc
onfid
ent
2P
hosp
hatid
ylch
olin
e-st
erol
acyl
tran
sfer
ase
pre
curs
or(E
C2.
3.1.
43)
(leci
thin
-cho
lest
erol
acyl
tran
sfer
ase)
(pho
spho
lipid
-cho
lest
erol
acyl
tran
sfer
ase)
NP
_001
074
01
10
22
No
0P
hosp
hod
iest
eras
e5A
isof
orm
1;cg
mp
-bin
din
gcg
mp
-sp
ecifi
cP
1525
91
11
03
3N
o0
Pho
spho
glyc
erat
em
utas
e2
(EC
5.4.
2.1)
(EC
5.4.
2.4)
(EC
3.1.
3.13
)(p
hosp
hogl
ycer
ate
mut
ase
isoz
yme
M)
(PG
AM
-M)
(BP
G-d
epen
den
tP
GA
M2)
(mus
cle-
spec
ific
pho
spho
glyc
erat
em
utas
e)N
P_0
0620
90
01
12
2N
o0
Pho
spho
inos
itid
e-3-
kina
se,
cata
lytic
,�
pol
ypep
tide
P36
955
01
11
33
Sig
nalc
onfid
ent
0P
igm
ent
epith
eliu
m-d
eriv
edfa
ctor
pre
curs
or(P
ED
F)(E
PC
-1)
P03
952
11
10
33
Sig
nal
0P
lasm
aka
llikr
ein
pre
curs
or(E
C3.
4.21
.34)
(pla
sma
pre
kalli
krei
n)(k
inin
ogen
in)
(Fle
tche
rfa
ctor
)P
0515
51
11
03
3S
igna
lcon
fiden
t0
Pla
sma
pro
teas
eC
1in
hib
itor
pre
curs
or(C
1In
h)(C
1Inh
)P
0275
31
11
03
3S
igna
lcon
fiden
t0
Pla
sma
retin
ol-b
ind
ing
pro
tein
pre
curs
or(P
RB
P)
(RB
P)
(PR
O22
22)
P05
154
11
10
33
Sig
nalc
onfid
ent
0P
lasm
ase
rine
pro
teas
ein
hib
itor
pre
curs
or(P
CI)
(pro
tein
Cin
hib
itor)
(pla
smin
ogen
activ
ator
inhi
bito
r-3)
(PA
I3)
(acr
osom
alse
rine
pro
teas
ein
hib
itor)
P05
121
11
00
22
Sig
nalc
onfid
ent
0P
lasm
inog
enac
tivat
orin
hib
itor-
1p
recu
rsor
(PA
I-1)
(end
othe
lialp
lasm
inog
enac
tivat
orin
hib
itor)
(PA
I)P
0074
71
12
15
4S
igna
l0
Pla
smin
og
enp
recu
rso
r(E
C3.
4.21
.7)
(co
ntai
nsan
gio
stat
in)
P02
775
10
01
22
Sig
nal
0P
late
let
bas
icp
rote
inp
recu
rsor
(PB
P)
(sm
alli
nduc
ible
cyto
kine
B7)
(CX
CL7
)P
0277
61
00
12
2S
igna
lcon
fiden
t0
Pla
tele
tfa
ctor
4p
recu
rsor
(PF-
4)(C
XC
L4)
(onc
osta
tinA
)(Ir
opla
ct)
P43
034
11
00
22
No
0P
late
let-
activ
atin
gfa
ctor
acet
ylhy
dro
lase
IB�
sub
unit
(EC
3.1.
1.47
)(P
AF
acet
ylhy
dro
lase
45kD
asu
bun
it)(P
AF-
AH
45-k
Da
sub
unit)
(PA
F-A
H�
)(P
AFA
H�
)(L
isse
ncep
haly
-1p
rote
in)
(LIS
-1)
NP
_006
197
00
11
22
Sig
nalc
onfid
ent
1P
late
let-
der
ived
grow
thfa
ctor
rece
pto
r�
pre
curs
or(H
omo
sap
iens
)N
P_0
0043
60
01
12
2N
o0
Ple
ctin
1,in
term
edia
tefil
amen
tb
ind
ing
pro
tein
500
kDa;
ple
ctin
1P
2074
21
11
03
3S
igna
lcon
fiden
t0
Pre
gnan
cyzo
nep
rote
inp
recu
rsor
P07
288
20
10
32
Sig
nalc
onfid
ent
0P
rost
ate-
spec
ific
antig
enp
recu
rsor
(EC
3.4.
21.7
7)(P
SA
)(�
-sem
inop
rote
in)
(kal
likre
in3)
(sem
enog
elas
e)(s
emin
in)
(P-3
0an
tigen
)P
0723
71
10
02
2S
igna
lcon
fiden
t0
Pro
tein
dis
ulfid
eis
omer
ase
pre
curs
or(P
DI)
(EC
5.3.
4.1)
(pro
lyl4
-hyd
roxy
lase
�su
bun
it)(c
ellu
lar
thyr
oid
horm
one
bin
din
gp
rote
in)
(P55
)N
P_0
0272
50
11
02
2N
o0
Pro
tein
kina
se,
cam
p-d
epen
den
t,re
gula
tory
,ty
pe
I,�
AA
B22
439
00
11
22
No
0P
rote
inty
rosi
nep
hosp
hata
se;
ptp
ase
(Hom
osa
pie
ns)
P00
734
11
21
54
Sig
nal
0P
roth
rom
bin
pre
curs
or
(EC
3.4.
21.5
)(c
oag
ulat
ion
fact
or
II)P
2261
42
00
13
2S
igna
lcon
fiden
t0
Put
ativ
ese
rum
amyl
oid
A-3
pro
tein
P04
626
10
10
22
Sig
nalc
onfid
ent
2R
ecep
tor
pro
tein
-tyr
osin
eki
nase
erb
b-2
pre
curs
or(E
C2.
7.1.
112)
(p18
5erb
b2)
(NE
Up
roto
-onc
ogen
e)(C
-erb
b-2
)(ty
rosi
neki
nase
-typ
ece
llsu
rfac
ere
cep
tor
HE
R2)
(MLN
19)
NP
_005
397
00
11
22
No
0R
ho-a
ssoc
iate
d,
coile
d-c
oilc
onta
inin
gp
rote
inki
nase
1;p
160r
ock
P49
908
10
10
22
Sig
nalc
onfid
ent
0S
elen
opro
tein
Pp
recu
rsor
(sep
)N
P_0
0620
60
11
02
2S
igna
lcon
fiden
t0
Ser
ine
(or
cyst
eine
)p
rote
inas
ein
hib
itor,
clad
eA
(�-1
)P
0278
71
11
14
4S
igna
lco
nfid
ent
0S
ero
tran
sfer
rin
pre
curs
or
(tra
nsfe
rrin
)(s
ider
op
hilin
)(�
-1-m
etal
bin
din
gg
lob
ulin
)(P
RO
1400
)P
0276
81
11
03
3S
igna
lcon
fiden
t0
Ser
umal
bum
inp
recu
rsor
Nonredundant Human Plasma Proteome
Molecular & Cellular Proteomics 3.4 319
TAB
LEII—
cont
inue
d
Acc
essi
onLi
t2D
EM
SLC
MS
1LC
MS
2To
tal_
acce
ssio
nsTo
tal_
sour
ces
Sig
nal
TMD
escr
iptio
n
P02
735
11
11
44
Sig
nalc
onf
iden
t0
Ser
umam
ylo
idA
pro
tein
pre
curs
or
(SA
A)
(co
ntai
nsam
ylo
idp
rote
inA
(am
ylo
idfib
ril
pro
tein
AA
)P
3554
21
11
03
3S
igna
l0
Ser
umam
yloi
dA
-4p
rote
inp
recu
rsor
(con
stitu
tivel
yex
pre
ssed
seru
mam
yloi
dA
pro
tein
)(C
-SA
A)
P02
743
11
00
22
Sig
nalc
onfid
ent
0S
erum
amyl
oid
P-c
omp
onen
tp
recu
rsor
(SA
P)
(9.5
S�
-1-g
lyco
pro
tein
)P
2716
91
11
03
3S
igna
l0
Ser
ump
arao
xona
se/a
ryle
ster
ase
1(E
C3.
1.1.
2)(E
C3.
1.8.
1)(P
ON
1)(s
erum
aryl
dia
kylp
hosp
hata
se1)
(A-e
ster
ase
1)(a
rom
atic
este
rase
1)(K
-45)
P04
278
11
00
22
Sig
nal
0S
exho
rmon
e-b
ind
ing
glob
ulin
pre
curs
or(S
HB
G)
(sex
ster
oid
-bin
din
gp
rote
in)
(SB
P)
(test
is-s
pec
ific
and
roge
n-b
ind
ing
pro
tein
)(A
BP
)P
0824
00
10
12
2P
ossi
ble
sign
al0
Sig
nalr
ecog
nitio
np
artic
lere
cep
tor
�su
bun
it(S
R-�
)(d
ocki
ngp
rote
in�
)(D
P-�
)A
AC
8318
30
01
12
2N
o0
Sim
ilar
tohu
man
hsgc
n1U
7770
0(P
ID:g
2282
576)
;si
mila
rto
yeas
tP
0948
61
10
02
2S
igna
lcon
fiden
t0
SP
AR
Cp
recu
rsor
(sec
rete
dp
rote
inac
idic
and
rich
incy
stei
ne)
(ost
eone
ctin
)(O
N)
(bas
emen
tm
emb
rane
pro
tein
BM
-40)
P29
508
20
10
32
No
0S
qua
mou
sce
llca
rcin
oma
antig
en1
(SC
CA
-1)
(pro
tein
T4-A
)N
P_0
0306
00
11
02
2N
o0
SW
I/S
NF-
rela
ted
mat
rix-a
ssoc
iate
dac
tin-d
epen
den
tre
gula
tor
ofP
0545
21
10
02
2S
igna
lcon
fiden
t0
Tetr
anec
tinp
recu
rsor
(TN
)(p
lasm
inog
en-k
ringl
e4
bin
din
gp
rote
in)
P07
996
11
01
33
Sig
nalc
onfid
ent
0Th
rom
bos
pon
din
1p
recu
rsor
P01
266
10
10
22
Sig
nalc
onfid
ent
0Th
yrog
lob
ulin
pre
curs
orP
0554
31
10
02
2S
igna
lcon
fiden
t0
Thyr
oxin
e-b
ind
ing
glob
ulin
pre
curs
or(T
4-b
ind
ing
glob
ulin
)P
0276
61
11
14
4S
igna
lco
nfid
ent
0T
rans
thyr
etin
pre
curs
or
(pre
alb
umin
)(T
BP
A)
(TT
R)
(AT
TR
)P
0747
71
01
02
2S
igna
lcon
fiden
t0
Tryp
sin
Ip
recu
rsor
(EC
3.4.
21.4
)(c
atio
nic
tryp
sino
gen)
P19
320
10
10
22
Sig
nalc
onfid
ent
1V
ascu
lar
cell
adhe
sion
pro
tein
1p
recu
rsor
(V-C
AM
1)(C
D10
6an
tigen
)(IN
CA
M-1
00)
P18
206
01
01
22
No
0V
incu
lin(m
etav
incu
lin)
P02
774
11
21
54
Sig
nalc
onf
iden
t0
Vit
amin
D-b
ind
ing
pro
tein
pre
curs
or
(DB
P)
(gro
up-s
pec
ific
com
po
nent
)(G
C-g
lob
ulin
)(V
DB
)P
0722
51
11
03
3S
igna
lcon
fiden
t0
Vita
min
K-d
epen
den
tp
rote
inS
pre
curs
orP
0407
01
10
02
2S
igna
lcon
fiden
t0
Vita
min
-Kd
epen
den
tp
rote
inC
pre
curs
or(E
C3.
4.21
.69)
(aut
opro
thro
mb
inIIA
)(a
ntic
oagu
lant
pro
tein
C)
(blo
odco
agul
atio
nfa
ctor
XIV
)P
0400
41
11
14
4S
igna
lco
nfid
ent
0V
itro
nect
inp
recu
rso
r(s
erum
spre
adin
gfa
cto
r)(S
-pro
tein
)(V
75)
(co
ntai
nsvi
tro
nect
inV
65su
bun
it;
vitr
one
ctin
V10
sub
unit
;so
mat
om
edin
B)
NP
_000
213
10
01
22
Sig
nalc
onfid
ent
2V
-kit
Har
dy-
Zuc
kerm
an4
felin
esa
rcom
avi
ralo
ncog
ene
hom
olog
P04
275
11
01
33
Sig
nalc
onfid
ent
0V
onW
illeb
rand
fact
orp
recu
rsor
(vw
f)P
2531
11
11
03
3S
igna
lcon
fiden
t0
Zin
c-�
-2-g
lyco
pro
tein
pre
curs
or(Z
n-�
-2-g
lyco
pro
tein
)(Z
n-�
-2-G
P)
Nonredundant Human Plasma Proteome
320 Molecular & Cellular Proteomics 3.4
tains predictions for a total of 8,817 proteins having 2–38 TMhelices, of which 99.7% have 2–21 (covered by the rangeshown). About 7% of proteins of H_Plasma_195 (19 of 195)contained TM segments (Fig. 4A), whereas 18% of 1,175H_Plasma_NR_v2 proteins had TM segments (Fig. 4B). Inboth cases, most proteins predicted to have a TM segmenthad only one. The distribution of multiple TM segments gen-erally reflected the shape obtained for HMPL, including peaksat 7 and 12 TM segments. The Lit and 2DEMS datasetscontained few transmembrane proteins, whereas the LCMSmethods found more, particularly at higher TM segment num-bers. LCMS1 detected more proteins with TM segments thanLCMS2, which concentrated on smaller proteins.
The relationship between signal sequence and TM segmentpredictions across H_Plasma_NR_v2 is shown in Table III.Proteins with TM segments make up 11% of “no” signalsequence proteins, and 20% of “signal confident” proteins.The former group (�TM -Sig) shows a much higher represen-tation of multiple TM segments than the latter (�TM �Sig),including a majority of the 7- and 12-TM entries (not all ex-tracellular proteins or domains have a “classical” signal se-quence). Sig� proteins were much more likely (15%) thanSig- proteins (4.5%) to have a single TM segment, typical ofmany receptors.
GO Component Assignments—Fig. 5 presents a compari-
son of four different subsets of the human proteome begin-ning with the whole human proteome and continuing throughthe H_Plasma_NR and H_plasma_195 sets to those 46 seenin all four of our datasets. In this progression, there is a steadyincrease in the proportion of extracellular proteins and a steadydecrease in the proportions of membrane, mitochondrial, andnuclear proteins. Several categories appear at a higher propor-tion in H_Plasma_NR_v2 than in either of the smaller versions ofthe plasma proteome or the whole human proteome: the kinesincomplex and lysosomal and cytoskeletal proteins.
Fig. 6 uses the GO component assignments for H_Plas-ma_NR_v2 to further compare the four input datasets. The Litproteins are primarily derived from extracellular (50% of thetotal), membrane, and cytoplasmic categories. 2DEMS ac-cessions showed fewer extracellular and similar membraneand cytoplasmic entries, with substantial increases in a seriesof other categories including kinesin complex and nuclear.The two LCMS methods showed similar GO component dis-tributions, displaying a smaller proportion of extracellular andcytoplasmic entries and more nuclear and mitochondrial en-tries than 2DEMS. None of the methods has a distributionclose to that for the whole human proteome (Fig. 5A), as theywould if the MS identifications were random.
FIG. 3. Signal sequence predictions for different data sources.Signal sequences as predicted by three algorithms (SignalP version2.0 neural net, HMM algorithms, and sigmask signal masking pro-gram) in the NR plasma proteome. The four outcomes (no, possiblesignal, signal, signal confident) correspond to 0, 1, 2, or 3 methodspredicting a signal sequence. The Lit dataset is represented in black,LCMS2 is dark gray, LCMS1 is light gray, and 2DEMS is white. A,signal sequence predictions for proteins repeated in multiple datasets(H_Plasma_195). B, signal sequence predictions for nonredundantproteins (H_Plasma_NR_v2). Bar segments should be compared forrelative size between sources and outcomes, but total stacked barheight is not relevant because of redundancy between the componentsources.
FIG. 4. Numbers of transmembrane segments as predicted byTMHMM. The Lit dataset is represented by black bars, LCMS2 bydark gray bars, LCMS1 by light gray bars, and 2DEMS by white bars.The line represents the TM distribution for all proteins contained in theHuman Membrane Protein Library. The left-hand scale is for theexperimental dataset (0–1 segments) and proteins in the HumanMembrane Protein Library. The right-hand scale is for the experimen-tal dataset (2–21 segments; �10� enlarged relative to the left scale inorder to show smaller values). A, signal sequence predictions forproteins repeated in multiple datasets (H_Plasma_195). B, signal se-quence predictions for nonredundant proteins (H_Plasma_NR_v2).
Nonredundant Human Plasma Proteome
Molecular & Cellular Proteomics 3.4 321
GO Function Assignments—A set of eight summary func-tion categories were used to analyze aspects of function overH_Plasma_NR_v2 and to compare the four input datasets(Fig. 7). Proteins with cytokine and hormone activities, whichare generally expected to be present at low abundance, werepredominantly found in Lit only, while Lit contained almostnone of the proteins with DNA-binding activity. Among theother categories reported, the experimental methods per-formed similarly except for underrepresentation of receptorand DNA-binding proteins in 2DEMS.
GO Process Assignments—Eight selected summary proc-ess categories were also used to analyze dataset differencesover H_Plasma_NR_v2. As shown in Fig. 8, a number of majorGO process categories were more evenly represented acrossthe four datasets, though the representation of Lit proteinsseems increased relative to the other sources, possibly due tomore extensive process annotation of these molecules.
DISCUSSION
We have assembled an enlarged list of proteins observed inhuman plasma by combining three published experimentaldatasets, generated by different proteomics approaches, witha large set of proteins drawn from individual published reportson serum or plasma. Of the combined total of 1,680 humanprotein accession numbers, 1,175 were judged to be distinctproteins (defining set H_Plasma_NR_v2) using a very conserv-ative set of criteria (95% sequence identity over 15 or moreamino acid subsequence). As shown in Fig. 2, the overlapbetween the four sets is surprisingly small, with only 46 pro-teins occurring in all four sets, and only 195 proteins (Table II)
occurring in more than one set (i.e. confirmed by identificationusing at least two approaches). This result suggests the in-volvement of one or more of several factors in limiting theoverlap between different views of the plasma proteome: 1)the methods used may be different enough to expose quitedifferent subsets of proteins (particularly because only a frac-tion of peptides observed are typically subjected to MS/MSand identified); 2) the samples used, though all human serumor plasma, may be different in the rank order of medium- andlow-abundance protein components; or 3) identifications gen-erated by some proteomics approaches could suffer frommore or less random errors associated with mistaken MS/MShits. We believe the first two factors are likely to account forthe relatively small overlap, while the third should be a minimalinfluence for several reasons. First, the stringency of MSidentification criteria used (described in detail in the originalpublications on each dataset) was reasonably high in eachcase, and false-positive identifications should represent lessthan 5–10% of the entries.2 Second, the distribution of anno-tation characteristics observed over the 1,175 plasma NR setis quite different from the human proteome as a whole (seedataset results of Fig. 6 versus the whole proteome in Fig. 5A),which suggests that random human hits are not dominatingthe accessions. Finally, the sets all show fairly similar levels ofdifference between every pair (Fig. 2), indicating that noneappears to be a major outlier. The observed differences be-tween approaches represented here thus suggest that addingadditional methods, or even repetition of these methods,would substantially expand the plasma proteome list from thetotal we obtained.
The set of 195 accessions that occur in at least two of thefour datasets represents a confirmed list of targets that shouldbe accessible for routine measurement by multiple proteom-ics technologies in human plasma or serum: these proteinshave been detected by at least two methods in differentlaboratories. This set actually comprises more than 195 ob-servable protein subunits because of the collapse of multipleforms into a single entry: for example haptoglobin’s � and �
chains collapse onto one primary gene product (cleaved aftersynthesis to yield the individual chains), and all Ig chains (�and � light chains, and �, �, �, , and � heavy chains) arelumped onto one accession because of their sequence simi-larity. The fact that a total of 96 Ig accessions occurred in theexperimental input data (14 in 2DEMS, 76 in LCMS1, and 6in LCMS2, in addition to the 11 Ig entries in Lit) indicatesthat a substantial number of heterogeneous Ig sequencesremain in plasma/serum samples even after the treatmentsused in these studies to remove antibodies prior to digestion/fractionation.
H_Plasma_195 contains most of the expected high- andmedium-abundance plasma components, but also contains anumber of proteins that might not ordinarily be expected to be
2 J. N. Adkins, manuscript in preparation.
TABLE IIIRelationship between signal sequence and TM segment predictions
over the 1,175 proteins of H_Plasma_NR_v2
Predicted TMsegments
Signal prediction
No Possiblesignal Signal Signal
confident Total
0 514 51 85 318 9681 26 15 18 61 1202 3 4 5 8 203 1 4 – 1 64 1 – 2 2 55 1 2 1 – 46 2 1 – 1 47 9 3 2 3 178 2 1 – 3 69 1 – – – 1
10 2 1 – – 311 2 1 – – 312 6 2 1 – 913 – 1 – – 114 1 2 – – 315 – – – – 016 – – – – 017 1 – – – 118 – – – – 019 2 – – – 220 1 – – – 121 1 – – – 1
Total 576 88 114 397
Nonredundant Human Plasma Proteome
322 Molecular & Cellular Proteomics 3.4
abundant enough to appear in a “common component” list.These include adiponectin (involved in the control of fat me-tabolism and insulin sensitivity), atrial natriuretic factor (a po-tent vasoactive substance synthesized in mammalian atriaand thought to play a key role in cardiovascular homeostasis),various cathepsins (D, L, S), centromere protein F (involved inchromosome segregation during mitosis), creatine kinase Mchain (an abundant muscle enzyme), glial fibrilary acid protein(distinguishes astrocytes from other glial cells), psoriasin(S-100 family, highly up-regulated in psoriatic epidermis), in-terferon-induced viral-resistance protein MxA (confers resist-ance to influenza virus and vesicular stomatitis virus), mela-noma-associated antigen p97 (a proposed cancer markeralso expressed in multiple normal tissues), mismatch repairprotein MSH2 (involved in postreplication mismatch repair,and whose defective forms are the cause of hereditary non-polyposis colorectal cancer type 1), oxygen-regulated protein(which plays a pivotal role in cytoprotective cellular mecha-nisms triggered by oxygen deprivation), peroxisome prolifera-tor-activated receptor binding protein (which plays a role in
transcriptional coactivation), prostate-specific antigen (a pro-tease involved in the liquefaction of the seminal coagulum,and one of the few successful cancer diagnostics), seleno-protein P (contains selenocyteines encoded by the opalcodon, UGA), signal recognition particle receptor � subunit(an integral membrane protein ensuring, in conjunction withsrp, the correct targeting of the nascent secretory proteins tothe endoplasmic reticulum membrane system), squamous cellcarcinoma antigen 1 (which may act as a protease inhibitor tomodulate the host immune response against tumor cells), andV-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene hom-olog (the receptor for stem cell factor). A number of theseproteins have obvious relevance to important disease mech-anisms, and thus are of potential diagnostic value. CathepsinS, centromere protein F, psoriasin, mismatch repair proteinMSH2, oxygen-regulated protein, and signal recognition par-ticle receptor � subunit did not occur in the Lit accession list,but were rather found via detection in two of the experimentaldatasets.
Two types of protein features (signal sequences and TM
FIG. 5. Major GO__component categories for proteome subsets. The proportions of proteins assigned to major component categories areshown for the human proteome (A); the nonredundant protein set H_Plasma_NR_v2 (B); the H_Plasma_195 proteins seen in more than twodatasets (C); and the 46 proteins seen in all four datasets (D).
Nonredundant Human Plasma Proteome
Molecular & Cellular Proteomics 3.4 323
domains) were predicted from the H_Plasma_NR_v2 se-quences. These two parameters are somewhat related, be-cause transmembrane, as well as secreted, proteins are likelyto contain signal sequences. In the 1,175 proteins of H_Plas-ma_NR_v2, approximately one-third were confidently pre-dicted to contain signal sequences (34% overall, with 32%having 0 or 1 TM segments) as compared with 19% contain-ing signal sequences over the whole human proteome (andonly 10% containing signal sequences with 0 or 1 TM seg-ments; R.F., unpublished observation). H_Plasma_NR_v2 isthus substantially (�3:1) enriched in a set of proteins havingsignal sequences and 0 or 1 TM segments (compared with thegenome), which is consistent with the presence of a largenumber of classical secreted proteins. However, becausemore than half of H_Plasma_NR_v2 proteins do not contain asignal sequence, the total representation of nonsecreted mol-ecules (presumably cellular constituents) is high.
In the full NR set of 1,175 proteins (H_Plasma_NR_v2),several major groups of proteins occur in patterns that sug-
gest interesting biases between our four data sources. Atleast 10 transcription factors were observed in the experimen-tal sets (each by only a single method), and none of thesewere found in the Lit accession set. Similarly, proteins GO-annotated with a DNA-binding function were essentially ab-sent from the Lit set. In contrast, only 4 of 39 cytokines andgrowth factors included were found in any of the experimentaldatasets (IL-6, IL-12A, ciliary neurotrophic factor, and FGF-12), while 37 occurred in the Lit set. These results suggest thatwhile the experimental proteomics methods were not sensi-tive enough to detect most cytokines and hormones, they diddetect important classes of proteins not detected in literaturereports using targeted assay methods. On a more global level,the distribution of GO_component assignments (Fig. 6) showssubstantial differences overall between the set of proteinsfound in a literature search versus the three experimentalproteomics technologies. Predicted features of protein se-quence also show major source-related differences. Moststriking is the fact that the Lit set was strongly biased toward
FIG. 6. Major GO_component categories in H_Plasma_NR_v2 by data source. Major component categories (same as Fig. 5) are shownfor Lit (A), LCMS1 (B), LCMS2 (C), and 2DEMS (D) data sources.
Nonredundant Human Plasma Proteome
324 Molecular & Cellular Proteomics 3.4
proteins that were confidently predicted to have signal se-quences (ratio of “signal confident” to “no” of 3.85), while2DEMS showed little preference (0.91) and the LCMS meth-ods showed a moderate (2 to 1) bias toward proteins withoutsignal sequences (ratios of 0.49 and 0.44). The strong bias ofthe Lit set toward signal sequences is likely due to the greaterease with which these more soluble proteins can be isolated
and studied by biochemical techniques. Similarly, the differ-ence between 2DEMS and LCMS may be due to the failure ofmany less-soluble proteins to focus in the first (isoelectricfocusing) dimension of the 2DE procedure (e.g. intact mem-brane or very large proteins), or else to the presence ofnumerous isoforms that divide the protein among members ofa charge train, and thus decrease the limit of detection (e.g.heavily glycosylated extracellular domains cleaved frommembrane proteins). By placing fewer requirements on thebehavior of sample proteins prior to digestion, and by provid-ing identifications based on a few soluble peptides, the MS-based techniques provided a significantly less biased, thoughnot necessarily more complete, view of the plasma proteome.
Because the four input protein sets were of similar size, it isclear that the literature, viewed as an historical summary ofresearch on proteins in plasma, shows a bias toward secretedproteins and against investigation of cellular proteins in solu-tion in blood. This effect cannot be due to detection sensitivityalone, because the low-abundance cytokines, generally notdetected by the experimental proteomics methods, are ac-cessible via immunoassay and widely reported in the litera-ture. The bias may instead be due to a general skepticism thatdetectable amounts of many cellular proteins are being re-leased into plasma (absent some major cause of tissue dam-age), or to a view that cellular protein release, if it occurred,would not be especially informative. The present results, builton experimental studies of multiple groups, demonstrate thatmany (perhaps all) cellular proteins are present in plasma. Thedemonstrated utility of cardiac muscle protein markers asserum diagnostics for myocardial infraction provides a per-suasive argument that many may have diagnostic use.
The next major challenge thus becomes the systematicexploration of protein abundance and structural modificationin relation to disease, normal physiological processes, andtreatment effects. In a sense this shift can be seen as analo-gous to the current evolution in pharmaceutical target selec-tion: the genome has provided a wealth (in practical terms anoverabundance) of previously unknown therapeutic targets,creating a major challenge in selecting those that are “drug-gable” and specifically linked to disease mechanisms. A shift,in other words, from protein discovery to target validation. Inthe context of diagnostics based on proteins in blood, we nowhave in H_Plasma_NR_v2, and a growing body of other ex-perimental data, a substantial set of candidate disease mark-ers that can be detected in plasma. While these will continueto be supplemented by discovery techniques (22), the stage isset for systematic efforts to validate disease markers for near-term application in clinical trials, medium-term use in diseasedetection, staging, and therapy selection, and long-term usein population screening. While some of these proteins havebeen examined individually as potential markers and found tohave low sensitivity or specificity as individual tests, growingevidence indicates that these limitations may be overcomeusing fingerprints of change across panels of proteins that
FIG. 7. Major categories of GO_function assignments overH_Plasma_NR_v2 by data source. Cellular function categories wereextracted from GO_function annotations. The Lit dataset is repre-sented by black bars, LCMS2 by dark gray bars, LCMS1 by light graybars, and 2DEMS by white bars.
FIG. 8. Major GO_process categories over H_Plasma_NR_v2 bydata source. The Lit dataset is represented by black bars, LCMS2 bydark gray bars, LCMS1 by light gray bars, and 2DEMS by white bars.
Nonredundant Human Plasma Proteome
Molecular & Cellular Proteomics 3.4 325
together better represent patient status. Thus even the pres-ent H_Plasma_NR_v2 offers an abundance of candidates de-serving of measurement in selected clinical sample sets. Mak-ing such measurements, accurately and on a large scalepresents a series of technical challenges likely to requiresubstantial efforts for much of the next decade. The results ofsuch a “targeted” proteomics effort can transform diagnos-tics, improve therapy, and lead to substantial and neededimprovements in the economics of healthcare.
* The costs of publication of this article were defrayed in part by thepayment of page charges. This article must therefore be herebymarked “advertisement” in accordance with 18 U.S.C. Section 1734solely to indicate this fact.
□S The on-line version of this article (available at http://www.mcponline.org) contains supplemental materials.
§ To whom correspondence should be addressed: The PlasmaProteome Institute, P.O. Box 53450, Washington DC 20009-3450.Tel.: 301-72811451; Fax: 202-234-9175; E-mail: [email protected].
� Current address: The Institute for Genomic Research, 9712 Med-ical Center Drive, Rockville, MD 20850.
REFERENCES
1. Anderson, N. L., and Anderson, N. G. (2002) The human plasma proteome:History, character, and diagnostic prospects. Mol. Cell. Proteomics 1,845–867
2. Anderson, L., and Anderson, N. G. (1977) High resolution two-dimensionalelectrophoresis of human plasma proteins. Proc. Natl. Acad. Sci. U. S. A.74, 5421–5425
3. Hughes, G. J., Frutiger, S., Paquet, N., Ravier, F, Pasquali, C., Sanchez,J. C., James, R., Tissot, J. D., Bjellqvist, B., and Hochstrasser, D. F.(1992) Plasma protein map: An update by microsequencing. Electro-phoresis 13, 707–714
4. Pieper, R., Su, Q., Gatlin, C. L., Huang, S. T., Anderson, N. L., and Steiner,S. (2003) Multi-component immunoaffinity subtraction chromatography:An innovative step towards a comprehensive survey of the humanplasma proteome. Proteomics 3, 422–432
5. Adkins, J. N., Varnum, S. M., Auberry, K. J., Moore, R. J., Angell, N. H.,Smith, R. D., Springer, D. L., and Pounds, J. G. (2002) Toward a humanblood serum proteome: Analysis by multidimensional separation coupledwith mass spectrometry. Mol. Cell. Proteomics 1, 947–955
6. Tirumalai, R. S., Chan, K. C., Prieto, D. A., Issaq, H. J., Conrads, T. P., andVeenstra, T. D. (2003) Characterization of the low molecular weighthuman serum proteome. Mol. Cell. Proteomics 2, 1096–1103
7. Pieper, R., Gatlin, C. L., Makusky, A. J., Russo, P. S., Schatz, C. R., Miller,S. S., Su, Q., McGrath, A. M., Estock, M. A., Parmar, P. P., Zhao, M.,Huang, S. T., Zhou, J., Wang, F., Esquer-Blasco, R., Anderson, N. L.,Taylor, J., and Steiner, S. (2003) The human serum proteome: Display ofnearly 3700 chromatographically separated protein spots on two-dimen-sional electrophoresis gels and identification of 325 distinct proteins.Proteomics 3, 1345–1364
8. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990)Basic local alignment search tool. J. Mol. Biol. 215, 403–410
9. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller,W., and Lipman, D. J. (1997) Gapped blast and psi-blast: A new gener-ation of protein database search programs. Nucleic Acids Res. 25,3389–3402
10. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. (1997) Identifi-cation of prokaryotic and eukaryotic signal peptides and prediction oftheir cleavage sites. Protein Eng. 10, 1–6
11. Swindells, M., Rae, M., Pearce, M., Moodie, S., Miller, R., and Leach, P.(2002) Application of high-throughput computing in bioinformatics. Phi-los. Transact. Ser. A. Math. Phys. Eng. Sci. 360, 1179–1189
12. Michalovich, D., Overington, J., and Fagan, R. (2002) Protein sequenceanalysis in silico: Application of structure-based bioinformatics togenomic initiatives. Curr. Opin. Pharmacol. 2, 574–580
13. Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E. L. (2001)Predicting transmembrane protein topology with a hidden Markov mod-el: Application to complete genomes. J. Mol. Biol. 305, 567–580
14. Murzin, A. G., Brenner, S. E., Hubbard, T., and Chothia, C. (1995) SCOP: Astructural classification of proteins database for the investigation ofsequences and structures. J. Mol. Biol. 247, 536–540
15. Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S. R.,Griffiths-Jones, S., Howe, K. L., Marshall, M., and Sonnhammer, E. L.(2002) The pfam protein families database. Nucleic Acids Res. 30,276–280
16. Attwood, T. K., Bradley, P., Flower, D. R., Gaulton, A., Maudling, N.,Mitchell, A. L., Moulton, G., Nordle, A., Paine, K., Taylor, P., Uddin, A.,and Zygouri, C. (2003) Prints and its automatic supplement, preprints.Nucleic Acids Res. 31, 400–402
17. Sigrist, C. J., Cerutti, L., Hulo, N., Gattiker, A., Falquet, L., Pagni, M.,Bairoch, A., and Bucher, P. (2002) PROSITE: A documented databaseusing patterns and profiles as motif descriptors. Brief Bioinform. 3,265–274
18. Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M. C., Estreicher, A.,Gasteiger, E., Martin, M. J., Michoud, K., O’Donovan, C., Phan, I., Pil-bout, S., and Schneider, M. (2003) The Swiss-Prot protein knowledge-base and its supplement TrEMBL in 2003. Nucleic Acids Res. 31,365–370
19. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M.,Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill,D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson,J. E., Ringwald, M., Rubin, G. M., and Sherlock, G. (2000) Gene Ontol-ogy: Tool for the unification of biology. The Gene Ontology Consortium.Nat. Genet. 25, 25–29
20. Petersen, S. V., Valnickova, Z., and Enghild, J. J. (2003) Pigment-epitheli-um-derived factor (pedf) occurs at a physiologically relevant concentra-tion in human blood: Purification and characterization. Biochem. J. 374,199–206
21. Toothaker, L. E., Gonzalez, D. A., Tung, N., Lemons, R. S., Le Beau, M. M.,Arnaout, M. A., Clayton, L. K., and Tenen, D. G. (1991) Cellular myosinheavy chain in human leukocytes: Isolation of 5� cDNA clones, charac-terization of the protein, chromosomal localization, and up-regulationduring myeloid differentiation. Blood 78, 1826–1833
22. Liotta, L. A., Ferrari, M., and Petricoin, E. (2003) Clinical proteomics: Writtenin blood. Nature 425, 905
Nonredundant Human Plasma Proteome
326 Molecular & Cellular Proteomics 3.4