An overview of the sequence features of N- and C-terminal segments of the human chemokine receptors


Cytokine xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Cytokine

journal homepage: www.journals.elsevier.com/cytokine

An overview of the sequence features of N- and C-terminal segments of the human chemokine receptors

http://dx.doi.org/10.1016/j.cyto.2014.07.257
1043-4666/© 2014 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Address: Centro Ricerche Oncologiche di Mercogliano (CROM), Istituto Nazionale per lo studio e la cura dei tumori “Fondazione G. Pascale” – IRCCS, Via Ammiraglio Bianco, 83013 Mercogliano, AV, Italy. Tel.: +39 0825 1911730; fax: +39 0825 1911705.

E-mail addresses: s.costantini@istitutotumori.na.it, susan.costantini@cro-m.eu (S. Costantini).

Please cite this article in press as: Raucci R et al. An overview of the sequence features of N- and C-terminal segments of the human chemokine receptors. Cytokine (2014), http://dx.doi.org/10.1016/j.cyto.2014.07.257

Raffaele Raucci a,b, Susan Costantini c,⇑, Giuseppe Castello c, Giovanni Colonna a,b

a Doctorate in Computational Biology, Second University of Naples, Naples, Italy
b Department of Biochemistry, Biophysics and General Pathology, Second University of Naples, Naples, Italy
c Centro Ricerche Oncologiche di Mercogliano, Istituto Nazionale per lo studio e la cura dei tumori “Fondazione G. Pascale” – IRCCS, Italy

Article info

Article history:
Received 27 April 2014
Received in revised form 21 June 2014
Accepted 29 July 2014
Available online xxxx

Keywords:
Intrinsic protein disorder
Human chemokine receptors
Chemokine receptors evolution
Polyelectrolytes
Conformational ensemble

Abstract

Chemokine receptors play a crucial role in cellular signaling by engaging extracellular ligands, the chemotactic proteins that recruit immune cells. They possess seven trans-membrane helices, an extracellular N-terminal region with three extracellular hydrophilic loops, important for the search and recognition of specific ligand(s), and an intracellular C-terminal region with three intracellular loops that couple G-proteins. Although the functional aspects of the extra- and intracellular terminal segments of these G protein-coupled receptors are universally recognized, the molecular basis on which they rest is still unclear, because these segments cannot be resolved by X-ray crystallography owing to their high mobility and are not easy to study in the membrane. The purpose of this work is to define which physical–chemical properties of the terminal segments of the human chemokine receptors underlie their functional mechanisms. Therefore, we have evaluated their physical–chemical properties in terms of amino acid composition, local flexibility, disorder propensity, net charge distribution and putative sites of post-translational modifications.

Our results support the conclusion that all 19 C-terminal and N-terminal segments of human chemokine receptors are very flexible due to the systematic presence of intrinsic disorder. Although the purpose of this plasticity clearly appears to be that of controlling and modulating the binding of ligands, we provide evidence that the overlap of linearly charged stretches, intrinsic disorder and post-translational modification sites, consistently found in these motifs, is a necessary feature for exerting the function. The role of intrinsic disorder is discussed in light of structural information from intrinsically disordered model compounds, which supports the view that the chemokine receptor terminals have to be considered as strong polyampholytes or polyelectrolytes, in which conformational ensembles and the structural transitions between them are modulated by charge fraction variations. The role of post-translational modifications was also found to be coherent with this view because, by changing the charge fraction, they guide structural transitions between ensembles. Moreover, we have considered our results from an evolutionary point of view in order to understand whether the features found in humans are also present in other species. Our data show that the structural features of the terminals of the human chemokine receptors are shared and evolutionarily conserved, particularly among mammals. This means that the various organisms not only tolerate but select intrinsic disorder for the terminal regions of their receptors, reflecting constraints that point to molecular recognition.

In conclusion, the terminal segments of chemokine receptors must be considered as strong polyampholytes in which the charge fraction variations induced by post-translational modifications are the driving physico-chemical feature able to adapt the conformations of the terminal segments to their functions.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Cell signaling is part of a complex system that governs basic cellular activities and coordinates their actions. Cells receive signals from external molecules, and the specificity of these interactions drives signal transduction. Among the receptors most commonly studied as drug targets are the G protein-coupled receptors (GPCRs). They are an area of intense scientific interest with tremendous biological and pharmaceutical relevance because of the role they play in so many cell types as ‘antenna and transmitter’, converting extracellular information into signals that cross the membrane into the cell interior, where they elicit responses via the process of signal transduction [1]. Chemokine receptors belong to a distinct subfamily of the rhodopsin-like G protein-coupled receptors, subdivided into four groups: seven CXC receptors (CXCR), ten CC receptors (CCR), one CX3C receptor (CX3CR), and one XC receptor (XCR), which bind four distinct subfamilies of chemokines. In detail, each receptor possesses seven trans-membrane helices, an extracellular N-terminal region with three extracellular hydrophilic loops, and an intracellular C-terminal region with three intracellular loops. The first and second extracellular loops are linked by a disulfide bond between two conserved cysteine residues. The N-terminal region is structurally important for the search and recognition of specific ligand(s), whereas the intracellular C-terminal region couples G-proteins, a mechanism implicated in receptor signal transduction [2]. In humans there are approximately 50 chemokines but only 19 receptors, because some receptors show pleiotropy, i.e., the ability to bind multiple chemokines with ‘high specificity’ [3]. In this context, there is mounting interest in proteins with intrinsically disordered segments in cell signaling and regulation, because these segments are often involved in mediating molecular recognition.

In the last decade, many proteins with natively disordered structure, called Intrinsically Disordered Proteins (IDPs), have been found to be involved in numerous key biological processes, including cell cycle control, regulation, signaling and binding between biological macromolecules [4]. Intrinsic disorder depends on the type of amino acids constituting the protein [5–7], and its structural plasticity, determined by the abundance of residues that are inherently flexible because they are small, polar and charged, represents a major functional advantage because it confers the ability to recognize numerous and different ligands, including proteins, membranes, and nucleic acids [8]. Further studies have shown that a flexible intrinsically disordered region of a given protein can fold differently upon binding to different partners, and that different sequences from different proteins can recognize the same binding site on a specific partner owing to their flexibility [4]. Moreover, IDPs or their intrinsically disordered regions (IDRs) are critically involved in receptor signaling, where they modulate the native and functional state of many signaling proteins to favor their post-translational modifications, which occur predominantly within IDRs [9,10]. However, the understanding of the molecular mechanisms of the biological functions in which IDPs are involved is limited by the lack of knowledge of the physical basis on which these mechanisms rely [11,12]. Recent data, obtained by borrowing concepts from polymer physics [13] and the physics of spin glasses, treat IDPs as strong polyampholytes [14] and show that this approach is much closer to the physical reality of these proteins [15].

In this article we focused our attention on the terminal regions (N- and C-term) of human chemokine receptors because a large amount of structural information about them, such as the presence of charges, flexibility or post-translational modifications, is scattered throughout the literature [16]. In detail, we focused on the physical–chemical properties of the N-terminal and C-terminal receptor sequences that underlie their biological function.

Our results support the conclusion that the terminal segments of the human chemokine receptors are no longer to be considered as weak polyampholytes, but as polyelectrolytes. The presence of intrinsic disorder, with its great structural flexibility, promotes the conformational ensembles characteristic of polyampholytes, whose structural transitions (coil/collapsed globule) are governed by variations in the charged fractions brought about by post-translational modifications of residues located in the charged stretches of the same disordered regions. It is the concomitant presence of intrinsic disorder and post-translationally modified residues in the same charged segment that represents the structural organization on which the molecular mechanisms characterizing the function of these proteins rest. This reasonably suggests that the charge distribution of IDRs can serve to modulate their conformations through post-translational modifications. Moreover, this structural feature of the human chemokine receptors was found to be evolutionarily conserved, in particular among mammals, which emphasizes its functional importance.

2. Methods

2.1. Sequence analysis

The extracellular N-terminal and intracellular C-terminal regions of the 19 human chemokine receptors, and of the corresponding receptors in other related organisms, were obtained from the UniProt database. In detail, they are subdivided into four families: seven CXC receptors (CXCR), ten CC receptors (CCR), one CX3C receptor (CX3CR) and one XC receptor (XCR), which correspond to the four distinct subfamilies of chemokines they bind. In particular, we used the ProtParam program [17] to evaluate the percentages of charged residues (i.e. Lys, Arg, Asp, Glu), and analyzed the presence of charged stretches along the sequences, considering only stretches made of at least two consecutive positive or negative charges, without taking into account the intervening uncharged residues. In this way, we can easily point out those segments of the sequence over which the same charge is widely distributed. The Jpred web server was used to predict secondary structure with a neural network called Jnet [18]. The prediction assigns each amino acid residue to alpha helix (‘H’), beta sheet (‘E’) or random coil (‘–’) secondary structure.
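The charged-residue percentages and charged stretches described above can be illustrated with a short Python sketch. It is not the ProtParam implementation; the function names are hypothetical, and the stretch rule (uncharged residues are skipped, then runs of at least two same-sign charges are kept) is our reading of the description.

```python
from itertools import groupby

POSITIVE = set("KR")  # Lys, Arg
NEGATIVE = set("DE")  # Asp, Glu


def charged_percentages(seq):
    """Return (% positive, % negative, % charged) for a protein sequence."""
    seq = seq.upper()
    n = len(seq)
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    return 100 * pos / n, 100 * neg / n, 100 * (pos + neg) / n


def charged_stretches(seq, min_charges=2):
    """Find runs of same-sign charges, ignoring intervening uncharged residues.

    Assumption: a stretch ends only when a charge of the opposite sign appears.
    Returns (sign, first_index, last_index, n_charges) with 0-based indices.
    """
    charged = [(i, "+" if aa in POSITIVE else "-")
               for i, aa in enumerate(seq.upper())
               if aa in POSITIVE or aa in NEGATIVE]
    stretches = []
    for sign, group in groupby(charged, key=lambda item: item[1]):
        positions = [i for i, _ in group]
        if len(positions) >= min_charges:
            stretches.append((sign, positions[0], positions[-1], len(positions)))
    return stretches
```

For a toy sequence such as `MDEAKTRSDG` (hypothetical, not a receptor segment), the sketch reports one negative stretch (D, E) and one positive stretch (K, R); the isolated final D is discarded because it is a single charge.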

2.2. Local flexibility

The local flexibility was evaluated using the algorithm of Ragone et al. [19]. The algorithm uses the hydrophobicity–volume product (H * V) of the amino acids to calculate a flexibility plot over a shifting window of 5–7 residues; a five-residue window was used here. The normalized HV values range from 0 to about 7000, with a threshold at an HV value of 1330 [19]. Residues with HV values below 1330 are the most flexible, so the locally flexible segments correspond to minima in the plot. This algorithm, although simple, is very efficient in the calculation of the local flexibility, with an accuracy of 65%, at the level of the best ones [20].
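A minimal Python sketch of such a sliding-window profile is given below. The per-residue H * V values of Ragone et al. are not reproduced here: `hv_scale` is a user-supplied dictionary, the function names are hypothetical, and averaging over the window is an assumption about how the window is applied.

```python
def hv_profile(seq, hv_scale, window=5):
    """Average the per-residue H*V value over a shifting window.

    hv_scale maps one-letter residue codes to H*V values (supplied by the user).
    Returns one value per window position (len(seq) - window + 1 values).
    """
    values = [hv_scale[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def flexible_minima(profile, threshold=1330.0):
    """Indices of local minima of the profile that lie below the threshold."""
    minima = []
    for i in range(1, len(profile) - 1):
        is_minimum = profile[i] <= profile[i - 1] and profile[i] <= profile[i + 1]
        if is_minimum and profile[i] < threshold:
            minima.append(i)
    return minima
```

With a real H * V scale, the indices returned by `flexible_minima` would mark the windows expected to be the most flexible; flat minima yield several adjacent indices.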

3. Disorder prediction

The disorder propensity of all the sequences was predicted by using the GeneSilico MetaDisorder service (http://iimcb.genesilico.pl/metadisorder/), which runs a wide range of disorder prediction software, including DISEMBL [21], VSL2 [22], iPDA [23], DISPro [24], GlobPlot2 [25], IUPred [26], PDISORDER (Softberry, Inc.), POODLE-S [27], POODLE-L [28], PrDOS [29], Spritz [30] and RONN [31].

The service returns meta-predictions based on optimized results from the primary methods. The methods were benchmarked and validated on a dataset and in the CASP8 and CASP9 blind tests.
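As a rough illustration of how per-residue predictions translate into the percentage of disordered residues, the following Python sketch applies a cutoff to per-residue scores and combines several predictors by simple majority vote. The majority vote is only a stand-in: the actual weighting scheme of the MetaDisorder service is not reproduced, and the 0.5 cutoff and function names are assumptions.

```python
def percent_disordered(scores, cutoff=0.5):
    """Percentage of residues whose disorder score reaches the cutoff."""
    flags = [score >= cutoff for score in scores]
    return 100.0 * sum(flags) / len(flags)


def majority_consensus(predictions):
    """Combine per-residue boolean calls from several predictors.

    predictions: list of equal-length lists of booleans, one list per predictor.
    A residue is called disordered when more than half of the predictors agree.
    """
    n_predictors = len(predictions)
    return [2 * sum(column) > n_predictors for column in zip(*predictions)]
```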

We have also applied the binary classification method that uses charge-hydrophobicity plots (CH-plots) [32], where ordered and unordered proteins are plotted in CH-space and a linear boundary separates them. In particular, the border between “natively unordered” and “natively ordered” proteins was calculated from the data as: <R> = 2.785<H> − 1.151, where <H> and <R> are the mean hydrophobicity and the mean net charge of the protein, respectively. The hydrophobicity of each amino acid sequence was calculated by the Kyte and Doolittle approximation, using a window size of 5 residues [33]. The mean hydrophobicity is defined as the sum of the normalized hydrophobicities of all residues divided by the number of residues. The mean net charge is defined as the net charge at pH 7.0 divided by the total number of residues.
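The CH-plot criterion can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the Kyte–Doolittle scale is rescaled linearly to the 0–1 range, no 5-residue window is applied, only Lys/Arg and Asp/Glu are counted as charged at pH 7.0 (His is ignored), the absolute value of the mean net charge is used, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq):
    """Mean hydropathy <H>, rescaled so that -4.5 maps to 0 and 4.5 maps to 1."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge <R>, counting K/R as +1 and D/E as -1."""
    pos = sum(aa in "KR" for aa in seq)
    neg = sum(aa in "DE" for aa in seq)
    return abs(pos - neg) / len(seq)


def is_natively_unordered(seq):
    """True when the sequence lies above the boundary <R> = 2.785<H> - 1.151."""
    boundary = 2.785 * mean_hydrophobicity(seq) - 1.151
    return mean_net_charge(seq) > boundary
```

A sequence with high net charge and low hydrophobicity falls on the “natively unordered” side of the line, while a strongly hydrophobic, uncharged one falls on the ordered side.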

3.1. Post-translational site predictions

The phosphorylation sites of the human chemokine receptors were predicted with the neural-network-based NetPhos server [34]. In addition, we used DISPHOS (DISorder-enhanced PHOSphorylation predictor), which has an improved accuracy for phosphorylation sites located in protein regions with a high propensity to disorder [10]. The kinases that may phosphorylate the N-terminals and C-terminals were evaluated on the previously predicted sites that were conserved in at least 80% of the mammalian sequences, using the GPS software [35] and NetPhosK [36], and reporting the kinases that result from the consensus of the two tools.
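The two filtering steps described above (conservation in at least 80% of the sequences, then consensus between two kinase predictors) can be sketched in Python. The sketch assumes pre-aligned sequences of equal length and counts a site as conserved when a phosphorylatable residue (Ser, Thr or Tyr) occupies the column; both choices, the data structures and the function names are assumptions, and no output of GPS or NetPhosK is reproduced.

```python
def site_conservation(aligned_seqs, column, allowed="STY"):
    """Fraction of aligned sequences with an allowed residue at the column."""
    hits = sum(seq[column] in allowed for seq in aligned_seqs)
    return hits / len(aligned_seqs)


def conserved_sites(aligned_seqs, columns, min_fraction=0.8):
    """Keep the alignment columns conserved in at least min_fraction of sequences."""
    return [c for c in columns
            if site_conservation(aligned_seqs, c) >= min_fraction]


def kinase_consensus(pred_a, pred_b):
    """Keep, for each site, only the kinases reported by both predictors.

    pred_a and pred_b map a site position to a set of kinase names.
    """
    consensus = {}
    for site in pred_a.keys() & pred_b.keys():
        shared = pred_a[site] & pred_b[site]
        if shared:
            consensus[site] = shared
    return consensus
```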

The sulfation and glycosylation site predictions were performed with the Sulfinator tool [37], and with the NetNGlyc and NetOGlyc servers, respectively [38].

3.2. Analysis of charge distribution

The charge distribution analysis according to Das and Pappu [15] was performed on the sequences of the N- and C-terminals of the human chemokine receptors. This analysis treats the sequences as polyampholytes, since they include positively as well as negatively charged residues, and the fraction of charged residues discriminates between weak and strong polyampholytes and the conformational propensity exerted by their ensembles. The parameters calculated for describing the terminal sequences are FCR, the fraction of charged residues, and NCPR, the net charge per residue, defined as FCR = (f+ + f−) and NCPR = |f+ − f−|, where f+ and f− denote the fraction of positive and negative charges, respectively. To classify each sequence into the proper region of the Diagram of State for IDPs [15], we adopted the following values for the different regions, based on sequences drawn from the DisProt database [15]:

Region 1, Weak Polyampholytes and Polyelectrolytes: FCR < 0.25 and NCPR < 0.25, with propensity for ensembles of Globules and Tadpoles;
Region 2, Boundary separating regions 1 and 3: 0.25 ≤ FCR ≤ 0.35 and NCPR ≤ 0.35;
Region 3, Strong Polyampholytes: FCR > 0.35 and NCPR ≤ 0.35, with propensity for ensembles of Coils, Hairpins and Chimeras;
Region 4, Strong Polyelectrolytes: FCR > 0.35 and NCPR > 0.35, with propensity for ensembles of Swollen Coils.
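The definitions and thresholds above can be written as a short Python sketch. It is only an illustration of the stated rules, not the authors' code; only Lys/Arg and Asp/Glu are counted as charged (an assumption), and the function names are hypothetical. Because NCPR can never exceed FCR, the region is decided by FCR alone except for the split between regions 3 and 4.

```python
def charge_fractions(seq):
    """Return (f+, f-): fractions of K/R and of D/E residues in the sequence."""
    seq = seq.upper()
    n = len(seq)
    f_pos = sum(aa in "KR" for aa in seq) / n
    f_neg = sum(aa in "DE" for aa in seq) / n
    return f_pos, f_neg


def fcr_ncpr(f_pos, f_neg):
    """FCR = f+ + f-  and  NCPR = |f+ - f-|."""
    return f_pos + f_neg, abs(f_pos - f_neg)


def das_pappu_region(fcr, ncpr):
    """Region (1-4) of the diagram of state, using the thresholds quoted above."""
    if fcr > 0.35:
        return 4 if ncpr > 0.35 else 3  # strong polyelectrolyte vs polyampholyte
    if fcr >= 0.25:
        return 2                        # boundary between regions 1 and 3
    return 1                            # weak polyampholyte / polyelectrolyte
```

For example, FCR = 0.29 with NCPR = 0.03 falls in the boundary region 2, while FCR = 0.37 with NCPR = 0.13 falls in region 3.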

4. Results

4.1. Sequence analysis of N- and C-terminal regions

Since our aim was to evaluate the sequence features of the terminal segments, we analyzed the physical–chemical characteristics of these regions. In particular, for each receptor we evaluated the amino acid composition, the total percentages of charged, aromatic, small and polar residues, the net charge, the local flexibility, the propensity to disorder, and the presence of extended charged stretches and post-translational modification sites.

The N-terminal sequences show a large diversity in length, but their compositions show remarkable similarities. Their lengths range from the 31 residues of CXCR1 and CX3CR1 to the 55 of CXCR5 (Table 1). Their composition is characterized by a strong presence of small and polar amino acids, with a high percentage of charged residues and a low percentage of aromatic residues. The high proportion of negatively charged residues gives these regions a net negative charge, ranging from −1 for CCR2 and CCR5 to −10 for CXCR2 and CXCR3 (Table 1). The analysis of the C-terminal regions shows that they are on average longer than the N-terminal ones, with a number of residues ranging from 42 for CXCR1 and XCR1 to 65 for CCR2. They too have a high content of charged residues compared with their very low content of aromatic residues (see Table 1). In particular, the charged residues in the C-terminals of all the receptors are mostly positive, in contrast to the N-terminals, which are rich in negatively charged residues. However, an overall view shows that, while the amount of polar residues is comparable with that of the N-terminals, the amount of small residues is slightly lower (Table 1). The elevated presence of positively charged residues results in a net positive charge ranging between +2.3 for CX3CR1 and +5.3 for CXCR4. CCR4, instead, is the only receptor whose C-terminal has a net charge equal to zero (Table 1).

Moreover, following Das and Pappu [15], we used the net charge per residue, which modulates the conformational ensembles and makes it possible to distinguish between globules and swollen coils. As can be seen in Tables 2A and 2B, the majority of the N-terminal and C-terminal segments are located in the borderline region 2 of the state diagram [39], between region 1, characterized by the presence of globules, and region 3 (coils).

In general, these sequence features indicate that the N-terminal and C-terminal segments are strong polyampholytes owing to the high presence of charged residues, while the very low amount of hydrophobic residues, which are essential for the formation of an organized core, is a molecular signature that suggests the presence of intrinsic disorder [40].

4.2. Disorder and local flexibility prediction of N- and C-terminal regions

We evaluated the disorder propensity of all the sequences by using the GeneSilico MetaDisorder service (http://iimcb.genesilico.pl/metadisorder/). In detail, we found (Fig. 1) that all the C-terminal sequences presented regions of high disorder propensity, with more than 50% of the residues predicted as intrinsically disordered, while only 10 out of 19 N-terminal sequences were found to be equally disordered. However, in the remaining 9 N-terminal sequences the percentage of predicted disordered residues is still substantial, between 27% and 48%. Moreover, we also computed the charge-hydrophobicity plot (CH-plot) (Fig. 2) for the terminal regions [32]. The results show that, according to this criterion, 16 out of 19 C-terminals and 10 out of 19 N-terminals have low overall hydrophobicity and large net charge, a structural feature of intrinsically disordered proteins [41]. Overall, these data are in excellent agreement with each other and also show that the C-terminal regions have a higher disorder propensity than the N-terminal ones. This structural feature is certainly correlated with the biological function of the C-terminal sequences of these receptors, which couple G-proteins and are involved in signal transduction, in line with some papers on disordered proteins [5,42,43]. To support these findings, we also predicted the secondary structures of the N- and C-terminal regions with JPred (Table S1). The results showed very high percentages of residues predicted as unstructured, with only a few residues in helices or strands, particularly in the C-terminal segments. Moreover, we evaluated the local flexibility of the 19 terminal segments by using the flexibility plot. Fig. S1 shows the presence of numerous minima with low values of H * V (high local flexibility) in both the N-terminal and C-terminal regions. These minima represent the points of the sequence around which the structural mobility pivots, allowing the whole structure to explore the conformational space. We have also added to the graphs the disorder, the stretches of charged amino acids, and the sites of sulfation and phosphorylation (see the legend to Fig. S1 for details). In the N-terminal segments, the disorder is basically confined to the outer segments directed toward the solvent, as shown by darker colors, while in the C-terminal segments it is always confined to the outer segments toward the cytoplasm and is generally much more abundant and widespread. From the graphs, we can see that local flexibility and intrinsically disordered segments do not always coincide, even if both should contribute to the overall motions. Evidently, disorder and local flexibility modulate different properties even if they must cooperate. To obtain more details on the role of specific residues, we evaluated the possible sites of post-translational modifications, because their presence can change the net charge.

Table 1
The table shows the total residue number, net charge, the percentage of aromatic, small, polar and charged residues and that of negatively and positively charged residues for N-terminal (A) and C-terminal (B) regions.

(A) N-term
Receptor  Total residue number  Net charge  Aromatic (%)  Small (%)  Polar (%)  Charged (%)  Asp + Glu (%)  Arg + Lys (%)
CXCR1     39                    −8          10            56         51         26           23             3
CXCR2     48                    −10         15            50         52         29           25             4
CXCR3     53                    −10         9             62         53         25           21             2
CXCR4     38                    −6          13            50         58         32           24             8
CXCR5     55                    −8.9        11            49         44         25           20             4
CXCR6     32                    −6.7        19            41         66         38           25             3
CXCR7     40                    −5          10            65         45         20           15             3
CCR1      34                    −6          12            65         59         29           24             6
CCR2      42                    −1          14            52         55         29           14             12
CCR3      34                    −4          9             62         50         24           18             6
CCR4      39                    −4          13            51         49         26           18             8
CCR5      30                    −1          13            50         47         17           10             7
CCR6      47                    −6          15            53         55         21           17             4
CCR7      35                    −4          14            57         60         34           23             11
CCR8      35                    −5          14            60         49         20           17             3
CCR9      53                    −6          17            60         53         21           15             4
CCR10     52                    −6.9        12            58         46         23           17             4
XCR1      31                    −4          19            55         52         13           13             0
CX3CR1    31                    −8          16            55         42         26           26             0

(B) C-term
Receptor  Total residue number  Net charge  Aromatic (%)  Small (%)  Polar (%)  Charged (%)  Asp + Glu (%)  Arg + Lys (%)
CXCR1     42                    4.3         10            50         52         21           2              12
CXCR2     45                    4.3         7             49         51         24           4              13
CXCR3     47                    4           9             45         55         26           9              17
CXCR4     50                    5.3         6             58         62         24           4              14
CXCR5     47                    2.9         9             55         53         19           6              13
CXCR6     49                    3.1         14            47         55         27           8              14
CXCR7     43                    3           14            40         56         30           12             19
CCR1      50                    4.3         12            40         52         34           10             18
CCR2      65                    7           3             54         42         28           8              18
CCR3      50                    3.4         14            32         50         34           10             16
CCR4      52                    0           13            42         48         23           10             10
CCR5      51                    3           14            39         51         27           10             16
CCR6      53                    4.9         15            49         58         28           9              19
CCR7      47                    3           11            43         62         30           11             17
CCR8      51                    3           10            39         65         29           10             16
CCR9      48                    4           6             44         54         25           8              17
CCR10     49                    7           4             55         53         29           6              20
XCR1      42                    4.2         19            45         38         21           2              12
CX3CR1    58                    2.3         12            48         52         28           9              12

4.3. Analysis for post-translational modifications of N- and C-terminal regions

The recent review of Szpakowska et al. reports the possible importance of the sulfation of specific tyrosines for high-affinity binding of chemokines [44]. The presence of sulfated tyrosines has been experimentally demonstrated for only five human chemokine receptors (CCR2b, CCR5, CXCR3, CXCR4, CX3CR1), and the prediction of protein tyrosine sulfation sites remains problematic because of the several consensus features required for TPST activity [44]. Moreover, sulfation is a post-translational modification that takes place only in the Golgi apparatus, so the sulfated protein reaches the membrane already modified, whereas phosphorylation is a response to a stimulus that triggers a versatile and reversible mechanism [45]. While the presence of phosphorylation is well known for the C-terminal segments [46], almost no studies have been conducted on the N-terminal sequences. Only Schumacher et al. found experimentally, in a lymphoma cell line (SU-DHL1, of T lymphocyte origin), that Y42 of CCR1 was phosphorylated [47]. This means that the extracellular structures of human receptors can be phosphorylated. The computational analysis of the N- and C-terminal sequences showed that the terminal regions of all the chemokine receptors always contain some serine, threonine or tyrosine residues, some of which are predicted to be specific targets for phosphorylation and are almost always localized in disordered regions. The analysis of the C-terminal sequences shows the diffuse presence of positively charged residues, with numerous charged stretches particularly on the side of the sequence close to the membrane; moreover, small negative stretches can sometimes be found at the end of the sequence, as previously shown. The peculiarity of the C-terminal sequences is that the phosphorylation sites are for the most part located near the tail of the segments but not in the positively charged stretches. These two structural features (i.e., the location of the positively charged stretches and of the phosphorylation sites) are very well retained in all the analyzed sequences of the human receptors and, thus, must be related to molecular mechanisms important for the biological function. Only in some cases did we find the predicted phosphorylation site(s) clearly embedded in a positively charged stretch (CCR2, CX3CR1 and CXCR3), while only in the C-terminal of CXCR7 did we find a phosphorylation site located in a negatively charged stretch. However, the effect on the segmental net charge is still retained. Like other transmembrane receptors, chemokine receptors may also be post-translationally modified by the addition of sugar moieties either to the amide group of asparagine residues (N-glycosylation) or to the hydroxyl groups of serine or threonine residues (O-glycosylation) [44]. Experimental data on human chemokine receptor glycosylation are, however, scarce, and only four receptors have been shown to carry N-linked (CCR2, CXCR2 and CXCR4) or O-linked (CCR5) carbohydrate moieties in their N-terminals [44], while there is no experimental evidence for the C-terminals (see Tables S2–S3). The exact role of N-terminal domain glycosylation remains unclear [44]. As for other GPCRs, the glycosylation of the extracellular domains of chemokine receptors has been proposed to increase their flexibility or to participate directly in ligand binding [48]. Therefore, glycosylation of the receptor N-terminus is likely to be of greater importance than initially appreciated, and in particular cell-dependent glycosylation patterns may represent an additional level in the finely tuned regulation of the chemokine network [44].

Table 2
Analysis of (A) N-terminal and (B) C-terminal sequences of human chemokine receptors according to the Diagram of State for IDPs [15]. The last two rows of part A (reported in red in the original) show a simulation of the IDP state for CXCR3 and CCR3 considering the presence of two phosphorylations each. As can be seen, the two conformations previously indicated as “globule” change to “boundary” after the simulations.

(A) N-terminal
Receptor    Length  f+    f−    (f+) − (f−)  FCR   NCPR  IDP state
CCR1        32      0.06  0.28  −0.22        0.34  0.22  Boundary
CCR2        45      0.13  0.16  −0.03        0.29  0.03  Boundary
CCR3        34      0.06  0.17  −0.11        0.23  0.11  Globule
CCR4        39      0.08  0.18  −0.10        0.26  0.10  Globule
CCR5        30      0.06  0.10  −0.04        0.16  0.04  Globule
CCR6        47      0.04  0.17  −0.13        0.21  0.13  Globule
CCR7        35      0.11  0.23  −0.12        0.34  0.12  Boundary
CCR8        35      0.03  0.17  −0.14        0.20  0.14  Globule
CCR9        53      0.06  0.15  −0.09        0.21  0.09  Globule
CCR10       52      0.04  0.17  −0.13        0.21  0.21  Globule
CX3CR1      31      0     0.26  −0.26        0.26  0.26  Boundary
CXCR1       39      0.02  0.23  −0.21        0.25  0.21  Boundary
CXCR2       48      0.04  0.25  −0.21        0.29  0.21  Boundary
CXCR3       53      0.04  0.21  −0.17        0.25  0.17  Globule
CXCR4       39      0.08  0.23  −0.15        0.31  0.15  Boundary
CXCR5       55      0.05  0.22  −0.17        0.27  0.17  Boundary
CXCR6       32      0.12  0.25  −0.13        0.37  0.13  Coil
CXCR7       40      0.05  0.15  −0.10        0.20  0.10  Globule
XCR1        31      0     0.13  −0.13        0.13  0.13  Globule
CXCR3 + 2P  48      0.04  0.28  −0.24        0.32  0.24  Boundary
CCR3 + 2P   34      0.06  0.29  −0.23        0.35  0.23  Boundary

(B) C-terminal
Receptor    Length  f+    f−    (f+) − (f−)  FCR   NCPR  IDP state
CCR1        50      0.24  0.10  0.14         0.35  0.14  Boundary
CCR2        65      0.20  0.06  0.14         0.26  0.14  Boundary
CCR3        50      0.24  0.10  0.14         0.34  0.14  Boundary
CCR4        52      0.13  0.09  0.04         0.22  0.04  Globule
CCR5        51      0.17  0.09  0.08         0.26  0.08  Boundary
CCR6        53      0.19  0     0.19         0.19  0.19  Globule
CCR7        47      0.19  0.11  0.08         0.30  0.08  Boundary
CCR8        51      0.19  0.09  0.10         0.28  0.10  Boundary
CCR9        48      0.17  0.08  0.09         0.25  0.09  Boundary
CCR10       49      0.22  0.06  0.16         0.28  0.16  Boundary
CX3CR1      57      0.19  0.09  0.10         0.28  0.10  Boundary
CXCR1       42      0.19  0.02  0.17         0.21  0.17  Globule
CXCR2       45      0.20  0.04  0.16         0.24  0.16  Globule
CXCR3       47      0.17  0.08  0.09         0.25  0.09  Boundary
CXCR4       47      0.21  0.04  0.17         0.25  0.17  Boundary
CXCR5       47      0.13  0.06  0.07         0.19  0.07  Globule
CXCR6       49      0.18  0.08  0.10         0.26  0.10  Boundary
CXCR7       43      0.18  0.02  0.16         0.20  0.16  Globule
XCR1        43      0.19  0.02  0.17         0.21  0.17  Globule
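The “+ 2P” rows of Table 2 recompute FCR and NCPR after adding two phosphorylations. The Python sketch below shows the bookkeeping involved. It assumes that each phosphate adds two negative charges, which is an assumption not stated in the text, and the residue counts in the usage note are hypothetical values chosen to resemble a short N-terminal segment. The function name is also hypothetical.

```python
def fcr_ncpr_with_phosphates(n_pos, n_neg, length, n_phospho,
                             charge_per_phosphate=2):
    """Recompute (FCR, NCPR) after adding phosphate groups.

    n_pos, n_neg: counts of positively and negatively charged residues.
    charge_per_phosphate: negative charges added per phosphate (assumed value).
    """
    n_neg_total = n_neg + n_phospho * charge_per_phosphate
    fcr = (n_pos + n_neg_total) / length
    ncpr = abs(n_pos - n_neg_total) / length
    return fcr, ncpr
```

For a hypothetical 34-residue segment with 2 positive and 6 negative residues, FCR rises from 8/34 (about 0.24) to 12/34 (about 0.35) after two phosphorylations, moving the segment from the weak-polyampholyte range toward the boundary range of the diagram of state.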

4.4. Phylogenetic analysis of N- and C-terminal regions

In order to understand whether the terminal sequence features of the human chemokine receptors have been conserved during evolution, we analyzed sequences from different species. The percentage of disordered residues follows the same trend as in the human sequences for the majority of the examined species, with only a few exceptions (Table 3). Moreover, the large presence of negatively and positively charged residues is maintained in at least 80% of the organisms. We found that about 43% of the organisms presented an excess of positive residues, while the remaining 57% presented a similar amount of positive and negative residues (Table 3). We also compared the percentage of charged residues among the various organisms, subdivided into four groups, i.e. mammals, birds, amphibians and fishes. We found that some specific negatively charged sites in the human N-terminals (corresponding to about 33% of the total sites) are conserved among mammals, maintaining the same position in CCR4, CCR5, CCR7 and CXCR7 (Table S2 Part A). This suggests a strategic role for the negative charges in those specific positions of the sequence. For positively charged residues, too, we found a high percentage of conserved sites in the C-terminals of mammalian receptors, ranging from 56% for CCR1 to 100% for CXCR4, CXCR5, CXCR6, CXCR7, CCR4 and CCR9 (Table S2 Part B). Furthermore, the local flexibility of the terminal segments of the receptors in different organisms shows a great similarity with that found in the human receptors (Fig. S2). In addition, assessing the conservation of phosphorylation, sulfation and N-/O-glycosylation sites in all the species, we noticed that the situation is varied for the phosphorylation sites in the N-terminals of human receptors previously detected by NetPhos and DISPHOS (Table 4 Part A). In fact, for some receptors (CXCR1, CXCR2, CXCR5, CXCR6, CCR3, CCR6 and CCR8) there is no conservation, while for others it ranges from 38% (CCR9) up to 100% (CXCR4, CXCR7, CCR4, CCR5, and CCR10). In contrast, the C-terminals (Table 4 Part B) exhibit conservation percentages exceeding 50% (CCR2), with 13 receptors that have 100% of their sites conserved. The only exception is XCR1, which has no conserved phosphorylation site. A further analysis, carried out on all the extracellular N-terminal segments with the GPS software and NetPhos, showed the presence of putative phosphorylation sites for extracellular kinases (ectokinases). In fact, from the consensus of the results of these two tools, and taking into consideration only evolutionarily conserved sites, we found that casein kinase 2 and protein kinase C may phosphorylate different N-terminals, for example at Thr14 of CCR1, Ser15 of CCR10 or Thr4 of CCR2 (Table 5). These sites, as shown in the table, are always linked to disordered regions.

The sulfation sites were found to be very well preserved in the N-terminals of mammalian receptors, with the exception of CXCR1, CCR9 and XCR1. It is interesting to note the conservation of at least 50% of the sulfation sites for 15 receptors. This finding confirms the importance of this type of modification for the N-terminal regions in the recognition and binding of extracellular ligands. Instead, for the C-terminals only three receptors have putative sulfation sites: CXCR7, CCR8 and XCR1 (100% conservation).

Finally, we observed that N-glycosylation is a modification that characterizes almost exclusively the N-terminals of 13 chemokine


Fig. 1. Disorder prediction of the terminal regions of human receptors. Percentage of predicted disordered residues evaluated in the N-terminal (black bar) and C-terminal (grey bar) regions of 19 human chemokine receptors by a consensus procedure implemented in the GeneSilico MetaDisorder service.

Fig. 2. Charge-hydrophobicity graph of the terminal regions of human receptors. Charge-hydrophobicity graph for the N-terminal (A) and C-terminal (B) regions of 19 human chemokine receptors. The black line represents the border between “natively disordered” and “natively ordered” proteins, calculated according to Uversky et al. [32]. In B the ordered receptors are CCR9, CXCR7 and XCR1.
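The border drawn in a charge-hydrophobicity graph of this kind can be sketched in a few lines of Python. The sketch assumes the boundary of Uversky et al. [32] in its usual form, mean net charge = 2.785 × mean hydropathy − 1.151, with the Kyte-Doolittle scale [33] rescaled to the 0–1 range. Counting only K/R as positive and D/E as negative (histidine ignored), as well as the function names, are simplifying assumptions made here and not necessarily the settings used for Fig. 2.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids [33].
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KD[res] + 4.5) / 9.0 for res in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge, counting K/R as +1 and D/E as -1."""
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    return abs(positive - negative) / len(seq)


def is_natively_disordered(seq):
    """True when the sequence lies above the assumed boundary line."""
    boundary = 2.785 * mean_hydropathy(seq) - 1.151
    return mean_net_charge(seq) > boundary


# Made-up sequences, used only to exercise the functions.
print(is_natively_disordered("DDDDEEEEKK"))  # highly charged, polar
print(is_natively_disordered("IIIIVVVVLL"))  # uncharged, hydrophobic
```

A segment falls on the "natively disordered" side when its mean net charge is high relative to its mean hydropathy, which is the reading applied to the two panels of the figure.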

Table 3. Percentage of sequences, from different organisms, which retain specific characteristics such as unordered propensity and presence of negative and positive charge.

Receptor   Organisms   N-term (%)                C-term (%)
                       >Unordered   >Charge−     >Charge+   >Unordered
CXCR1      10          30           100          100        10
CXCR2      9           78           100          100        100
CXCR3      10          90           100          80         90
CXCR4      42          100          100          100        95
CXCR5      7           14           100          100        100
CXCR6      10          100          100          100        100
CXCR7      7           0            100          100        100
CCR1       9           67           100          100        89
CCR2       7           100          86           100        86
CCR3       15          20           100          100        80
CCR4       7           71           100          43         29
CCR5       61          93           100          100        98
CCR6       12          33           100          100        75
CCR7       9           0            100          100        100
CCR8       8           25           100          100        100
CCR9       9           33           100          100        100
CCR10      5           80           100          100        100
XCR1       5           40           100          100        80
CX3CR1     8           50           100          100        75


receptors out of 19; in fact, the only predictions obtained for the C-terminals were for CXCR1 and CCR6, with a conservation of 0% and 50%, respectively. The sites found at the N-terminal are 100% preserved for CXCR3, CXCR4, CXCR6, CXCR7, CCR4 and CCR7, while only for CXCR1 did we find a conservation of 50%. All the other receptors found to be N-glycosylated in humans have no sites conserved among mammals. O-glycosylation sites were predicted for 17 N-terminals out of 19 and for 13 C-terminals out of 19. Their


average conservation among mammals is lower in the case of the N-terminals (ranging between 20% for XCR1 and 100% for CXCR3), while for the C-terminals the O-glycosylation sites appear to be much more conserved, like the phosphorylation sites.

5. Discussion

In this work we have systematically compared the sequence properties of the N- and C-terminal regions of 19 known human chemokine receptors. What emerges from our analysis is that these segments are characterized by the concomitant presence of intrinsic disorder, local flexibility and charged stretches. Since the PDB database contains no experimental structures of these receptors that include the complete sequences of the N- and C-terminal regions, we cannot compare our data in detail with the structures, because they are influenced by the full sequence. For example, the NMR structure of CXCR1 includes only 11 residues (29–39) of the N-terminal region and only 16 residues (309–324) of the C-terminal region.


Table 4. We report the number of phosphorylation (P), sulfation (S), N-glycosylation (N-glyc) and O-glycosylation (O-glyc) sites present in the N- and C-terminal regions of human chemokine receptors in the first column and the number of human sites that are conserved among the different species.

Column groups: HUMAN, MAMMALIA, BIRDS, AMPHIBIA, FISH; each group has the sub-columns P sites, S sites, N-glyc, O-glyc.

Receptor   HUMAN (P sites, S sites, N-glyc, O-glyc) | conserved sites in the other groups

Part A - N-TERM
CXCR1      1 1 2 2 | 1 1
CXCR2      1 2 1 2 | 1
CXCR3      2 2 2 3 | 1 2 2 3 1
CXCR4      3 3 1 3 | 3 3 1 2 1 1 1 1 1 1
CXCR5      1 2 1 0 | 1
CXCR6      2 2 1 1 | 1 1
CXCR7      1 1 3 0 | 1 1 3 1 1 2
CCR1       5 2 1 6 | 4 2 1
CCR2       2 2 1 7 | 1 1 4
CCR3       1 2 0 8 | 1
CCR4       1 4 2 0 | 1 4 2
CCR5       1 4 0 4 | 1 3 3
CCR6       1 3 2 2 | 1
CCR7       2 2 1 2 | 1 2 1 1 1 1
CCR8       1 4 0 5 | 2 2 2
CCR9       8 3 1 7 | 3 2 1 1
CCR10      2 2 0 4 | 2 2 3 1
XCR1       2 2 0 5 | 1 1
CX3CR1     2 1 0 1 | 1 1

Part B - C-TERM
CXCR1      2 0 1 5 | 2 1
CXCR2      3 0 0 6 | 3 4
CXCR3      5 0 0 3 | 5 2
CXCR4      9 0 0 1 | 9 1 9 1 9 1 8
CXCR5      5 0 0 0 | 4 2
CXCR6      1 0 0 0 | 1
CXCR7      4 1 0 0 | 3 1 3 1
CCR1       4 0 0 4 | 4 4
CCR2       2 0 0 4 | 1 1
CCR3       4 0 0 3 | 4 2 2 2
CCR4       2 0 0 1 | 2 1
CCR5       3 0 0 1 | 3 1
CCR6       1 0 2 0 | 1 1 1 1 1
CCR7       4 0 0 1 | 4 1 2 1
CCR8       7 1 0 5 | 4 1 2 5 3
CCR9       2 0 0 0 | 2 1
CCR10      3 0 0 3 | 3 2
XCR1       2 1 0 3 | 1 1
CX3CR1     4 0 0 0 | 3


Table 5. Kinase predictions obtained as consensus between the Disphos, GPS and Netphos programs concerning the evolutionarily conserved residues in the N- and C-terminal regions.

Receptor   Phosphorylation sites by consensus   Kinase from consensus GPS/NetPhosK

N-TERM
CCR1       10-Y
           14-T                                 CK2
           18-Y                                 SRC, EGFR
           22-T
CCR2       4-T                                  PKC
CCR3
CCR4
CCR5
CCR6
CCR7
CCR8
CCR9       19-S                                 CK2
           21-S
           24-S
CCR10      15-S                                 CK2
           22-Y                                 SRC
CX3CR1     14-Y                                 EGFR
CXCR1      27-Y
CXCR2      10-S                                 CK2
           25-Y
CXCR3      29-Y
CXCR4      12-Y
           21-Y
           23-S
CXCR5
CXCR6      19-S                                 CK2, DNAPK, ATM
CXCR7      8-Y
XCR1

C-TERM
CCR1       340-S
           341-S                                RSK
           343-S                                CDC2
           345-S
CCR2       359-S
CCR3       340-S
           341-S                                PKA, PKC, RSK
           343-S
           345-S
CCR4
CCR5       336-S
           337-S                                PKA
           342-S
CCR6       361-S                                RSK
CCR7       356-S                                PKG
           364-S
           365-S                                PKA, RSK
           367-S
CCR8       345-S                                PKC
           348-S
           349-S
           350-S                                RSK
           353-Y                                InsR
CCR9       352-S                                PKC
           368-S                                CK1
CCR10      347-S
           348-S                                PKA, RSK
           359-S                                CK1
CX3CR1     329-S                                PKC, DNAPK
           332-S
           336-S                                PKB, PKC, RSK
CXCR1      338-S                                PKC, RSK
CXCR2      347-S
           352-S
           353-S
CXCR3      351-S                                PKC
           355-S                                PKA, RSK
           356-S                                RSK, PKA
           358-S
CXCR4      319-S                                PKC
           324-S
           325-S                                PKC
           338-S
           341-S
           342-T                                CK2
           344-S
           346-S
           347-S
           348-S                                CK1
CXCR5      358-S
           359-S                                PKA, RSK
           361-S                                CK1, CK2
           363-S
CXCR6
CXCR7      350-S
           354-Y                                SRC
           355-S                                CK2
XCR1
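A consensus of the kind summarized in Table 5 can be pictured as the intersection of the site lists returned by the individual predictors. The following Python sketch shows that idea only. The function name and the example site lists are made up for illustration, and requiring agreement among all programs is an assumption, since the exact consensus rule is not spelled out here.

```python
def consensus_sites(*predictions):
    """Return the sites reported by every predictor, sorted by position.

    Each prediction is an iterable of (position, residue) pairs.
    Requiring unanimous agreement is an assumption of this sketch.
    """
    if not predictions:
        return []
    shared = set(predictions[0])
    for sites in predictions[1:]:
        shared &= set(sites)
    return sorted(shared)


# Hypothetical predictor outputs for one N-terminal segment.
disphos = [(10, "Y"), (14, "T"), (30, "S")]
gps = [(14, "T"), (18, "Y"), (10, "Y")]
netphos = [(10, "Y"), (14, "T"), (22, "T")]

print(consensus_sites(disphos, gps, netphos))
```

A looser rule, for example agreement between at least two of the three programs, would only change the set operation and not the overall structure.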


All the structural features evaluated in the human sequences have been found to be evolutionarily maintained; thus, they must have an important role in the chemokine receptors. This aspect has never been previously reported for these receptors. In addition, their association with positively or negatively charged residues in the C-terminals and N-terminals, respectively, generates large mean electrostatic fields due to the dispersion of discrete charges. Moreover, these regions are also rich in post-translational modification sites, whose functional role is that of regulating the intensity of the local electrostatic field to modulate recognition and/or binding. A growing number of examples provides evidence that linearly charged stretches and molecular recognition elements significantly overlap with the post-translational modification sites falling in these motifs [49]. However, this does not mean that there is a quantitative correlation between the number of charges and that of the post-translational modifications, but only that linearly charged stretches significantly overlap with post-translational modification sites in some chemokine receptors. Hence, the role of the charged stretches in interactions can be considered as “polyelectrostatic”, and it needs the presence of post-translational modifications as a way to change the net charge and to modify the electrostatic field of a disordered region [50]. Moreover, overlapping functional elements are commonly found in mammalian genes [51]. Very recently, thousands of protein-coding genomic regions with very low synonymous mutation rates have been discovered in human coding exons, and they are believed to perform overlapping functions [52]. These multi-functional gene regions encode intrinsically disordered segments depleted in secondary structure. This can be regarded as a functional advantage, because structural elements can better meet increased functional demands if they have high structural flexibility [52]. The chemokine receptors seem to fall in this area.

In the case of the C-terminal segments, numerous experimental studies have been performed on the presence of phosphorylation [36], whereas in the case of the N-terminal segments only limited experimental evidence has been obtained for the presence of sulfation [44,53] and phosphorylation [47] sites. Sulfation occurs enzymatically in the Golgi apparatus, within the cytoplasm, and sulfated proteins that arrive at the membrane already carry this post-translational modification. Therefore, we have excluded all the sulfation sites (experimentally or putatively determined) from the analysis of phosphorylation sites, considering only the remaining residues available for phosphorylation (Table S3). The results show that the general framework of post-translational modifications found for the N-terminals is related to the presence of disorder. Among other things, we have also found putative sites for


extracellular kinases and, among these, particularly those for CK2, a kinase that is well known to be functionally very active in the extracellular environment [54]. The analysis, although predictive, is supported by a high consensus and shows a great specificity of the CK2 sites for disordered segments (see Table S3). The apparent preference of these kinases for disorder is surprising, and we think that this high specificity reflects structural/functional aspects which at present are not yet clear. However, at least in the case of the N-terminal sequences, the presence of phosphate locally increases the negative charge as well as the net charge of these regions.

It is important to underline that the presence of disordered segments is evolutionarily and consistently maintained in the terminal segments of these receptors, which strongly supports the idea that they must be involved in important physiological functions, even if at present no reliable structural mechanism can be given. In fact, for GPCRs too, intrinsic disorder has often been associated with functional actions such as molecular recognition, but we still cannot describe their molecular mechanism in detail, mainly because the terminal segments elude NMR and X-ray analysis [55–57]. Therefore, the molecular mechanism is still unknown and only limited and scattered details are known. What is certain is that it is a highly dynamic mechanism, since the molecular N- and C-terminal arms of the receptors must explore the external space in search of ligands, recognize them, interact specifically and convey them into the pocket that exists within the helical bundle [44,58]. Hence, all these actions require major structural mobility. Until now, there have been many functional studies involving IDPs, even in other biological fields, which have been helpful in suggesting structural hypotheses associated with their presence, such as molecular recognition. But the picture that emerges from this study suggests that IDPs cannot be treated with an approach derived from globular proteins; rather, they are to be regarded as charged organic polymers (polyanions or polycations) and treated with the laws of polymer physics [59,60]. In this context, the net charge possessed by these proteins assumes a larger importance than previously thought, particularly in determining their conformation and shape [39,61], where the fundamental interactions are those dictated by backbone-backbone H-bonds [62]. Moreover, and this is rather unusual compared with globular proteins, water is often a poor solvent for IDPs, and for this reason they collapse into globules although they are polar and charged. Furthermore, thermodynamically they do not tend to a single organized native structure as globular proteins do, but rather are represented as an ensemble of conformers at equilibrium with a wide distribution of different molecular shapes [39,62]. The conformation of the IDP ensembles is influenced by the distribution of charges on the protein [15,39]. The fraction of charged residues discriminates the type of polyampholyte (strong or weak), and the conformations that are accessed are determined both by the fraction of charged residues and by the linear sequence distribution of oppositely charged residues [15,39]. In detail, some authors were able to build a diagram of states for IDPs by using parameters related to the charges [39], where the different regions represent a continuum of possibilities for the type of conformations expected to be sampled by an IDP. In this case, the word globule does not mean a native globular structure with some secondary structure topologically fixed in time, but rather a collapsed globule with an ensemble of conformations that fail to fold autonomously into singular structures.
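For reference, the charge parameters on which such a diagram of states is usually built can be written compactly. The definitions below follow the common usage in the polymer-physics literature on IDPs [15,39] as understood here; the symbols $N$, $N_{+}$ and $N_{-}$ (segment length and counts of positively and negatively charged residues) are notation introduced for this note.

```latex
\[
f_{+} = \frac{N_{+}}{N}, \qquad
f_{-} = \frac{N_{-}}{N}, \qquad
\mathrm{FCR} = f_{+} + f_{-}, \qquad
\mathrm{NCPR} = \lvert f_{+} - f_{-} \rvert
\]
```

As a purely hypothetical example, a 40-residue segment with 6 positive and 4 negative residues has FCR = 10/40 = 0.25 and NCPR = 2/40 = 0.05.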

To support the view that the chemokine receptor terminals have to be considered as polyelectrolytes, in which conformational ensembles and the structural transitions between them are modulated by charge fraction variations due to post-translational modifications, we have analyzed the charge distribution of the human receptor terminals. A general view is reported in Table 2A and B. As one can see, the majority of the segments are located in the borderline region 2 of


the state diagram [39], between region 1, characterized by the presence of globules, and region 3 (coils). It is interesting to note how a simulation of post-translational modifications for two N-terminals increases FCR (fraction of charged residues) and NCPR (net charge per residue), shifting CXCR3 and CCR3 from globule to boundary (Table 2A). We think that the borderline (or boundary) structures are likely those with the highest tendency to undergo a structural transition as a consequence of different post-translational modifications. In conclusion, we can say that the conformational properties of the terminal segments of human chemokine receptors can be predicted on the basis of the distribution of their net charge as polycations or polyanions. Therefore, the structural properties we experimentally measure when IDPs are in solution represent the average value of this statistical distribution.
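The shift from globule to boundary described above can be illustrated with a short Python sketch that recomputes the two charge parameters after adding negative charges for modified residues. The region thresholds (0.25 and 0.35) are assumed here from the diagram-of-states literature [15,39] as commonly quoted, and counting each phosphorylated or sulfated residue as one extra negatively charged residue is a deliberate simplification. The sequence and the function names are hypothetical, and nothing below reproduces the values of Table 2.

```python
POSITIVE = "KR"
NEGATIVE = "DE"


def charge_parameters(seq, modified_sites=0):
    """Return (FCR, NCPR) for a segment.

    modified_sites -- number of neutral residues assumed to carry a
                      phosphate or sulfate group, each counted as one
                      additional negative residue (simplification).
    """
    n = len(seq)
    f_plus = sum(seq.count(res) for res in POSITIVE) / n
    f_minus = (sum(seq.count(res) for res in NEGATIVE) + modified_sites) / n
    return f_plus + f_minus, abs(f_plus - f_minus)


def diagram_region(fcr, ncpr):
    """Classify a point of the state diagram with assumed thresholds."""
    if ncpr > 0.35:
        return "strong polyelectrolyte"
    if fcr < 0.25:
        return "globule"      # region 1
    if fcr <= 0.35:
        return "boundary"     # region 2
    return "coil"             # region 3


# Made-up 20-residue segment with two acidic residues.
segment = "DE" + "SG" * 9

print(diagram_region(*charge_parameters(segment)))
print(diagram_region(*charge_parameters(segment, modified_sites=4)))
```

In this toy case four modified serines raise FCR from 0.1 to 0.3, which is enough to move the segment across the assumed border between region 1 and region 2.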

However, what clearly emerges is that the presence of disorder in the chemokine receptors is involved in the control of their molecular behavior in solution. It should induce a certain segmental conformational plasticity, probably required to allow access to the enzymes that carry out post-translational modifications. Only with a proper conformational adaptability is it possible to obtain not only the modifications but also the formation of different post-translational isomers, for example phospho-isomers, to facilitate multiple recognitions [7]. This may also be a reasonable explanation of why we found so many putative phosphorylation sites on the N-terminals; otherwise it becomes difficult to explain the presence of so many apparently useless post-translational sites.

In conclusion, the comparison of the sequences of 19 human chemokine receptors shows that the terminal segments of these proteins are characterized by the simultaneous presence of different characteristics which reasonably seem to cooperate in controlling their conformations. We have shown here that considering the terminal segments of the human chemokine receptors as strong polyampholytes helps to rationalize existing structural (intrinsic disorder, charged stretches, local flexibility, post-translational sites) and functional (post-translational events) characteristics as aspects of a single phenomenon devoted to molecular recognition. In particular, the high intrinsic disorder propensity, the distribution of net charge and the post-translational modification sites were conserved, especially among mammals, suggesting their functional importance in modulating the interaction of receptors with their ligands. This suggests that it is necessary to study the molecular mechanism of ligand recognition and binding to the extracellular region of the chemokine receptors by taking into account the role of the intrinsically disordered regions coupled to their conformational behavior in solution as polyelectrolytes. Further studies on synthetically prepared N-terminals, used as model compounds for these receptors, are in progress in our lab to experimentally elucidate their physical structure and how they recognize and bind the chemokines. Preliminary data confirm their behavior in solution as polycations.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.cyto.2014.07.257.

References

[1] Olson TS, Ley K. Chemokines and chemokine receptors in leukocyte trafficking. Am J Physiol Regul Integr Comp Physiol 2002;283:7–28.

[2] Clark-Lewis I, Kim KS, Rajarathnam K, Gong JH, Dewald B, Moser B, et al. Structure-activity relationships of chemokines. J Leukoc Biol 1995;57:703–11.

[3] Charo IF, Ransohoff RM. The many roles of chemokines and chemokine receptors in inflammation. N Engl J Med 2006;354:610–21.

[4] He B, Wang K, Liu Y, Xue B, Uversky VN, Dunker AK. Predicting intrinsic disorder in proteins: an overview. Cell Res 2009;19:929–49.

[5] Wright PE, Dyson HJ. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J Mol Biol 1999;293:321–31.


[6] Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK. Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol 2002;323:573–84.

[7] Costantini S, Sharma A, Raucci R, Costantini M, Autiero I, Colonna G. Genealogy of an ancient protein family: the Sirtuins, a family of disordered members. BMC Evol Biol 2013;13:60.

[8] Myung JK, Banuelos AC, Fernandez JG, Mawji NR, Wang J, Tien AH, et al. An androgen receptor N-terminal domain antagonist for treating prostate cancer. J Clin Invest 2013;123:2948–60.

[9] Hansen JP, McDonald IR. Theory of simple liquids. Amsterdam, The Netherlands: Academic Press; 2006.

[10] Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, et al. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 2004;32:1037–49.

[11] Sigalov AB, Uversky VN. Differential occurrence of protein intrinsic disorder in the cytoplasmic signaling domains of cell receptors. Self Nonself 2011;2:55–72.

[12] Das RK, Mittal A, Pappu RV. How is functional specificity achieved through disordered regions of proteins. BioEssays 2012;35:17–22.

[13] Pappu RV, Wang X, Vitalis A, Crick SL. A polymer physics perspective on driving forces and mechanisms for protein aggregation. Arch Biochem Biophys 2008;469:132–41.

[14] Dobrynin AV. Theory and simulations of charged polymers: from solution properties to polymeric nanomaterials. Curr Opin Colloid Interface Sci 2008;13:376–88.

[15] Das RK, Pappu RV. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. PNAS 2013;110:13392–7.

[16] Verkhivker GM, Bouzida D, Gehlhaar DK, Rejto PA, Freer ST, Rose PW. Simulating disorder–order transitions in molecular recognition of unstructured proteins: where folding meets binding. PNAS 2003;100:5148–53.

[17] Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, et al. Protein identification and analysis tools on the ExPASy server. Totowa, NJ: Humana Press; 2005.

[18] Cole C, Barber JD, Barton GJ. The Jpred 3 secondary structure prediction server. Nucleic Acids Res 2008;35:197–201.

[19] Ragone R, Facchiano F, Facchiano A, Facchiano AM, Colonna G. Flexibility plot of proteins. Protein Eng 1989;2:497–504.

[20] Vihinen M, Torkkila E, Riikonen P. Accuracy of protein flexibility predictions. Proteins 1994;19:141–9.

[21] Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, Russell RB. Protein disorder prediction: implications for structural proteomics. Structure 2003;11:1453–9.

[22] Peng K, Radivojac P, Vucetic S, Dunker AK, Obradovic Z. Length-dependent prediction of protein intrinsic disorder. BMC Bioinformatics 2006;7:208.

[23] Su CT, Chen CY, Hsu CM. IPDA: integrated protein disorder analyser. Nucleic Acids Res 2007;35:465–72.

[24] Cheng J, Sweredoski M, Baldi P. Accurate prediction of protein disordered regions by mining protein structure data. Data Min Knowl Disc 2005;11:213–22.

[25] Linding R, Russell RB, Neduva V, Gibson TJ. GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res 2003;31:3701–8.

[26] Dosztányi Z, Csizmók V, Tompa P, Simon I. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol 2005;347:827–39.

[27] Shimizu K, Hirose S, Noguchi T. POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix. Bioinformatics 2007;23:2337–8.

[28] Hirose S, Shimizu K, Kanai S, Kuroda Y, Noguchi T. POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics 2007;23:2046–53.

[29] Ishida T, Kinoshita K. PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res 2007;35:460–4.

[30] Vullo A, Bortolami O, Pollastri G, Tosatto S. Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines. Nucleic Acids Res 2006;34:164–8.

[31] Yang ZR, Thomson R, McNeil P, Esnouf RM. RONN: the bio-basis function neural network technique applied to the detection of natively unordered regions in proteins. Bioinformatics 2005;21:3369–76.

[32] Uversky VN, Gillespie JR, Fink AL. Why are ‘natively unfolded’ proteins unstructured under physiologic conditions? Proteins 2000;41:415–27.

[33] Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157:105–32.

[34] Blom N, Gammeltoft S, Brunak S. Sequence- and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 1999;294:1351–62.

[35] Xue Y, Ren J, Gao X, Jin C, Wen L, Yao X. GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy. Mol Cell Proteomics 2008;7:1598–608.


[36] Blom N, Sicheritz-Ponten T, Gupta R, Gammeltoft S, Brunak S. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 2004;4:1633–49.

[37] Monigatti F, Gasteiger E, Bairoch A, Jung E. The Sulfinator: predicting tyrosine sulfation sites in protein sequences. Bioinformatics 2002;18:769–70.

[38] Steentoft C, Vakhrushev SY, Joshi HJ, Kong Y, Vester-Christensen MB, Schjoldager KT, et al. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J 2013;32:1478–88.

[39] Mao AH, Crick SL, Vitalis A, Chicoine CL, Pappu RV. Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. PNAS 2010;107:8183–8.

[40] Szilagyi A, Gyorffy D, Zavodszky P. The twilight zone between protein order and disorder. Biophys J 2008;95:1612–26.

[41] Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK. Sequence complexity of disordered protein. Proteins 2001;42:38–48.

[42] Tompa P. Intrinsically unstructured proteins. Trends Biochem Sci 2002;27:527–33.

[43] Dunker AK, Obradovic Z, Romero P, Garner EC, Brown CJ. Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop Genome Inform 2000;11:161–71.

[44] Szpakowska M, Fievez V, Arumugan K, van Nuland N, Schmit JC, Chevigné A. Function, diversity and therapeutic potential of the N-terminal domain of human chemokine receptors. Biochem Pharmacol 2012;84:1366–80.

[45] Cohen P. Protein phosphorylation and hormone action. Proc R Soc Lond B Biol Sci 1988;234:115–44.

[46] Busillo JM, Armando S, Sengupta R, Meucci O, Bouvier M, Benovic JL. Site-specific phosphorylation of CXCR4 is dynamically regulated by multiple kinases and results in differential modulation of CXCR4 signaling. J Biol Chem 2010;285:7805–17.

[47] Schumacher JA, Crockett DK, Elenitoba-Johnson KS, Lim MS. Evaluation of enrichment techniques for mass spectrometry: identification of tyrosine phosphoproteins in cancer cells. J Mol Diagn 2007;9:169–77.

[48] Hartmann J, Tran TV, Kaudeer J, Oberle K, Herrmann J, Quagliano I, et al. The stalk domain and the glycosylation status of the activating natural killer cell receptor NKp30 are important for ligand binding. J Biol Chem 2012;287:31527–39.

[49] Tompa P, Fuxreiter M, Oldfield CJ, Simon I, Dunker AK, Uversky VN. Close encounters of the third kind: disordered domains and the interactions of proteins. BioEssays 2009;31:328–35.

[50] Lobanov MY, Furletova EI, Bogatyreva NS, Roytberg MA, Galzitskaya OV. Library of disordered patterns in 3D protein structures. PLoS Comput Biol 2010;6.

[51] Lin MF, Kheradpour P, Washietl S, Parker BJ, Pedersen JS, Kellis M. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res 2011;21:1916–28.

[52] Macossay-Castillo M, Kosol S, Tompa P, Pancsa R. Synonymous constraint elements show a tendency to encode intrinsically disordered protein segments. PLoS Comput Biol 2014;10:e1003607.

[53] Ludeman JP, Stone MJ. The structural role of receptor tyrosine sulfation in chemokine recognition. Br J Pharmacol 2014;171:1167–79.

[54] Siepmann M, Kumar S, Mayer G, Walter J. Casein kinase 2 dependent phosphorylation of neprilysin regulates receptor tyrosine kinase signaling to Akt. PLoS ONE 2010;5:e13134.

[55] Wu B, Chien EY, Mol CD, Fenalti G, Liu W, Katritch V, et al. Structures of the CXCR4 chemokine GPCR with small-molecule and cyclic peptide antagonists. Science 2010;330:1066–71.

[56] Serebryany E, Zhu GA, Yan EC. Artificial membrane-like environments for in vitro studies of purified G-protein coupled receptors. Biochim Biophys Acta-Biomembr 2012;1818:225–33.

[57] Park SH, Das BB, Casagrande F, Tian Y, Nothnagel HJ, Chu M, et al. Structure of the chemokine receptor CXCR1 in phospholipid bilayers. Nature 2012;491:779–83.

[58] De Clercq E. Recent advances on the use of the CXCR4 antagonist plerixafor (AMD3100, Mozobil) and potential of other CXCR4 antagonists as stem cell mobilizers. Pharmacol Ther 2010;128:509–18.

[59] Flory PJ. Principles of polymer chemistry. Ithaca, NY and London, UK: Cornell University Press; 1953.

[60] Grosberg AY, Khokhlov AR. Statistical physics of macromolecules. New York: AIP Press; 1994.

[61] Meng W, Lyle N, Luan B, Raleigh D, Pappu RV. Experiments and simulations show how long-range contacts can form in expanded unfolded proteins with negligible secondary structure. PNAS 2013;110:2123–8.

[62] Tran HT, Mao A, Pappu RV. Role of backbone-solvent interactions in determining conformational equilibria of intrinsically disordered proteins. J Am Chem Soc 2008;130:7380–92.
