Cellulose-Builder: A toolkit for building crystalline structures of cellulose
Thiago C. F. Gomes[a] and Munir S. Skaf*[a]
Cellulose-builder is a user-friendly program that builds
crystalline structures of cellulose of different sizes and
geometries. The program generates Cartesian coordinates
for all atoms of the specified structure in the Protein Data
Bank format, suitable for using as starting configurations in
molecular dynamics simulations and other calculations.
Crystalline structures of cellulose polymorphs Iα, Iβ, II, and
III_I of practically any size are readily constructed, including
parallelepipeds, plant cell wall cellulose elementary
fibrils of any length, and monolayers. Periodic boundary
conditions along the crystallographic directions are easily
imposed. The program also generates an atom connectivity file
in PSF format, required by well-known simulation packages
such as NAMD, CHARMM, and others. Cellulose-builder is
based on the Bash programming language and should run
on practically any Unix-like platform, demands very modest
hardware, and is freely available for download from
ftp://ftp.iqm.unicamp.br/pub/cellulose-builder. © 2012 Wiley
Periodicals, Inc.
DOI: 10.1002/jcc.22959
Introduction
Cellulose has recently attracted a great deal of attention due
to its potential to become a carbon-neutral feedstock for
renewable biofuels and chemicals. As a major component of
vegetable biomass, cellulose is the most abundant organic
compound in Earth’s biosphere. Much effort has been
devoted to comprehending the structure and properties of cellulose
itself,[1–8] and to understanding the microscopic nature of
plant cell wall architecture and the molecular aspects associated
with its structural strength.[9]
One of the major challenges in the development of a sus-
tainable means of obtaining biofuels and other valuable chem-
icals from lignocellulosic biomass is the recalcitrance of the
cellulose to the action of degrading enzymes and chemi-
cals.[10] The deconstruction of lignocellulose biopolymers into
fermentable sugars by means of enzymatic saccharification is
the most economically costly and scientifically challenging
step of the currently available process for biochemical conver-
sion of biomass into liquid fuels. Therefore, it is of fundamen-
tal importance to gain further understanding of the mecha-
nisms by which enzymes and auxiliary proteins recognize,
bind to, and disrupt crystalline cellulose for subsequent cleavage
of the glycosidic bonds of the polysaccharide chains. To
this end, several molecular dynamics (MD) computer simulations
have been reported recently that use three-dimensional crystalline
cellulose structures or surfaces as model substrates,
in addition to the proteins of interest.[7,11–14] Very
recently, MD simulations of the interactions between cellulose
and ionic liquids have also been reported[15] in an attempt to
deepen our molecular-level understanding of how ionic liquids
dissolve crystalline cellulose.[16,17]
These simulation studies share the need for the
atomic coordinates of the cellulose crystal structures as initial
configurations, which were independently constructed according
to the structure of the desired substrate. As the scientific
activity in this area is rapidly increasing, it would be very use-
ful to theoretical and computational chemists and physicists
alike to be able to readily construct crystalline structures of
cellulose of different sizes and shapes. In this work, we present
cellulose-builder, an automated solution for generating atomic
coordinate files in the Protein Data Bank (PDB) format that can
be readily used as input to simulate systems containing struc-
tures of crystalline cellulose of practically any size, shape, and
dimension. The code is freely available for download at
ftp://ftp.iqm.unicamp.br/pub/cellulose-builder.
Cellulose-builder is written as a Bash script and relies on several
well-established tools available on most Unix-like operating
systems and on the VMD package[18] to provide an automated,
straightforward, and user-friendly means of generating
cellulose crystals of different shapes and sizes. Cellulose-builder
only requires users to enter three integers (i, j, k) correspond-
ing to the number of cellulose unit cells to be replicated in
each crystallographic direction (a, b, c). The script will then
perform all operations needed to build a crystal of the chosen
size and will produce the initial configuration file in PDB for-
mat. In addition, cellulose-builder will also output the corre-
sponding atom connectivity (topology) information as a PSF
file, which contains molecule-specific information required by
some of the most popular MD simulation packages, including
NAMD[19] and CHARMM,[20] and can be easily converted into
the AMBER[21] file type prmtop.
[a] T. C. F. Gomes, M. S. Skaf
Institute of Chemistry, State University of Campinas – UNICAMP, Cx. P. 6154,
Campinas, SP 13083-970, Brazil
E-mail: [email protected]
Contract/grant sponsor: Fapesp; contract/grant number: 08/56255-9;
Contract/grant sponsor: CNPq; contract/grant number: 140978/2009-7.
1338 Journal of Computational Chemistry 2012, 33, 1338–1346 WWW.CHEMISTRYVIEWS.COM
SOFTWARE NEWS AND UPDATES WWW.C-CHEM.ORG
Given the recent interest in performing simulations contain-
ing model plant cell wall cellulose fibrils, which consist of 36
cellulose chains specifically arranged in space (Fig. 1),[9] the
software also enables one to build PDB and PSF files for elementary
fibrils of arbitrary length. The cellulose structure files
created by cellulose-builder can then be combined with
coordinate files of other systems, such as solvent and protein
molecules, to construct solvated cellulose and cellulose-protein
complexes, using available packages for generating initial
configurations for molecular simulations, such as PACKMOL.[22]
Elementary cellulose fibrils can also be readily combined with
each other and decorated with hemicellulose and lignin mole-
cules with PACKMOL to generate plant cell wall models of more
realistic architectures and other complex assemblies.
All tools upon which cellulose-builder relies to perform its
task are present by default on most Unix-like operating sys-
tems (OS) (e.g., sed, grep, nl, tr, cat, echo, bc, perl), or can be
readily obtained free of cost for Unix operating systems (for
instance, octave, VMD, and psfgen).[18,19] These are all
well-established and trusted tools, most of them coded in the C
programming language. Nevertheless, good practices and
techniques are available in Bash programming as well,[23–25]
and many of them have been used to code cellulose-builder.
The inherently lower performance in terms of computer time
of Bash scripts in comparison to C programs, for instance, is
not an important issue in this case because generating initial
coordinates and connectivity information files by scripting will
still demand negligible amounts of computer time compared
to simulation time and analysis.
In the “Workflow Overview” section, we present an overview of
the code’s workflow. In the “Capabilities and Usage Examples” section,
we provide several usage examples. In the “File Structure and
Variables” section, the file structure is described, and in the
“Benchmark” section, we present a summary of running times. Our
concluding remarks are presented in the “Concluding Remarks” section.
Workflow Overview
In this section, we provide a brief description of the procedure
adopted by cellulose-builder. To build the PDB configuration files
according to the most recent crystallographic structures of cellulose,
we have taken the data reported by Nishiyama et al.[1–4] for the
asymmetric units of the cellulose Iα, Iβ, II, and III_I allomorphs.
The reported structures were obtained from synchrotron X-ray and
neutron diffraction experiments on both hydrogenated and deuterated
cellulose fibers (except for cellulose II, for which neutron
diffraction results are not available[3]). This enabled precise
determination of the positions of all hydrogen atoms and, thus, of
the hydrogen bonding network in the different cellulose allomorphs.
The cellulose Iβ, II, and III_I crystal structures belong to the
monoclinic P2₁ space group, whereas the cellulose Iα crystal
structure belongs to the triclinic P1 space group. Cellulose-builder
uses the crystallographic fractional coordinates to calculate the
Cartesian coordinates of the hydrogen atoms in the final crystalline
structure. That is, for atoms with more than one reported
crystallographic position, the program uses the fractional
coordinates of the position with higher occupancy.
In Figure 2, we show a simplified scheme describing the
major steps taken by the program during its execution. Starting
from the crystallographic fractional coordinates, the P2₁
space group symmetry operations are applied to the atoms in
the asymmetric unit to determine symmetry-equivalent positions
and generate fractional coordinates for the remaining
atoms within one unit cell.[28–30] For allomorph Iα, this
operation is not necessary because in its space group, P1, the
asymmetric unit coincides with the unit cell. The fractional
coordinates of the adjacent unit cells are then generated by
adding unity (one) n times to the original unit cell fractional
coordinates, in the appropriate manner, where n is an integer.
Fractional coordinates are then converted into Cartesian
coordinates[28–30] (see Supporting Information) using the cell
dimensions reported by Nishiyama et al.[1–4] At the end of
this stage, a directory named crystal is created, which stores
the initial configuration file corresponding to the desired crystallite
in both XYZ and PDB formats (files crystal.xyz and crystal.pdb,
respectively), among other relevant files created during the
process.
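The symmetry-expansion and replication steps described above can be sketched in a few lines of Bash and awk. This is only an illustration under stated assumptions, not the code used by cellulose-builder: it assumes that the twofold screw axis of the P2₁ cell runs along c, so that the symmetry mate of (x, y, z) is (-x, -y, z + 1/2), and the function names expand_p21 and replicate are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; expand_p21 and replicate are hypothetical names,
# not functions taken from cellulose-builder.

# Read "x y z" fractional coordinates on stdin and print each atom followed
# by its P2_1 symmetry mate, ASSUMING the twofold screw axis runs along c.
expand_p21() {
  awk '{ printf "%.4f %.4f %.4f\n", $1, $2, $3
         printf "%.4f %.4f %.4f\n", 0 - $1, 0 - $2, $3 + 0.5 }'
}

# Replicate unit-cell coordinates (stdin) ni x nj x nk times by adding
# integer offsets to the fractional coordinates.
replicate() {
  awk -v ni="$1" -v nj="$2" -v nk="$3" '
    { x[NR] = $1; y[NR] = $2; z[NR] = $3 }
    END {
      for (a = 0; a < ni; a++)
        for (b = 0; b < nj; b++)
          for (c = 0; c < nk; c++)
            for (n = 1; n <= NR; n++)
              printf "%.4f %.4f %.4f\n", x[n] + a, y[n] + b, z[n] + c
    }'
}

# One atom -> two atoms per cell -> 2 x 1 x 1 cells = four positions.
echo "0.1 0.2 0.3" | expand_p21 | replicate 2 1 1
```

Working in fractional coordinates keeps both steps free of trigonometry; the cell geometry only enters later, when the coordinates are converted to Cartesian form.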
A connectivity information file for the crystallite is also pro-
vided in PSF format (crystal.psf ). This file is suited for use with
NAMD[19] and CHARMM[20] packages. The topology currently
implemented is meant for use with the carbohydrate force
field by MacKerell and coworkers.[26,27] Topology files suitable
for other simulation packages and force fields can be readily
obtained from the Cartesian coordinates or PDB files output
by cellulose-builder.
Figure 1. Model proposed by Ding and Himmel[9] for maize primary cell wall elementary fibril, as seen from
their nonreducing ends. This Iβ elementary fibril possesses 36 cellulose chains. [Color figure can be viewed in
the online issue, which is available at wileyonlinelibrary.com.]
Capabilities and Usage Examples
Basically, the program allows one to build three distinct types of
crystals, which we herein denote parallelepipeds, (plant cell wall)
elementary fibrils, and monolayers. Descriptions and instructions for
generating each crystal type are provided below. Cellulose-builder
also allows for implementation of periodic boundary conditions
(pbc) along any crystallographic direction. Being a Bash program, it
must be run at a Bash shell prompt; Bash is the default shell on
many modern Linux systems and is easily available on other Unix-like
OS. Under Windows OS, cellulose-builder can be run within
Cygwin.*
We exemplify how to apply the capabilities implemented in
cellulose-builder using the Iβ cellulose allomorph. Building
structures for other allomorphs readily follows. Pertinent
remarks are included when referring to allomorphs other than
Iβ. The glucan chains in cellulose Iβ containing the glucopyranose
units were labeled “origin” and “center” chains, referring
to their positions in the unit cell. The same is true for cellulose
II, in spite of the fact that this allomorph’s origin and center
chains run antiparallel to each other. In cellulose Iα
and III_I there is only one type of chain. These features determine
which options are available for each allomorph. To properly
set the allomorph type, one must edit the variable PHASE in
the text file input.inp, which contains only three lines (see the “File
Structure and Variables” section, Files).
Parallelepiped crystals
We denote as parallelepiped crystals those obtained by simple
replication of the cellulose unit cell. This crystal shape is available
for all supported allomorphs. To build a parallelepiped
crystal, the user must specify three integers (i, j, k) as
arguments upon calling cellulose-builder on the command
line. We provide an example (regard “$” as the Bash prompt,
and our current working directory as the cellulose-builder parent
directory):
$ ./cellulose-builder 4 4 5 (example 1)
In this case, the program will set up a crystal four unit cells
wide in the a and b crystallographic directions and five unit
cells wide in the c crystallographic direction, which is parallel
to the cellulose chains. The resulting crystal, shown in Figure
3, comprises 25 cellulose chains, each one bearing five
cellobiose units.
Using the parallelepiped method, one can build crystallites
exposing different proportions of hydrophobic and hydrophilic
surface areas. In Figure 4, we exemplify this possibility by
showing two cellulose Iβ blade-shaped crystallites, each exposing
mostly one type of surface. For instance, to
obtain an Iβ crystallite with predominantly hydrophilic
exposed surface area (Fig. 4A):
$ ./cellulose-builder 10 2 5 (example 2)
Instead, for an Iβ crystallite with predominantly hydrophobic
exposed surface area (Fig. 4B):
$ ./cellulose-builder 2 10 5 (example 3)
Periodic boundary conditions on parallelepiped crystals
To periodically replicate a given crystallite, one must ensure
that the crystallite has perfectly fitting edges so that the system
exhibits translational symmetry. Cellulose-builder supports
parallelepiped crystallites possessing translational symmetry
with respect to the three axes shown in Figure 3. The user
may wish to implement pbc along one, two, or all three crystallographic
directions (a, b, c). This can be easily accomplished by
editing the variable PBC in the text file input.inp and running
./cellulose-builder at the prompt, as shown in examples 1 to 3
above, for instance. Let us discuss how to implement pbc
along directions a and b separately from direction c.
Figure 2. Cellulose-builder simplified workflow. Given the experimentally determined space group (P2₁ for cellulose
Iβ, II, and III_I) and experimental fractional (atomic) coordinates, the program determines the symmetry-equivalent
positions of all other atoms within the unit cell. The program then replicates unit cells according to
user input, exploiting the convenience of working in fractional coordinates for this task. After replication,
fractional coordinates are transformed into Cartesian coordinates using experimental cell dimensions to
yield a file in XYZ format. Extensive editing is then performed to obtain the initial configuration file in PDB format,
suited for common MD simulation packages. The program also writes a script for psfgen and executes it,
yielding a connectivity information file, in PSF format, meant to model the cellulose crystal using the CHARMM
force field for carbohydrates.[26,27]
*A Linux-like environment for Windows that ports software running on POSIX systems (such as Linux, BSD, and Unix systems) onto Windows (http://www.cygwin.com).
For pbc along the crystallographic a direction only, one
must set PBC=A. For instance, with the PBC variable set to PBC=A,
the following command:
$ ./cellulose-builder 4 5 5 (example 4)
yields the crystallite shown in Figure 5A.
To obtain translational symmetry along the crystallographic
b direction, one must set PBC=B in file input.inp. With
PBC=B in file input.inp, the command
$ ./cellulose-builder 5 4 5 (example 5)
produces the crystallite shown in Figure 5B.
To obtain a structure periodically replicated along both
crystallographic a and b directions, the input.inp file must be
edited to set PBC=ALL. The crystallite shown in Figure 5C was
obtained from the command below with PBC=ALL:
$ ./cellulose-builder 4 4 5 (example 6)
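When several crystallites with different pbc settings are needed, editing input.inp by hand can be replaced by a small helper. The sketch below is an assumption-laden illustration rather than part of cellulose-builder: set_input_var is a hypothetical function, and it is demonstrated on a scratch copy that holds the default values of the three variables.

```shell
#!/usr/bin/env bash
# Illustrative sketch; set_input_var is a hypothetical helper, not part of
# cellulose-builder.

# Rewrite the line "NAME=..." in an input file, keeping the NAME=value form
# (no spaces around the equal sign).
set_input_var() {
  sed "s/^$1=.*/$1=$2/" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

# Demonstrate on a scratch copy holding the default values.
workdir=$(mktemp -d)
printf 'PHASE=I-BETA\nPBC=NONE\nPCB_c=FALSE\n' > "$workdir/input.inp"

set_input_var PBC A "$workdir/input.inp"
cat "$workdir/input.inp"
# With the real input.inp edited this way, one would then run, e.g.:
#   ./cellulose-builder 4 5 5
```

Writing to a temporary file and renaming it avoids relying on the in-place option of sed, whose syntax differs between GNU and BSD systems.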
Figure 3. Cellulose Iβ crystallite generated by cellulose-builder as seen from its nonreducing ends (left) and rotated by 90° (right). For the sake of consistency
with notation used by other authors,[7] we have adopted the same viewpoint as those authors for showing the cellulose crystallite. Crystallographic
faces are indicated by their corresponding Miller indices. Origin and center chain layers, and unit cell axes are indicated as well. Cellulose chains are parallel
to the c direction.
Figure 4. Different surfaces exposed by two different cellulose Iβ blade-shaped crystallites. Left: predominantly hydrophilic (010) surfaces are exposed.
Right: predominantly hydrophobic (100) surfaces are exposed. Top and bottom images represent the same crystallite seen from different viewpoints on
VMD’s X-window OpenGL display.
Imposing pbc along crystallographic directions a or b only
makes sense for allomorphs Iβ and II, since for Iα and III_I the
unit cell is such that crystallites automatically have translational
symmetry in all crystallographic directions.
Regarding the crystallographic c direction, no special action is
needed to endow crystallites with translational symmetry along
that direction, since any replication of the cellulose unit cell yields a
crystallite already possessing that property for any allomorph.
As a consequence, all cellulose crystallites generated by
cellulose-builder will automatically possess translational symmetry in
the crystallographic c direction. Indeed, the default PBC value in
the input.inp file is PBC=NONE. The crystallites shown in Figures 3
and 4 were built with PBC=NONE. Setting the Bash variable
PBC to NONE does not mean that the resulting crystallite will
have no translational symmetry, but instead that no subsequent
procedure is necessary to confer further translational symmetry
on the crystallite after its construction. However, with this
option, a hydrogen atom (H) and a hydroxyl group (OH) will be
added to the opposite ends of every chain, respectively,
ensuring that there are no dangling bonds.†
Very often in computer simulations of crystalline cellulose,
one would like to work with truly infinitely periodic systems
along the c direction, which requires a perfectly matching
bond between reducing and nonreducing ends of replicated
chains along the c direction.‡ To control periodic covalent
bonding in the final connectivity information file (crystal.psf)
delivered by cellulose-builder, one must edit the third line of
the file input.inp, which reads:
PCB_c=FALSE
Setting the Bash variable PCB_c to FALSE in the input.inp
file causes no periodic covalent bonding along the c direction, and
is the default. To include periodic covalent bonding in the final
connectivity file, one must set:
PCB_c=TRUE
In addition to parallelepiped crystallites, this option can
also be applied to the other two crystal types provided by
cellulose-builder, i.e., elementary fibrils and monolayers, as
described next.
Elementary fibrils
Cellulose-builder can also build cellulose elementary fibrils of any
length from allomorphs Iα, Iβ, and II. Cellulose elementary fibrils
from allomorph Iβ possess the cross section depicted in Figure
1. The primary fibril with such a cross section was constructed
by carving it out of a larger Iβ parallelepiped crystallite (see Supporting
Information). This disposition of chains corresponds to a
recently proposed model for the elementary cellulose fibrils of
maize cell wall, free of hemicelluloses, lignins, and pectins.[9] The
model is likely to be applicable to several other species of plants
since the terminal enzymatic complex that synthesizes cellulose
elementary fibrils at maize cell membranes is similar to that of
other angiosperms.[31] Depending on the source tissue and orga-
nism, cellulose chains in the elementary fibrils may have from a
few tens to several hundreds of cellobiose residues.
To build an elementary fibril, the string fibril must be passed
as the first argument on the command line, whereas the number
of cellobiose units in the chains that compose the elementary
fibril (i.e., the degree of polymerization) is specified by an
integer k as the second argument:
$ ./cellulose-builder fibril k (example 7)
Cellulose Iβ elementary fibrils of several lengths are shown
in Figure 6, for k = 5, 25, 50, 100, and 500. If one wishes to
impose pbc along the chain direction, periodic covalent
bonding can be implemented by setting PCB_c=TRUE, as
discussed above. In the case of elementary fibrils, pbc are
supported in the c direction only. Fibrils with arbitrary cross
sections, different from the maize cell wall cellulose elementary
fibril shown in Figure 1, can also be readily constructed
(see Supporting Information). Elementary fibrils can be further
arranged to assemble complex hyperstructures as models
for plant cell walls or simply solvated by molecular solvents
(Supporting Information) using software such as
PACKMOL.[22]
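Before building long fibrils, it can be useful to estimate the size of the resulting system. The sketch below prints the commands for the fibril lengths of Figure 6 together with a rough atom count. The estimate is our own arithmetic, not output of the program: it assumes the 36-chain Iβ cross section of Figure 1, two 21-atom anhydroglucopyranose units per cellobiose, and three terminal atoms (H and OH) per chain, as obtained with PCB_c=FALSE. The function fibril_atoms is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative estimate only; fibril_atoms is a hypothetical helper.

# Atoms in a 36-chain elementary fibril whose chains hold k cellobiose units:
# each cellobiose is two anhydroglucopyranose units of 21 atoms, and each
# chain gains 3 terminal atoms (H and OH) when PCB_c=FALSE.
fibril_atoms() {
  k="$1"
  echo $(( 36 * (2 * k * 21 + 3) ))
}

# Print (not run) the build command for each length, with the estimate.
for k in 5 25 50 100 500; do
  printf './cellulose-builder fibril %s  # about %s atoms\n' \
    "$k" "$(fibril_atoms "$k")"
done
```

With PCB_c=TRUE the three terminal atoms per chain would be absent, so the count reduces to 36 x 42 x k under the same assumptions.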
Monolayers
The cellulose Iβ crystal structure consists of alternating layers of
origin and center cellulose chains, with no hydrogen bonds
between layers.[1] Recent experimental studies have shown that
cellulose elementary fibrils from woody material undergo
delamination (or peeling) along the (200) plane after
(2,2,6,6-tetramethylpiperidin-1-yl)oxyl-mediated oxidation and intensive
sonication.[32] Those findings motivated us to include an option
for generating monolayers. Thus, the command:
$ ./cellulose-builder origin j k (example 8)
will build a monolayer composed of j cellulose origin chains
containing k cellobiose residues each, whereas
$ ./cellulose-builder center j k (example 9)
yields a similar monolayer composed of center chains. Examples
8 and 9 are also valid for obtaining monolayers composed
of origin or center chains from allomorph II. Since allomorph
Iα has only one type of chain, the equivalent command
line for obtaining a monolayer of chains from cellulose Iα is
$ ./cellulose-builder monolayer j k (example 10)
†Indeed, since for any allomorph the unit cell is composed of anhydroglucopyranose units, any replication of the cellulose unit cell yields a crystallite possessing translational symmetry along the c direction. Nevertheless, in the final steps, missing atoms at the extremities of cellulose chains are added: one hydrogen atom is added at one terminus and one OH group at the other terminus of each cellulose chain. These are the only atoms in the whole crystallite whose coordinates are guessed (except for allomorph II, whose HO2, HO3, and HO6 hydrogen positions have not been determined experimentally[3] due to lack of neutron diffraction data and so have to be guessed). Thus, translational symmetry along the c direction is actually conditional on the elimination of those added atoms (one water molecule per cellulose chain).
‡Covalent bonding between atoms of different chains is usually set up in the input files (topology) of molecular simulation program suites.
Cellulose III_I has only one type of chain as well, but
there is more than one way of producing monolayers
from its structure. Therefore, monolayers are not automatically
supported for allomorph III_I. Nevertheless, one can
always build crystallites of arbitrary shapes for any of the
allomorphs by a simple method provided in the Supporting
Information.
Similarly, periodic boundary conditions can be implemented
for monolayers along the chain direction via periodic covalent
bonding by setting PCB_c=TRUE in file input.inp.
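The combinations of allomorph and crystal type described in this section can be summarized in a small lookup. The sketch below only encodes what the text states and is not the program's own argument checking; mode_supported is a hypothetical function, and the label parallelepiped stands for the three-integer form of the command.

```shell
#!/usr/bin/env bash
# Illustrative sketch; mode_supported is hypothetical and only encodes the
# combinations described in the text, not the program's actual checks.

# Usage: mode_supported PHASE MODE
# MODE is the first command-line argument ("fibril", "origin", "center",
# "monolayer") or "parallelepiped" for the three-integer form.
mode_supported() {
  case "$2:$1" in
    parallelepiped:I-ALPHA|parallelepiped:I-BETA) return 0 ;;
    parallelepiped:II|parallelepiped:III_I) return 0 ;;
    fibril:I-ALPHA|fibril:I-BETA|fibril:II) return 0 ;;
    origin:I-BETA|origin:II|center:I-BETA|center:II) return 0 ;;
    monolayer:I-ALPHA) return 0 ;;
    *) return 1 ;;
  esac
}

for phase in I-ALPHA I-BETA II III_I; do
  for mode in parallelepiped fibril origin center monolayer; do
    if mode_supported "$phase" "$mode"; then
      echo "$phase $mode: described as supported"
    else
      echo "$phase $mode: not described as supported"
    fi
  done
done
```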
Figure 5. Cellulose Iβ crystallites suited for pbc in the a (left), b (middle), and both a and b (right) crystallographic directions. Origin and center chain layers
are indicated, as well as crystallographic directions and Miller indices. The notation adopted for Miller indices is the same as that adopted by Matthews et al.,[7] so
faces where center chains are at the surface are indicated by (200) or (020) to reflect their positions halfway along the unit cell. Colored lines within crystallites
indicate the crystallographic directions along which the crystallite is endowed with translational symmetry.
Figure 6. Cellulose Iβ elementary fibrils of different lengths, possessing k = 5, 25, 50, 100, and 500 cellobiose units, generated with cellulose-builder. The
fibril with k = 5 is magnified. All fibrils have the cross section shown in Figure 1.
File Structure and Variables
Under the parent directory cellulose-builder, several files
should be present for each supported allomorph: a file
describing its asymmetric unit in fractional coordinates
(asy_I_alpha, asy_I_beta, asy_II, asy_III_I); a file describing the
unit cell parameters (dimensions_I_alpha, dimensions_I_beta,
dimensions_II, dimensions_III_I); a file listing atom labels
(atomtypes_I_alpha, atomtypes_I_beta, atomtypes_II,
atomtypes_III_I); and a Bash script (I-alpha.sh, I-beta.sh, II.sh, III_I.sh).
Besides those allomorph-specific files, three other files should
be present: bGLC.top, cellulose-builder.sh, and input.inp. Below
we briefly describe each of these files and their user-specified
variables and arguments.
cellulose-builder.sh, I-alpha.sh, I-beta.sh, II.sh, III_I.sh
These files contain the program’s source code. Being a Bash
program (interpreted script), it can be run directly at the
prompt, but it depends on several tools to work properly.
Upon execution, the script will test for the tools needed and
issue error messages and instructions if one or more of the
Unix tools are not available. Besides the information provided
by the user on the command line, this program also uses information
contained in the files described below. Among
them, only input.inp can be regarded as an actual input file.
The other files contain pieces of information that were kept
separate from the main code to make it easier for the user to
alter those parameters.
input.inp
This file defines three variables, read by cellulose-builder.sh,
which specify the allomorph to be built, the implementation
of periodic boundary conditions, and the implementation of
periodic covalent bonding, namely PHASE, PBC and PCB_c,
respectively. The default values are:
PHASE=I-BETA
PBC=NONE
PCB_c=FALSE
Because Bash syntax is used, care must be
taken not to include spaces before or after the equal signs.
Variable PHASE determines which allomorph is going to be
built. Valid values are: I-BETA, I-ALPHA, II, III_I. Depending on
the value set for variable PHASE, the corresponding script
(among I-alpha.sh, I-beta.sh, II.sh, III_I.sh) will be sourced by
the main script cellulose-builder.sh.
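The restriction on spaces follows from how a shell reads assignments. The sketch below illustrates this under the assumption that input.inp is read by sourcing it as shell code, which the text suggests but does not state explicitly; phase_script is a hypothetical helper that maps each valid PHASE value to the script named above, and the file is a scratch copy.

```shell
#!/usr/bin/env bash
# Illustrative sketch; ASSUMES input.inp is read by sourcing it as shell code.
# phase_script is a hypothetical helper, not part of cellulose-builder.

# Map a PHASE value to the allomorph-specific script named in the text.
phase_script() {
  case "$1" in
    I-ALPHA) echo "I-alpha.sh" ;;
    I-BETA)  echo "I-beta.sh" ;;
    II)      echo "II.sh" ;;
    III_I)   echo "III_I.sh" ;;
    *)       echo "unknown PHASE: $1" >&2; return 1 ;;
  esac
}

workdir=$(mktemp -d)
printf 'PHASE=I-BETA\nPBC=NONE\nPCB_c=FALSE\n' > "$workdir/input.inp"

# Sourcing turns each NAME=value line into a shell variable assignment.
# A line such as "PHASE = I-BETA" would instead be parsed as a command
# named PHASE with two arguments, which is why spaces are not allowed.
. "$workdir/input.inp"

echo "PHASE=$PHASE PBC=$PBC PCB_c=$PCB_c -> $(phase_script "$PHASE")"
```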
Variable PBC controls the translational symmetry and thus
the final shape of the crystallite delivered, and applies to
allomorphs Iβ and II, as already discussed. Valid values are: NONE,
A, B, ALL.
PBC=NONE: default value. Causes the program to build crystallites
with no translational symmetry along either the a or the b
crystallographic direction. The crystallites, nevertheless, are
intrinsically endowed with translational symmetry in the c
direction (except for the elimination of a water molecule per
cellulose chain: an H atom at one terminus and an OH group
at the opposite terminus). When this option is used to build
allomorph Iβ crystallites, for instance, only (100) and (010)
crystallographic surfaces are exposed, and crystallites are built in
shapes akin to the one shown in Figure 3.
PBC=A: Causes the program to build crystallites with translational
symmetry along the a crystallographic direction. When
this option is used to build allomorph Iβ crystallites, besides
(100) and (010), a (200) crystallographic surface is also
exposed, and crystallites are built in shapes akin to the one
depicted in Figure 5A. As above, crystallites possess translational
symmetry in the c direction as well.
PBC=B: Causes the program to build crystallites with translational
symmetry along the b crystallographic direction. When
this option is used to build allomorph Iβ crystallites, besides
(100) and (010), a (020) crystallographic surface is also
exposed, and crystallites come in shapes akin to the one shown
in Figure 5B. Crystallites possess translational symmetry in the
c direction as well.
PBC=ALL: Causes the program to build crystallites with translational
symmetry along both the a and b crystallographic directions.
When this option is used to build allomorph Iβ crystallites,
crystallographic planes (100), (010), (200), and (020) are exposed,
and the crystallite shapes are akin to the one shown in Figure 5C.
The variable PCB_c determines whether the cellulose chains
composing the crystallite must be covalently bonded to their
adjacent periodic images along the c crystallographic direc-
tion. Valid values are: FALSE, TRUE. Both options can be used
with all crystal types delivered by cellulose-builder, namely,
parallelepipeds, elementary fibrils, and monolayers, as well as
with all supported allomorphs.
PCB_c=FALSE: Cellulose chains are not periodically covalently
bonded to adjacent images, and thus come with an
additional water molecule per cellulose chain (one H atom at
one terminus, one OH group at the other) to properly complete
the atomic valence at the reducing and nonreducing
ends of the chains.
PCB_c=TRUE: This option prevents H atoms and OH groups from
being added at the chain ends. Therefore, the cellulose chain
ends will be ready and available to form glycosidic bonds with
adjacent periodic images. The information indicating that such
bonds should be formed is provided in the connectivity information
file (crystal.psf) output by cellulose-builder.
asy_I_alpha, asy_I_beta, asy_II, asy_III_I
Each one of these files describes the asymmetric unit of the
corresponding cellulose allomorph. For instance, file asy_I_beta
contains 42 lines. Each line contains three real numbers separated
by spaces. They represent the fractional coordinates (x, y,
z, as reported by Nishiyama et al.[1]) of the atoms in the
cellulose Iβ asymmetric unit, which is composed of two independent
anhydroglucopyranose units, comprising 21
atoms each. Therefore, the first 21 lines refer to the “origin”
anhydroglucopyranose ring, whereas the last 21 lines refer to
the “center” ring. Only the coordinate values are present in the
asy_* files; the corresponding atom labels are kept in separate
files named atomtypes_* (see below). The order of the
lines in both corresponding files (e.g., asy_I_beta and
atomtypes_I_beta) must not be changed. However, the entries for
fractional coordinates in asy_* files can be replaced in case, for
instance, the user wishes to use some other set of crystallographic
parameters.
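Since the program depends on the line order and layout of the asy_* files, a quick sanity check after replacing coordinates may be worthwhile. The following is a minimal sketch, not part of cellulose-builder: check_asy is a hypothetical helper, the number pattern accepts only plain decimal notation, and the scratch file holds placeholder values rather than crystallographic data.

```shell
#!/usr/bin/env bash
# Illustrative sketch; check_asy is a hypothetical helper and the scratch
# file below holds placeholder values, not crystallographic data.

# Usage: check_asy FILE EXPECTED_LINES
# Succeeds only if FILE has the expected number of lines and every line
# holds exactly three plain decimal numbers.
check_asy() {
  awk -v n="$2" '
    NF != 3 { bad = 1 }
    { for (i = 1; i <= NF; i++) if ($i !~ /^-?[0-9]*\.?[0-9]+$/) bad = 1 }
    END { if (bad || NR != n) exit 1 }' "$1"
}

workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 42; i++) print "0.100 0.200 0.300" }' \
  > "$workdir/asy_demo"

if check_asy "$workdir/asy_demo" 42; then
  # For I-beta, the first 21 lines are the origin ring, the last 21 the center ring.
  head -n 21 "$workdir/asy_demo" > "$workdir/origin_ring"
  tail -n 21 "$workdir/asy_demo" > "$workdir/center_ring"
  echo "asy_demo OK: 21 origin + 21 center atoms"
fi
```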
dimensions_I_alpha, dimensions_I_beta, dimensions_II,
dimensions_III_I
These files set the values of unit cell dimensions for their re-
spective cellulose allomorph, as reported by Nishiyama et al.[1–
4] Distances are given in Angstroms, and angles in radians (in
octave syntax). File dimensions_I_beta, for instance, reads:
a = 7.784
b = 8.201
c = 10.380
alpha = 90.0 * pi/180.0
beta = 90.0 * pi/180.0
gamma = 96.5 * pi/180.0
The order of these lines is immaterial and the values could
be replaced if desired.
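To show how such cell parameters turn fractional coordinates into Cartesian ones, the following sketch applies the standard crystallographic transformation with a along x and b in the xy plane. That axis convention and the function names `cell_matrix` and `frac_to_cart` are assumptions made for illustration; the text does not state which convention cellulose-builder uses internally.

```python
import math


def cell_matrix(a, b, c, alpha, beta, gamma):
    """Return the 3x3 fractional-to-Cartesian matrix (angles in radians)."""
    ca, cb, cg = math.cos(alpha), math.cos(beta), math.cos(gamma)
    sg = math.sin(gamma)
    # Unit cell volume for a general (triclinic) cell.
    volume = a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )
    return (
        (a, b * cg, c * cb),
        (0.0, b * sg, c * (ca - cb * cg) / sg),
        (0.0, 0.0, volume / (a * b * sg)),
    )


def frac_to_cart(frac, matrix):
    """Multiply the cell matrix by a fractional coordinate triple."""
    return tuple(sum(m * f for m, f in zip(row, frac)) for row in matrix)


# Values quoted in the text for dimensions_I_beta.
a, b, c = 7.784, 8.201, 10.380
alpha = 90.0 * math.pi / 180.0
beta = 90.0 * math.pi / 180.0
gamma = 96.5 * math.pi / 180.0

matrix = cell_matrix(a, b, c, alpha, beta, gamma)
top_of_cell = frac_to_cart((0.0, 0.0, 1.0), matrix)
```

With alpha = beta = 90 degrees, the fractional point (0, 0, 1) maps onto the z axis at a height equal to c, while the b axis is tilted away from the y axis by the excess of gamma over 90 degrees.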
bGLC.top
This file is a symbolic link§ to the file top_all36_carb.rtf
provided by MacKerell and coworkers,[26,27] which contains
topology information for the beta-D-glucopyranose residue,
associated with the recent force field for carbohydrates by the same
authors. Another carbohydrate topology or force field can be
used with cellulose-builder by creating a new symbolic link:
$ ln -s my_top_file bGLC.top
Notice that the link name, bGLC.top, must be preserved.
Adopting another topology file will often imply using
different atom labels (atom types), which in turn requires editing
the atomtypes_* files.
atomtypes_I_alpha, atomtypes_I_beta, atomtypes_II,
atomtypes_III_I
These files specify the 21 atom types that compose an
anhydroglucopyranose unit. The atom types must be given
as named in the topology file and in exactly the same order as
the corresponding fractional coordinates in the respective
asy_* file (above). Editing the atomtypes_* files is only necessary if
one is using an alternative topology file.
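The ordering requirement can be made concrete with a small sketch that attaches labels to coordinates by position. It assumes that the same 21 labels are reused for each ring of the asymmetric unit, which the text suggests but does not state outright; the function name `label_atoms` and the labels `T0` to `T20` are hypothetical placeholders, not atom types from any topology file.

```python
def label_atoms(atom_types, coords):
    """Pair each coordinate with an atom type purely by line position.

    atom_types: labels for one anhydroglucopyranose unit.
    coords: fractional coordinates for one or more units, in file order.
    """
    n = len(atom_types)
    if n == 0 or len(coords) % n != 0:
        raise ValueError("coordinate count must be a multiple of the label count")
    # The i-th coordinate receives the (i mod n)-th label, ring after ring.
    return [(atom_types[i % n], xyz) for i, xyz in enumerate(coords)]


# Placeholder labels and coordinates with the shapes described in the text.
types = [f"T{i}" for i in range(21)]
coords = [(0.0, 0.0, i / 42) for i in range(42)]
labeled = label_atoms(types, coords)
```

Because the pairing is purely positional, swapping two lines in only one of the two files silently assigns the wrong label to an atom, which is why the line order must be kept in step.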
Benchmark
Although performance is not usually an issue in this type of
application, some performance data are presented in Table 1.
The program was run on a GNU/Linux operating system
(Ubuntu 8.10, kernel 2.6.27-7-generic, i686). Memory and
processing resources on that system were 2 GB of RAM and an
AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ (CPU
frequency: 2613.345 MHz, cache size: 1024 KB). The data
demonstrate that, even with very modest hardware resources,
Cartesian coordinates for fibrils such as those depicted in
Figure 6 can be obtained within a few seconds to about three
minutes, depending on the fibril length, at a rate of 2.86
cellobiose units (per chain) per second. Plotting the wall clock
time against the number of cellobiose units reported in Table 1
and performing a linear regression yields an almost perfectly
linear relation, with squared correlation coefficient
R² = 0.99992 (not shown).
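The quoted rate and regression quality can be reproduced from the wall clock times in Table 1. The sketch below assumes an ordinary least-squares fit with an intercept, which the text does not specify; the function name `linear_fit` is hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


# Table 1: cellobiose units per chain and wall clock time in seconds.
units = [5, 25, 50, 100, 500]
wall_clock = [5.75, 11.28, 20.06, 37.51, 178.06]

slope, intercept, r_squared = linear_fit(units, wall_clock)
rate = 1.0 / slope  # cellobiose units (per chain) processed per second
```

Under this assumption the fitted slope is about 0.35 s per cellobiose unit, so `rate` rounds to 2.86 and `r_squared` to 0.99992, in agreement with the values reported above.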
Concluding Remarks
The program presented here, cellulose-builder, will serve as a
useful tool for scientists wishing to perform MD simulations
and other computations on crystalline cellulose structures. It
provides an easy and automated means of generating Cartesian
coordinates in PDB format for the cellulose Iα, Iβ, II, and III_I
allomorphs in a variety of crystal shapes, ranging
from regular parallelepiped crystallites to fibrils and
monolayers. The program allows total control of the crystallite
dimensions and fibril length from the Bash command line. For
parallelepiped crystallites, different translational symmetries
are available, enabling the use of periodic boundary conditions.
A number of crystallographic surfaces can be exposed, so
that the user can build structures whose exposed surfaces
have different degrees of hydrophobicity.
Table 1. Cellulose-builder running times for calculating and writing
Cartesian coordinates for elementary fibrils (Figs. 1 and 6) as a function
of the number of cellobiose units in each cellulose chain.

Cellobiose units   Wall clock time (s)   User CPU-time (s)   System CPU-time (s)   %CPU
5                  5.75                  2.93                0.81                  65
25                 11.28                 9.08                1.56                  94
50                 20.06                 16.80               2.72                  97
100                37.51                 32.21               4.86                  98
500                178.06                154.60              23.11                 99

The program was run on a PC with 2 GB RAM and an AMD Athlon(tm)
64 X2 Dual Core Processor 5200+ (CPU frequency: 2613.345 MHz,
cache size: 1024 KB), under GNU/Linux operating system (Ubuntu 8.10,
kernel 2.6.27-7-generic, i686). The columns give the elapsed real time
(wall clock), the amount of CPU-time that the process used directly (in
user mode), the amount of CPU-time used by the system on behalf of the
process (in kernel mode), and the percentage of CPU usage by the code,
as provided by /usr/bin/time. All times are in seconds. Percentage of
CPU is just user + system times divided by the total running time. The
resident memory usage for the largest system was only 360 MB.
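The %CPU column follows directly from the other three columns. In the sketch below the result is truncated to an integer, which is an assumption chosen because it reproduces the tabulated values (rounding to nearest would not); the function name `cpu_percent` is hypothetical.

```python
def cpu_percent(user, system, wall):
    """Percentage of wall clock time spent on CPU, truncated to an integer."""
    return int(100.0 * (user + system) / wall)


# Rows of Table 1: (cellobiose units, wall clock, user CPU, system CPU), in seconds.
rows = [
    (5, 5.75, 2.93, 0.81),
    (25, 11.28, 9.08, 1.56),
    (50, 20.06, 16.80, 2.72),
    (100, 37.51, 32.21, 4.86),
    (500, 178.06, 154.60, 23.11),
]

percentages = [cpu_percent(user, system, wall) for _, wall, user, system in rows]
```

The low value for the shortest fibril indicates that a sizeable share of that run was not spent on CPU; the table does not say what accounts for the remainder.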
§In computing, a symbolic link (also symlink or soft link) is a special type of file supported by the POSIX operating-system standard that contains a reference to another file or directory in the form of an absolute or relative path.
The crystalline structures that can be built with
cellulose-builder may be further combined with other molecules, such
as solvent and proteins, using available codes for generating
initial configurations for molecular simulations.[22] Similarly, the
structures may be combined among themselves to create
more complex assemblies of cellulose and more elaborate
models of the plant cell wall. Further developments of the
program are under way. Extending cellulose-builder to generate
crystalline structures of other glycans of putative relevance to
the study and design of cellulose-degrading enzymes, such
as chitin,[33] is also under consideration.
Acknowledgments
The authors thank Rodrigo L. Silveira for discussions.
Keywords: cellulose crystalline structures · starting configurations for simulations · plant cell wall elementary fibrils · hydrophobic and hydrophilic surfaces of cellulose · software · Bash programming language
How to cite this article: T. C. F. Gomes, M. S. Skaf, J. Comput.
Chem. 2012, 33, 1338–1346. DOI: 10.1002/jcc.22959
Additional Supporting Information may be found in the
online version of this article.
[1] Y. Nishiyama, P. Langan, H. Chanzy, J. Am. Chem. Soc. 2002, 124, 9074.
[2] Y. Nishiyama, J. Sugiyama, H. Chanzy, P. Langan, J. Am. Chem. Soc. 2003, 125, 14300.
[3] P. Langan, Y. Nishiyama, H. Chanzy, Biomacromolecules 2001, 2, 401.
[4] M. Wada, H. Chanzy, Y. Nishiyama, P. Langan, Macromolecules 2004, 37, 8548.
[5] M. S. Baird, A. C. W. O’Sullivan, B. Banks, Cellulose 1998, 5, 89.
[6] R. J. Vietor, K. Mazeau, M. Lakin, Biopolymers 2000, 54, 342.
[7] J. F. Matthews, C. E. Skopec, P. E. Mason, P. Zuccato, R. W. Torget, J. Sugiyama, M. E. Himmel, J. W. Brady, Carbohydr. Res. 2006, 341, 138.
[8] H. Miyamoto, M. Ago, C. Yamane, M. Seguchi, K. Ueda, K. Okajima, Carbohydr. Res. 2011, 346, 807.
[9] Y. Ding, M. E. Himmel, J. Agr. Food Chem. 2006, 54, 597.
[10] S. P. S. Chundawat, G. T. Beckham, M. E. Himmel, B. E. Dale, Annu. Rev. Biom. Eng. 2011, 2, 121.
[11] L. Zhong, J. F. Matthews, M. F. Crowley, T. Rignall, C. Talon, J. M. Cleary, R. C. Walker, G. Chukkapalli, C. McCabe, M. R. Nimlos, C. L. Brooks, III, M. E. Himmel, J. W. Brady, Cellulose 2008, 15, 261.
[12] L. Zhong, J. F. Matthews, P. I. Hansen, M. F. Crowley, J. M. Cleary, R. C. Walker, M. R. Nimlos, C. L. Brooks, III, W. S. Adney, M. E. Himmel, J. W. Brady, Carbohydr. Res. 2009, 344, 1984.
[13] C. M. Payne, M. E. Himmel, M. F. Crowley, G. T. Beckham, J. Phys. Chem. Lett. 2011, 2, 1546.
[14] L. Petridis, X. Jiancong, M. F. Crowley, J. C. Smith, X. Cheng, In Computational Modeling in Lignocellulosic Biofuel Production. M. R. Nimlos, M. F. Crowley (Eds), ACS Symposium Series, 2010, vol. 1052, Chapter 3, 55.
[15] H. M. Cho, A. S. Gross, J.-W. Chu, J. Am. Chem. Soc. 2011, 133, 14033.
[16] R. P. Swatloski, S. K. Spear, J. D. Holbrey, R. D. Rogers, J. Am. Chem. Soc. 2002, 124, 4974.
[17] S. Zhu, Y. Wu, Q. Chen, Z. Yu, C. Wang, S. Jin, Y. Ding, G. Wu, Green Chem. 2006, 8, 325.
[18] W. Humphrey, A. Dalke, K. Schulten, J. Mol. Graphics 1996, 14, 33.
[19] J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale, K. Schulten, J. Comput. Chem. 2005, 26, 1781.
[20] B. R. Brooks, C. L. Brooks, III, A. D. Mackerell, Jr., L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, W. Yang, D. M. York, M. Karplus, J. Comput. Chem. 2009, 30, 1545.
[21] D. A. Case, T. A. Darden, T. E. Cheatham, III, C. L. Simmerling, J. Wang, R. E. Duke, R. Luo, R. C. Walker, W. Zhang, K. M. Merz, B. P. Roberts, B. Wang, S. Hayik, A. Roitberg, G. Seabra, I. Kolossvai, K. F. Wong, F. Paesani, J. Vanicek, J. Liu, X. Wu, S. R. Brozell, T. Steinbrecher, H. Gohlke, Q. Cai, X. Ye, J. Wang, M.-J. Hsieh, G. Cui, D. R. Roe, D. H. Mathews, M. G. Seetin, C. Sagui, V. Babin, T. Luchko, S. Gusarov, A. Kovalenko, P. A. Kollman, AMBER 11, University of California, San Francisco, 2010.
[22] L. Martínez, R. Andrade, E. G. Birgin, J. M. Martínez, J. Comput. Chem. 2009, 30, 2157.
[23] C. Newham, B. Rosenblatt, Learning the Bash Shell; O’Reilly & Associates: Cambridge, 1998.
[24] C. Albing, J. P. Vossen, C. Newham, Bash Cookbook; O’Reilly Media: Sebastopol, 2007.
[25] A. Robbins, N. H. F. Beebe, Classic Shell Scripting; O’Reilly Media: Beijing, 2005.
[26] O. Guvench, S. N. Greene, G. Kamath, J. W. Brady, R. M. Venable, R. W. Pastor, A. D. MacKerell Jr., J. Comput. Chem. 2008, 29, 2543.
[27] O. Guvench, E. Hatcher, A. D. MacKerell Jr., J. Chem. Theory Comput. 2009, 5, 2353.
[28] J. P. Glusker, M. Lewis, M. Rossi, Crystal Structure Analysis for Chemists and Biologists; Wiley-VCH: New York, 1994.
[29] M. F. C. Ladd, R. A. Palmer, Structure Determination by X-ray Crystallography; Plenum Press: New York, 1994.
[30] T. Hahn, International Tables for Crystallography; Springer: Dordrecht, 2005; Volume A.
[31] B. B. Buchanan, W. Gruissem, R. L. Jones, Biochemistry and Molecular Biology of Plants; American Society of Plant Physiologists: Rockville, 2000.
[32] Q. Li, S. Renneckar, Biomacromolecules 2011, 12, 650.
[33] G. Vaaje-Kolstad, B. Westereng, S. J. Horn, Z. Liu, H. Zhai, M. Sørlie, V. G. H. Eijsink, Science 2010, 330, 219.
Received: 21 December 2011; Revised: 5 February 2012; Accepted: 7 February 2012; Published online on 15 March 2012