Cellulose-Builder: A toolkit for building crystalline structures of cellulose


Cellulose-Builder: A Toolkit for Building Crystalline Structures of Cellulose

Thiago C. F. Gomes[a] and Munir S. Skaf*[a]

Cellulose-builder is a user-friendly program that builds

crystalline structures of cellulose of different sizes and

geometries. The program generates Cartesian coordinates

for all atoms of the specified structure in the Protein Data

Bank format, suitable for using as starting configurations in

molecular dynamics simulations and other calculations.

Crystalline structures of cellulose polymorphs Iα, Iβ, II, and III_I of practically any size are readily constructed,

including parallelepipeds, plant cell wall cellulose elementary

fibrils of any length, and monolayers. Periodic boundary

conditions along the crystallographic directions are easily

imposed. The program also generates an atom connectivity file

in PSF format, required by well-known simulation packages

such as NAMD, CHARMM, and others. Cellulose-builder is

based on the Bash programming language and should run

on practically any Unix-like platform, demands very modest

hardware, and is freely available for download from

ftp://ftp.iqm.unicamp.br/pub/cellulose-builder. © 2012 Wiley

Periodicals, Inc.

DOI: 10.1002/jcc.22959

Introduction

Cellulose has recently attracted a great deal of attention due

to its potential to become a carbon-neutral feedstock for

renewable biofuels and chemicals. As a major component of

vegetable biomass, cellulose is the most abundant organic

compound in Earth’s biosphere. Many efforts have been

devoted to comprehending the structure and properties of cellulose

itself,[1–8] and to understanding the microscopic nature of

plant cell wall architecture and the molecular aspects associ-

ated with its structural strength.[9]

One of the major challenges in the development of a sus-

tainable means of obtaining biofuels and other valuable chem-

icals from lignocellulosic biomass is the recalcitrance of the

cellulose to the action of degrading enzymes and chemi-

cals.[10] The deconstruction of lignocellulose biopolymers into

fermentable sugars by means of enzymatic saccharification is

the most economically costly and scientifically challenging

step of the currently available process for biochemical conver-

sion of biomass into liquid fuels. Therefore, it is of fundamen-

tal importance to gain further understanding of the mecha-

nisms by which enzymes and auxiliary proteins recognize,

bind to, and disrupt crystalline cellulose for subsequent cleav-

age of the glycosidic bonds of the polysaccharide chains. To

this aim, several molecular dynamics (MD) computer simula-

tions have been reported recently which utilize crystalline cel-

lulose tridimensional structures or surfaces as model sub-

strates, in addition to the proteins of interest.[7,11–14] Very

recently, MD simulations of the interactions between cellulose

and ionic liquids have also been reported[15] in an attempt to

deepen our molecular level understanding of how ionic liquids

dissolve crystalline cellulose.[16,17]

These simulation studies share in common the need for the

atomic coordinates of the cellulose crystal structures as initial

configurations, which were independently constructed accord-

ing to the structure of the desired substrate. As the scientific

activity in this area is rapidly increasing, it would be very use-

ful to theoretical and computational chemists and physicists

alike to be able to readily construct crystalline structures of

cellulose of different sizes and shapes. In this work, we present

cellulose-builder, an automated solution for generating atomic

coordinate files in the Protein Data Bank (PDB) format that can

be readily used as input to simulate systems containing struc-

tures of crystalline cellulose of practically any size, shape, and

dimension. The code is freely available for download at ftp://

ftp.iqm.unicamp.br/pub/cellulose-builder.

Cellulose-builder is written as a Bash script and relies on several

well-established tools available on most Unix-like operating

systems and on the VMD package[18] to provide an automated,

straightforward, and user-friendly means of generating

cellulose crystals of different shapes and sizes. Cellulose-builder

only requires users to enter three integers (i, j, k) correspond-

ing to the number of cellulose unit cells to be replicated in

each crystallographic direction (a, b, c). The script will then

perform all operations needed to build a crystal of the chosen

size and will produce the initial configuration file in PDB for-

mat. In addition, cellulose-builder will also output the corre-

sponding atom connectivity (topology) information as a PSF

file, which contains molecule-specific information required by

some of the most popular MD simulation packages, including

NAMD[19] and CHARMM,[20] and can be easily converted into

the AMBER[21] file type prmtop.

[a] T. C. F. Gomes, M. S. Skaf

Institute of Chemistry, State University of Campinas – UNICAMP, Cx. P. 6154,

Campinas, SP 13083-970, Brazil

E-mail: [email protected]

Contract/grant sponsor: Fapesp; contract/grant number: 08/56255-9;

Contract/grant sponsor: CNPq; contract/grant number: 140978/2009-7.


1338 Journal of Computational Chemistry 2012, 33, 1338–1346 WWW.CHEMISTRYVIEWS.COM

SOFTWARE NEWS AND UPDATES WWW.C-CHEM.ORG

Given the recent interest in performing simulations contain-

ing model plant cell wall cellulose fibrils, which consist of 36

cellulose chains specifically arranged in space (Fig. 1),[9] the

software also enables one to build PDB and PSF files for ele-

mentary fibrils of arbitrary length. The cellulose structure files

created by cellulose-builder can then be combined with

coordinate files of other systems such as solvent and protein

molecules to construct solvated cellulose and cellulose-protein

complexes, using available packages for generating initial

configurations for molecular simulations, such as PACKMOL.[22]

Elementary cellulose fibrils can also be readily combined with

each other and decorated with hemicellulose and lignin mole-

cules with PACKMOL to generate plant cell wall models of more

realistic architectures and other complex assemblies.

All tools upon which cellulose-builder relies to perform its

task are present by default on most Unix-like operating sys-

tems (OS) (e.g., sed, grep, nl, tr, cat, echo, bc, perl), or can be

readily obtained free of cost for Unix operating systems (for

instance, octave, VMD, and psfgen).[18,19] These are all well-established

and trusted tools, most of them coded in the C

programming language. Nevertheless, good practices and

techniques are available in Bash programming as well,[23–25]

and many of them have been used to code cellulose-builder.

The inherently lower performance in terms of computer time

of Bash scripts in comparison to C programs, for instance, is

not an important issue in this case because generating initial

coordinates and connectivity information files by scripting will

still demand negligible amounts of computer time compared

to simulation time and analysis.

In the “Workflow Overview” section, we present an overview of

the code’s workflow. In the “Capabilities and Usage Examples” section,

we provide several usage examples. In the “File Structure and

Variables” section, the file structure is described, and in the “Benchmark”

section, we present a summary of running times. Our concluding

remarks are presented in the “Concluding Remarks” section.

Workflow Overview

In this section, we provide a

brief description of the proce-

dure adopted by cellulose-

builder. To build the PDB config-

uration files according to the

most recent crystallographic

structures of cellulose, we have

taken the data reported by Nishiyama

et al.[1–4] for the asymmetric

units of cellulose Iα, Iβ, II, and III_I allomorphs. The reported

structures were obtained from

synchrotron X-ray and neutron

diffraction experiments on both

hydrogenated and deuterated

cellulose fibers (except for cellu-

lose II, for which neutron diffrac-

tion experiment results are not

available[3]). This enabled precise

determination of the positions of

all hydrogen atoms and, thus, of the hydrogen bonding network

in different cellulose allomorphs. The crystal structures of cellulose Iβ, II, and III_I

belong to the monoclinic P2₁ space group,

whereas the cellulose Iα crystal structure belongs to the triclinic P1

space group. Cellulose-builder uses the crystallographic fractional

coordinates to calculate the Cartesian coordinates of the

hydrogen atoms in the final crystalline structure. For

atoms with more than one reported crystallographic position,

the program uses the fractional coordinates of the position with

higher occupancy.

In Figure 2, we show a simplified scheme describing the

major steps taken by the program during its execution. Starting

from the crystallographic fractional coordinates, the P2₁ space group symmetry operations are applied to the atoms in

the asymmetric unit to determine symmetry-equivalent positions

and generate fractional coordinates for the remaining

atoms within one unit cell.[28–30] For allomorph Iα, the previous

operation is not necessary since for its space group, P1, the

asymmetric unit coincides with the unit cell. The fractional

coordinates of the adjacent unit cells are then generated by

adding unity (one) n times to the original unit cell fractional

coordinates, in the appropriate manner, where n is an integer.

Fractional coordinates are then converted into Cartesian

coordinates[28–30] (see Supporting Information) using the cell

dimensions reported by Nishiyama et al.[1–4] At the end of

this stage, a directory named crystal is created which will store

the initial configuration file corresponding to the desired crystallite

both in XYZ and PDB formats (files crystal.xyz and crystal.pdb,

respectively), among other relevant files created during the

process.
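The coordinate-generation steps described above (symmetry expansion, unit cell replication, and conversion to Cartesian coordinates) can be sketched with awk as follows. This is a minimal illustration and not the code distributed with cellulose-builder: the function names are hypothetical, the input is assumed to consist of whitespace-separated lines of the form "label x y z" in fractional coordinates, the 2₁ screw axis is assumed to lie along c, so that (x, y, z) maps to (-x, -y, z + 1/2), and the cell is assumed to be monoclinic with γ as the unique angle. The Iβ cell dimensions in the last line are illustrative values and should be checked against the file dimensions_I_beta.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the code shipped with cellulose-builder.
# Input format (assumed): "label x y z" in fractional coordinates.

# Step 1: apply the P2_1 screw operation, assumed to lie along c:
# (x, y, z) -> (-x, -y, z + 1/2). Each input atom is echoed unchanged,
# followed by its symmetry-equivalent partner.
apply_screw() {
  awk '{ print; printf "%s %.4f %.4f %.4f\n", $1, -$2, -$3, $4 + 0.5 }'
}

# Step 2: replicate the unit cell ni x nj x nk times by adding integers
# to the fractional coordinates.
replicate() {
  awk -v ni="$1" -v nj="$2" -v nk="$3" '
    { name[NR] = $1; x[NR] = $2; y[NR] = $3; z[NR] = $4 }
    END {
      for (a = 0; a < ni; a++)
        for (b = 0; b < nj; b++)
          for (c = 0; c < nk; c++)
            for (n = 1; n <= NR; n++)
              printf "%s %.4f %.4f %.4f\n", name[n], x[n] + a, y[n] + b, z[n] + c
    }'
}

# Step 3: fractional -> Cartesian for a monoclinic cell with unique angle
# gamma (assumed setting). Arguments: a b c in angstrom, gamma in degrees.
frac_to_cart() {
  awk -v A="$1" -v B="$2" -v C="$3" -v G="$4" '
    BEGIN { g = G * atan2(0, -1) / 180 }   # degrees -> radians
    { printf "%s %.3f %.3f %.3f\n", $1, A*$2 + B*$3*cos(g), B*$3*sin(g), C*$4 }'
}

# Demo with a single made-up atom and illustrative I-beta cell dimensions.
printf 'C1 0.10 0.20 0.30\n' \
  | apply_screw | replicate 2 1 1 | frac_to_cart 7.784 8.201 10.380 96.5
```

The three stages are kept as separate filters so that each can be inspected on its own; for the triclinic Iα cell the first stage would be skipped and the conversion would need the general triclinic matrix instead.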

A connectivity information file for the crystallite is also pro-

vided in PSF format (crystal.psf ). This file is suited for use with

NAMD[19] and CHARMM[20] packages. The topology currently

implemented is meant for use with the carbohydrate force

field by MacKerell and coworkers.[26,27] Topology files suitable for other simulation packages and force fields can be readily obtained from the Cartesian coordinates or PDB files output by cellulose-builder.

Figure 1. Model proposed by Ding and Himmel[9] for maize primary cell wall elementary fibril, as seen from their nonreducing ends. This Iβ elementary fibril possesses 36 cellulose chains. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Capabilities and Usage Examples

Basically, the program allows one

to build three distinct types of

crystals, which we herein denote:

parallelepipeds, (plant cell wall) el-

ementary fibrils, and monolayers.

Descriptions and instructions for

generating each crystal type are

provided below. Cellulose-builder

also allows for implementation of

periodic boundary conditions

(pbc) along any crystallographic

direction. Being a Bash program, it

must be run at a Bash shell prompt, which is the default shell on

many modern Linux systems and is easily available on other Unix-like

OS. Under Windows OS, cellulose-builder can be run within

Cygwin.*

We exemplify how to apply the capabilities implemented in

cellulose-builder using the Iβ cellulose allomorph. Building

structures for other allomorphs readily follows. Pertinent

remarks are included when referring to allomorphs other than

Iβ. The glucan chains in cellulose Iβ containing the glucopyranose

units were labeled “origin” and “center” chains, referring

to their positions in the unit cell. The same is true for cellulose

II, in spite of the fact that this allomorph’s origin and center

chains run antiparallel relative to each other. In cellulose Iα and III_I there is only one type of chain. These features determine

which options are available for each allomorph. To properly

set the allomorph type, one must edit the variable PHASE in

the text file input.inp, which contains only three lines (see “File

Structure and Variables” section, Files).

Parallelepiped crystals

We denote parallelepiped crystals those obtained by simple

replication of the cellulose unit cell. This crystal shape is avail-

able for all supported allomorphs. To build a parallelepiped

crystal, the user must specify three integer numbers (i, j, k) as

arguments upon calling cellulose-builder on the command

line. We provide an example (regard ‘‘$’’ as the Bash prompt,

and our current working directory as the cellulose-builder par-

ent directory):

$ ./cellulose-builder 4 4 5 (example 1)

In this case, the program will set up a crystal four unit cells

wide in the a and b crystallographic directions and five unit

cells wide in the c crystallographic direction, which is parallel

to the cellulose chains. The resulting crystal, shown in Figure

3, is comprised of 25 cellulose chains, each one bearing five

cellobiose units.

Using the parallelepiped method one can build crystallites

exposing different proportions of hydrophobic and hydrophilic

surface areas. In Figure 4, we exemplify this possibility by

showing two cellulose Iβ blade-shaped crystallites, exposing

mostly one type or the other of surfaces. For instance, to

obtain a Iβ crystallite with exposed surface area predominantly

hydrophilic (Fig. 4A):

$ ./cellulose-builder 10 2 5 (example 2)

Instead, for a Iβ crystallite with predominantly hydrophobic

exposed surface area (Fig. 4B):

$ ./cellulose-builder 2 10 5 (example 3)

Periodic boundary conditions on parallelepiped crystals

To periodically replicate a given crystallite, one must ensure

that the crystallite has perfectly fitting edges so that the sys-

tem exhibits translational symmetry. Cellulose-builder supports

parallelepiped crystallites possessing translational symmetry

with respect to the three axes shown in Figure 3. The user

may wish to implement pbc along one, two, or all three crystallographic

directions (a, b, c). This can be easily accomplished by

editing the variable PBC in the text file input.inp and running ./cellulose-builder

at the prompt, as shown by examples 1 to 3

above, for instance. Let us discuss how to implement pbc

along directions a and b separately from direction c.

Figure 2. Cellulose-builder simplified workflow. Given the experimentally determined space group (P2₁ for

cellulose Iβ, II, and III_I) and experimental fractional (atomic) coordinates, the program determines the symmetry

equivalent positions of all other atoms within the unit cell. The program then replicates unit cells according to

user input requirements, exploring the convenience of working in fractional coordinates for such task. After rep-

lication, fractional coordinates are transformed into Cartesian coordinates using experimental cell dimensions to

yield a file in XYZ format. A major editing is then performed to achieve the initial configuration file in PDB for-

mat, suited for common MD simulation packages. The program also writes a script for psfgen and executes it,

yielding a connectivity information file, in PSF format, meant to model the cellulose crystal using the CHARMM

force field for carbohydrates.[26,27]

*A Linux-like environment for Windows that ports software running on POSIX systems (such as Linux, BSD, and Unix systems) onto Windows (http://www.cygwin.com).


For pbc along the crystallographic a direction only, one

must set PBC=A. For instance, with the PBC variable set to PBC=A,

the following command:

$ ./cellulose-builder 4 5 5 (example 4)

yields the crystallite shown in Figure 5A.

For obtaining translational symmetry along the crystallographic

b direction, one must set PBC=B in file input.inp. With

PBC=B in file input.inp, the command

$ ./cellulose-builder 5 4 5 (example 5)

produces the crystallite shown in Figure 5B.

To obtain a structure periodically replicated along both crystallographic

a and b directions, the input.inp file must be

edited to set PBC=ALL. The crystallite shown in Figure 5C was

obtained from the command below with PBC=ALL:

$ ./cellulose-builder 4 4 5 (example 6)

Figure 3. Cellulose Iβ crystallite generated by cellulose-builder as seen from its nonreducing ends (left) and rotated by 90° (right). For the sake of consistency

with notation used by other authors,[7] we have adopted the same viewpoint as those authors for showing the cellulose crystallite. Crystallographic

faces are indicated by their corresponding Miller indices. Origin and center chain layers, and unit cell axes are indicated as well. Cellulose chains are parallel

to the c direction.

Figure 4. Different surfaces exposed by two different cellulose Iβ blade-shaped crystallites. Left: predominantly hydrophilic (010) surfaces are exposed.

Right: predominantly hydrophobic (100) surfaces are exposed. Top and bottom images represent the same crystallite seen from different viewpoints on

VMD’s X-window OpenGL display.


Imposing pbc along crystallographic directions a or b only

makes sense for allomorphs Iβ and II, since for Iα and III_I, the

unit cell is such that crystallites automatically have transla-

tional symmetry in all crystallographic directions.

Regarding the crystallographic c direction, no special action is

needed to endow crystallites with translational symmetry along

that direction, since any replication of cellulose unit cell yields a

crystallite already possessing that property for any allomorph.

As a consequence, all cellulose crystallites generated by cellu-

lose-builder will automatically possess translational symmetry in

the crystallographic c direction. Indeed, the default PBC value in

input.inp file is PBC=NONE. The crystallites shown in Figures 3

and 4 were built with PBC=NONE. Setting the Bash variable

PBC to NONE does not mean that the resulting crystallite will

have no translational symmetry, but instead that no subsequent

procedure is necessary to confer further translational symmetry

to the crystallite after its construction. However, with this

option, a hydrogen atom (H) and a hydroxyl group (OH) will be

respectively added to the opposing end points of every chain

ensuring there are no dangling bonds.†

Very often in computer simulations of crystalline cellulose,

one would like to work with truly infinitely periodic systems

along the c direction, which requires a perfectly matching

bond between reducing and nonreducing ends of replicated

chains along the c direction.‡ To control periodic covalent

bonding in the final connectivity information file (crystal.psf )

delivered by cellulose-builder one must edit the third line of

the file input.inp, which reads:

PCB_c=FALSE

Setting the Bash variable PCB_c to FALSE in the input.inp

file causes no periodic covalent bonding along c direction, and

is the default. To include periodic covalent bonding in the final

connectivity file one must set:

PCB_c=TRUE

In addition to parallelepiped crystallites, this option can

also be applied to the other two crystal types provided by cellulose-builder,

i.e., elementary fibrils and monolayers, as

described next.
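Editing input.inp by hand is sufficient for occasional use. When many structures are to be generated, the variable could also be changed from a script, as in the sketch below. The helper set_input_var is hypothetical (it is not part of cellulose-builder) and assumes that each line of the file has the form NAME=VALUE with no spaces around the equal sign.

```shell
# Hypothetical helper, not part of cellulose-builder: set NAME=VALUE in a
# Bash-style settings file without introducing spaces around "=".
# Usage: set_input_var FILE NAME VALUE
set_input_var() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  # Rewrite the matching line; append the assignment if the name is absent.
  awk -v k="$key" -v v="$value" -F= '
    $1 == k { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example (commented out): enable periodic covalent bonding along c.
# set_input_var input.inp PCB_c TRUE
```

Writing to a temporary file and moving it into place avoids relying on in-place editing options, which differ between sed implementations.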

Elementary fibrils

Cellulose-builder can also build cellulose elementary fibrils of any

length from allomorphs Iα, Iβ, and II. Cellulose elementary fibrils

from allomorph Iβ possess the cross section depicted in Figure

1. The primary fibril with such a cross section was constructed

by carving out a larger Iβ parallelepiped crystallite (see Supporting

Information). This disposition of chains corresponds to a

recently proposed model for the elementary cellulose fibrils of

maize cell wall, free of hemicelluloses, lignins, and pectins.[9] The

model is likely to be applicable to several other species of plants

since the terminal enzymatic complex that synthesizes cellulose

elementary fibrils at maize cell membranes is similar to that of

other angiosperms.[31] Depending on the source tissue and orga-

nism, cellulose chains in the elementary fibrils may have from a

few tens to several hundreds of cellobiose residues.

To build an elementary fibril, the string fibril must be passed

as first argument in the command line, whereas the number

of cellobiose units in the chains that compose the elementary

fibril (i.e. the degree of polymerization) is specified by an inte-

ger k as second argument:

$ ./cellulose-builder fibril k (example 7)

Cellulose Iβ elementary fibrils of several lengths are shown

in Figure 6, for k = 5, 25, 50, 100, and 500. If one wishes to

impose pbc along the chain direction, periodic covalent

bonding can be implemented by setting PCB_c=TRUE, as

discussed above. In the case of elementary fibrils, pbc are

supported in the c direction only. Fibrils with arbitrary cross

sections, different from the maize cell wall cellulose elemen-

tary fibril shown in Figure 1, can also be readily constructed

(see Supporting Information). Elementary fibrils can be fur-

ther arranged to assemble complex hyperstructures as mod-

els for plant cell walls or simply solvated by molecular sol-

vents (Supporting Information) using software such as

PACKMOL.[22]
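A series of fibrils such as the one in Figure 6 could be generated in a loop. The sketch below only prints the commands (a dry run) unless RUN=1 is set. It assumes that the script is invoked from its parent directory and that each run writes its output to a directory named crystal in the current directory, which is therefore renamed after each build so that it is not overwritten. The function name build_fibrils and the fibril_k naming scheme are hypothetical.

```shell
# Dry-run sketch (hypothetical helper): prints one build command per length.
# With RUN=1 the commands are executed instead; this assumes that
# ./cellulose-builder exists and writes to ./crystal (an assumption).
build_fibrils() {
  local k
  for k in "$@"; do
    if [ "${RUN:-0}" = 1 ]; then
      ./cellulose-builder fibril "$k" && mv crystal "fibril_k$k"
    else
      echo "./cellulose-builder fibril $k && mv crystal fibril_k$k"
    fi
  done
}

# The chain lengths used for Figure 6.
build_fibrils 5 25 50 100 500
```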

Monolayers

The cellulose Iβ crystal structure consists of alternating layers of origin

and center cellulose chains, with no hydrogen bonds

between layers.[1] Recent experimental studies have shown cellulose

elementary fibrils from woody material to undergo

delamination (or peeling) along the (200) plane after (2,2,6,6-tetramethylpiperidin-1-yl)oxyl-mediated

oxidation and intensive

sonication.[32] These findings motivated us to include an option

for generating monolayers. The command:

$ ./cellulose-builder origin j k (example 8)

will build a monolayer composed of j cellulose origin chains

containing k cellobiose residues each, whereas

$ ./cellulose-builder center j k (example 9)

yields a similar monolayer composed of center chains. Examples

8 and 9 are also valid for obtaining monolayers composed

of origin or center chains from allomorph II. Since allomorph

Iα has only one type of chain, the equivalent command

line for obtaining a monolayer of chains from cellulose Iα is

$ ./cellulose-builder monolayer j k (example 10)

†Indeed, since for any allomorph the unit cell is composed of anhydroglucopyranose units, any replication of the cellulose unit cell yields a crystallite possessing translational symmetry along the c direction. Nevertheless, in the final steps, missing atoms at the extremities of cellulose chains are added: one hydrogen atom is added at one terminus and one OH group at the other terminus of each cellulose chain. These are the only atoms in the whole crystallite whose coordinates are guessed (except for allomorph II, whose HO2, HO3, and HO6 hydrogen positions have not been determined experimentally[3] due to lack of neutron diffraction data and so have to be guessed). Thus, translational symmetry along the c direction is actually conditioned on the elimination of those added atoms (one water molecule per cellulose chain).

‡Covalent bonding between atoms of different chains is usually set up in the input files (topology) of molecular simulation program suites.


Cellulose III_I has only one type of chain as well, but

there is more than one manner of producing monolayers

from its structure. Therefore, monolayers are not automatically

supported for allomorph III_I. Nevertheless, one can

always build crystallites of arbitrary shapes for any of the

allomorphs by a simple method provided in Supporting

Information.

Similarly, periodic boundary conditions can be implemented

for monolayers along the chains direction via periodic covalent

bonding by setting PCB_c=TRUE in file input.inp.
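The three invocation forms used in examples 1 to 10 (three integers; the keyword fibril followed by one integer; or origin, center, or monolayer followed by two integers) can be distinguished with a short case statement. The sketch below only illustrates such a check and is not the argument parsing actually performed by cellulose-builder: the function names and the output labels are hypothetical, and no attempt is made to verify which keywords are valid for which allomorph.

```shell
# Hypothetical argument check, written for illustration only; the parsing
# done by the real script may differ.

# Succeeds if $1 is a positive decimal integer.
is_pos_int() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *) [ "$1" -gt 0 ] ;;
  esac
}

# Prints the crystal type implied by the arguments; fails silently
# (nonzero status, no output) if they match none of the documented forms.
classify_args() {
  case $1 in
    fibril)
      [ $# -eq 2 ] && is_pos_int "$2" && echo fibril ;;
    origin|center|monolayer)
      [ $# -eq 3 ] && is_pos_int "$2" && is_pos_int "$3" \
        && echo "monolayer:$1" ;;
    *)
      [ $# -eq 3 ] && is_pos_int "$1" && is_pos_int "$2" \
        && is_pos_int "$3" && echo parallelepiped ;;
  esac
}

classify_args 4 4 5        # parallelepiped form (example 1)
classify_args fibril 25    # elementary fibril form (example 7)
classify_args origin 6 10  # monolayer form (example 8)
```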

Figure 5. Cellulose Iβ crystallites suited for pbc in a (left), b (middle) and both a and b (right) crystallographic directions. Origin and center chain layers

are indicated, as well as crystallographic directions and Miller indices. The notation adopted for Miller indices is the same adopted by Matthews et al.,[7] so

faces where center chains are in the surface are indicated by (200) or (020) to reflect their positions half-way the unit cell. Colored lines within crystallites

indicate the crystallographic directions along which the crystallite is endowed with translational symmetry.

Figure 6. Cellulose Iβ elementary fibrils of different lengths, possessing k = 5, 25, 50, 100, and 500 cellobiose units, generated with cellulose-builder. The

fibril with k = 5 is magnified. All fibrils have the cross section shown in Figure 1.


File Structure and Variables

Under the parent directory cellulose-builder, several files

should be present for each supported allomorph: a file

describing its asymmetric unit in fractional coordinates

(asy_I_alpha, asy_I_beta, asy_II, asy_III_I); a file describing the

unit cell parameters (dimensions_I_alpha, dimensions_I_beta,

dimensions_II, dimensions_III_I); a file listing atom labels

(atomtypes_I_alpha, atomtypes_I_beta, atomtypes_II, atomtypes_III_I);

and a Bash script (I-alpha.sh, I-beta.sh, II.sh, III_I.sh).

Besides those allomorph-specific files, three other files should

be present: bGLC.top, cellulose-builder.sh, and input.inp. Below

we briefly describe each one of these files and their user-specified

variables and arguments.

cellulose-builder.sh, I-alpha.sh, I-beta.sh, II.sh, III_I.sh

These files contain the program's source code. Because the program is an interpreted Bash script, it can be run directly at the prompt, but it depends on several Unix tools to work properly. Upon execution, the script tests for the tools it needs and issues error messages and instructions if one or more of them are not available. Besides the information provided by the user on the command line, the program also uses information contained in the files described below. Among them, only input.inp can be regarded as an actual input file. The other files contain pieces of information that were kept separate from the main code to make it easier for the user to alter those parameters.
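A dependency test of the kind described above could look like the following minimal sketch. The function name require_tools is hypothetical and the list of tools passed to it is left to the caller; neither is taken from the cellulose-builder source.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cellulose-builder): verify that every
# tool named on the command line can be found in PATH.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        # "command -v" succeeds only when the shell can resolve the name.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "error: required tool '$tool' not found in PATH" >&2
            missing=1
        fi
    done
    # Return 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

A driver script could call, for example, require_tools awk sed || exit 1 before doing any work; the tool names in that call are illustrative only.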

input.inp

This file defines three variables, read by cellulose-builder.sh,

which specify the allomorph to be built, the implementation

of periodic boundary conditions, and the implementation of

periodic covalent bonding, namely PHASE, PBC and PCB_c,

respectively. The default values are:

PHASE=I-BETA

PBC=NONE

PCB_c=FALSE

As Bash syntax is being used, care must be taken not to include spaces before or after the equal signs.
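The restriction follows from how Bash parses a line: with a space before the equal sign, the variable name is treated as a command name rather than as the target of an assignment. The sketch below shows one way a user might check an input file before running the program. The function name check_assignments is hypothetical and is not part of cellulose-builder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cellulose-builder): report lines of an
# input file in which an equal sign is preceded or followed by whitespace.
check_assignments() {
    local file=$1
    # grep -n prints offending lines with their line numbers and exits
    # with status 0 when at least one such line is found.
    if grep -nE '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*([[:space:]]+=|=[[:space:]]+)' "$file"; then
        echo "error: remove the spaces around '=' in $file" >&2
        return 1
    fi
    return 0
}
```

Calling check_assignments input.inp returns a nonzero status and lists the offending lines if any assignment contains stray spaces.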

Variable PHASE determines which allomorph is going to be

built. Valid values are: I-BETA, I-ALPHA, II, III_I. Depending on

the value set for variable PHASE, the corresponding script

(among I-alpha.sh, I-beta.sh, II.sh, III_I.sh) will be sourced by

the main script cellulose-builder.sh.
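The selection of the allomorph script can be pictured with a case statement such as the one below. This is only a sketch of the dispatch described in the text, under the assumption that each valid PHASE value maps onto the script of the same name; the function phase_script is hypothetical and the actual code in cellulose-builder.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a PHASE value onto the allomorph script named
# in the text. The function name is illustrative, not the program's own.
phase_script() {
    case "$1" in
        I-ALPHA) echo "I-alpha.sh" ;;
        I-BETA)  echo "I-beta.sh" ;;
        II)      echo "II.sh" ;;
        III_I)   echo "III_I.sh" ;;
        *)       echo "error: invalid PHASE '$1'" >&2; return 1 ;;
    esac
}

# Possible use inside a driver script (commented out so that the sketch
# runs without the program files being present):
#   source ./input.inp
#   source "./$(phase_script "$PHASE")"
```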

Variable PBC controls the translational symmetry and thus the final shape of the crystallite delivered, and applies to allomorphs Iβ and II, as already discussed. Valid values are: NONE, A, B, ALL.

PBC=NONE: Default value. Causes the program to build crystallites with translational symmetry along neither the a nor the b crystallographic direction. The crystallites are, nevertheless, intrinsically endowed with translational symmetry in the c direction (apart from the one water molecule per cellulose chain at the termini: an H atom at one terminus and an OH group at the opposite terminus). When this option is used to build allomorph Iβ crystallites, for instance, only (100) and (010) crystallographic surfaces are exposed, and crystallites are built in shapes akin to the one shown in Figure 3.

PBC=A: Causes the program to build crystallites with translational symmetry along the a crystallographic direction. When this option is used to build allomorph Iβ crystallites, a (200) crystallographic surface is exposed in addition to (100) and (010), and crystallites are built in shapes akin to the one depicted in Figure 5A. As above, crystallites possess translational symmetry in the c direction as well.

PBC=B: Causes the program to build crystallites with translational symmetry along the b crystallographic direction. When this option is used to build allomorph Iβ crystallites, a (020) crystallographic surface is exposed in addition to (100) and (010), and crystallites come in shapes akin to the one shown in Figure 5B. Crystallites possess translational symmetry in the c direction as well.

PBC=ALL: Causes the program to build crystallites with translational symmetry along both the a and b crystallographic directions. When this option is used to build allomorph Iβ crystallites, crystallographic planes (100), (010), (200), and (020) are exposed, and the crystallite shapes are akin to the one shown in Figure 5C.
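For quick reference, the relation between the PBC value and the surfaces exposed on a cellulose Iβ crystallite, as listed above, can be written as a small lookup. The function ibeta_surfaces is a hypothetical illustration and not part of the program.

```shell
#!/usr/bin/env bash
# Hypothetical lookup (not part of cellulose-builder): print the
# cellulose I-beta surfaces that the text lists for each PBC value.
ibeta_surfaces() {
    case "$1" in
        NONE) echo "(100) (010)" ;;
        A)    echo "(100) (010) (200)" ;;
        B)    echo "(100) (010) (020)" ;;
        ALL)  echo "(100) (010) (200) (020)" ;;
        *)    echo "error: PBC must be NONE, A, B, or ALL" >&2; return 1 ;;
    esac
}
```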

The variable PCB_c determines whether the cellulose chains composing the crystallite must be covalently bonded to their adjacent periodic images along the c crystallographic direction. Valid values are: FALSE, TRUE. Both options can be used with all crystal types delivered by cellulose-builder, namely, parallelepipeds, elementary fibrils, and monolayers, as well as with all supported allomorphs.

PCB_c=FALSE: Cellulose chains are not periodically covalently bonded to adjacent images, and thus come with an additional water molecule per cellulose chain (one H atom at one terminus, one OH group at the other) to properly complete the atomic valence at the reducing and nonreducing ends of the chains.

PCB_c=TRUE: This option prevents the H atom and the OH group from being added at the chain ends. Therefore, the cellulose chain ends will be ready and available to form glycosidic bonds with adjacent periodic images. The information indicating that such bonds should be formed is provided in the connectivity information file (crystal.psf) output by cellulose-builder.
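The effect of PCB_c on the size of the system can be checked with simple arithmetic. The sketch below assumes that a chain of k cellobiose units contains 2k anhydroglucopyranose residues of 21 atoms each, and that the terminal H atom and OH group together contribute three atoms when PCB_c=FALSE. The function atoms_per_chain is hypothetical and serves only as a sanity check on atom counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cellulose-builder): expected number of
# atoms in one cellulose chain of k cellobiose units.
atoms_per_chain() {
    local k=$1 pcb_c=$2
    # Assumption: 2 residues per cellobiose unit, 21 atoms per residue.
    local atoms=$(( 2 * k * 21 ))
    # Without periodic covalent bonding, one H and one OH cap the chain.
    if [ "$pcb_c" = "FALSE" ]; then
        atoms=$(( atoms + 3 ))
    fi
    echo "$atoms"
}
```

For k = 5, the function gives 213 atoms with PCB_c=FALSE and 210 atoms with PCB_c=TRUE.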

asy_I_alpha, asy_I_beta, asy_II, asy_III_I

Each one of these files describes the asymmetric unit of the corresponding cellulose allomorph. For instance, file asy_I_beta contains 42 lines. Each line contains three real numbers separated by spaces. They represent the fractional coordinates (x, y, z, as reported by Nishiyama et al.[1]) of the atoms in the cellulose Iβ asymmetric unit, which is composed of two independent anhydroglucopyranose units of 21 atoms each. Therefore, the first 21 lines refer to the “origin” anhydroglucopyranose ring, whereas the last 21 lines refer to the “center” ring. Only the coordinate values are present in the asy_* files; the corresponding atom labels are kept in separate files named atomtypes_* (see below). The order of the lines in both corresponding files (e.g., asy_I_beta and atomtypes_I_beta) must not be changed. However, the entries for fractional coordinates in asy_* files can be replaced if, for instance, the user wishes to use some other set of crystallographic parameters.
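When the coordinate entries are replaced by hand, a quick format check can catch accidental damage. The sketch below assumes only what is stated above: three real numbers per line, and a line count that is a multiple of the 21 atoms of one anhydroglucopyranose unit. The function check_asy is hypothetical and not part of cellulose-builder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cellulose-builder): validate the
# layout of an asy_* file.
check_asy() {
    local file=$1 n
    # Reject any line that does not hold exactly three real numbers.
    awk '
        NF != 3 { exit 1 }
        {
            for (i = 1; i <= 3; i++)
                if ($i !~ /^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/) exit 1
        }
    ' "$file" || {
        echo "error: $file must hold three real numbers per line" >&2
        return 1
    }
    # Assumption: the file lists whole residues of 21 atoms each.
    n=$(awk 'END { print NR }' "$file")
    if [ "$n" -eq 0 ] || [ $(( n % 21 )) -ne 0 ]; then
        echo "error: $file has $n lines, expected a multiple of 21" >&2
        return 1
    fi
    return 0
}
```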

dimensions_I_alpha, dimensions_I_beta, dimensions_II, dimensions_III_I

These files set the values of unit cell dimensions for their respective cellulose allomorphs, as reported by Nishiyama et al.[1–4] Distances are given in angstroms, and angles in radians (in Octave syntax). File dimensions_I_beta, for instance, reads:

a = 7.784

b = 8.201

c = 10.380

alpha = 90.0 * pi/180.0

beta = 90.0 * pi/180.0

gamma = 96.5 * pi/180.0

The order of these lines is immaterial and the values could

be replaced if desired.
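To illustrate how these parameters enter the calculation, the sketch below converts one set of fractional coordinates into Cartesian coordinates using the cellulose Iβ values quoted above. It assumes a common convention (a along x, b in the xy plane, c along z) that is valid only when alpha and beta are both 90 degrees; the convention used internally by cellulose-builder may differ, and the function frac_to_cart is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the program's own code): fractional to
# Cartesian conversion for a cell with alpha = beta = 90 degrees.
frac_to_cart() {
    awk -v fx="$1" -v fy="$2" -v fz="$3" 'BEGIN {
        pi = atan2(0, -1)
        # Cellulose I-beta unit cell, as in dimensions_I_beta
        a = 7.784; b = 8.201; c = 10.380
        gamma = 96.5 * pi / 180.0
        # Assumed convention: a along x, b in the xy plane, c along z
        x = a * fx + b * fy * cos(gamma)
        y = b * fy * sin(gamma)
        z = c * fz
        printf "%.3f %.3f %.3f\n", x, y, z
    }'
}
```

With this convention, frac_to_cart 1 0 0 prints 7.784 0.000 0.000, that is, the end point of the a lattice vector.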

bGLC.top

This file is a symbolic link§ to the file top_all36_carb.rtf provided by MacKerell and coworkers,[26,27] which contains topology information for the beta-D-glucopyranose residue, associated with the recent force field for carbohydrates by the same authors. Another carbohydrate topology or force field can be used with cellulose-builder by creating a new symbolic link:

$ ln -s my_top_file bGLC.top

Notice that the link name, bGLC.top, must be preserved. Adopting another topology file will often imply using different atom labels (atom types), which in turn requires editing the atomtypes_* files.

atomtypes_I_alpha, atomtypes_I_beta, atomtypes_II, atomtypes_III_I

These files specify the 21 atom types that compose an anhydroglucopyranose unit. The atom types must be provided according to the topology file and in exactly the same order as the corresponding fractional coordinates in the respective asy_* file (above). Editing the atomtypes_* files is only necessary if one is using an alternative topology file.

Benchmark

Although performance is not usually an issue in this type of application, it is worth briefly discussing some performance data, which are presented in Table 1. The program was run on a GNU/Linux operating system (Ubuntu 8.10, kernel 2.6.27-7-generic, i686). Memory and processing resources on that system were 2 GB of RAM and an AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ (CPU frequency: 2613.345 MHz, cache size: 1024 KB). The data demonstrate that, even with very modest hardware resources, Cartesian coordinates for fibrils such as those depicted in Figure 6 can be obtained within a few seconds to about three minutes, depending on the fibril length, at a rate of 2.86 cellobiose units per second. Plotting the wall clock time against the number of cellobiose units reported in Table 1 and performing a linear regression yields an almost perfectly linear relation with correlation coefficient R² = 0.99992 (not shown).
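The regression can be repeated from the five data points of Table 1 with an ordinary least-squares fit, sketched below with awk. The function fit_benchmark is hypothetical; it prints the inverse of the fitted slope (cellobiose units per second) followed by R².

```shell
#!/usr/bin/env bash
# Hypothetical sketch: least-squares fit of wall clock time (s) against
# the number of cellobiose units, using the values listed in Table 1.
fit_benchmark() {
    awk '{
             n++
             sx += $1; sy += $2
             sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
         }
         END {
             cov   = n * sxy - sx * sy
             varx  = n * sxx - sx * sx
             vary  = n * syy - sy * sy
             slope = cov / varx              # seconds per cellobiose unit
             r2    = cov * cov / (varx * vary)
             printf "%.2f %.5f\n", 1 / slope, r2
         }' <<'EOF'
5 5.75
25 11.28
50 20.06
100 37.51
500 178.06
EOF
}
```

Running fit_benchmark should give values that round to the rate and R² quoted in the text.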

Concluding Remarks

The program presented here, cellulose-builder, will serve as a useful tool for scientists wishing to perform MD simulations and other computations on cellulose crystalline structures. It provides an easy and automated means of generating Cartesian coordinates in PDB format for the cellulose Iα, Iβ, II, and III_I allomorphs in a variety of crystal shapes, ranging from regular parallelepiped crystallites to fibrils and monolayers. The program allows total control of the crystallite dimensions and fibril length from the Bash command line. For parallelepiped crystallites, different translational symmetries are available, enabling the use of periodic boundary conditions. A number of crystallographic surfaces can be exposed, so that the user can build structures whose exposed surfaces have different degrees of hydrophobicity.

Table 1. Cellulose-builder running times for calculating and writing Cartesian coordinates for elementary fibrils (Figs. 1 and 6) as a function of the number of cellobiose units in each cellulose chain.

Cellobiose units   Wall clock time (s)   User CPU-time (s)   System CPU-time (s)   %CPU
5                  5.75                  2.93                0.81                  65
25                 11.28                 9.08                1.56                  94
50                 20.06                 16.80               2.72                  97
100                37.51                 32.21               4.86                  98
500                178.06                154.60              23.11                 99

The program was run on a PC with 2 GB RAM and an AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ (CPU frequency: 2613.345 MHz, cache size: 1024 KB), under a GNU/Linux operating system (Ubuntu 8.10, kernel 2.6.27-7-generic, i686). The columns report the elapsed real time (wall clock), the CPU time that the process used directly (in user mode), the CPU time used by the system on behalf of the process (in kernel mode), and the percentage of CPU usage by the code, as provided by /usr/bin/time. All times are in seconds. The percentage of CPU is the sum of user and system times divided by the total running time. The resident memory usage for the largest system was only 360 MB.

§In computing, a symbolic link (also symlink or soft link) is a special type of file supported by the POSIX operating-system standard that contains a reference to another file or directory in the form of an absolute or relative path.


The crystalline structures that can be built with cellulose-builder may be further combined with other molecules, such as solvent and proteins, using available codes for generating initial configurations of molecular simulations.[22] Similarly, the structures may be combined among themselves to create more complex assemblies of cellulose and more elaborate models of the plant cell wall. Further developments of the program are under way. Extending cellulose-builder to generate crystalline structures of other glycans of putative relevance to the study and design of cellulose-degrading enzymes, such as chitin,[33] is also under consideration.

Acknowledgments

The authors thank Rodrigo L. Silveira for discussions.

Keywords: cellulose crystalline structures · starting configurations for simulations · plant cell wall elementary fibrils · hydrophobic and hydrophilic surfaces of cellulose · software · Bash programming language

How to cite this article: T. C. F. Gomes, M. S. Skaf, J. Comput.

Chem. 2012, 33, 1338–1346. DOI: 10.1002/jcc.22959

Additional Supporting Information may be found in the

online version of this article.

[1] Y. Nishiyama, P. Langan, H. Chanzy, J. Am. Chem. Soc. 2002, 124, 9074.

[2] Y. Nishiyama, J. Sugiyama, H. Chanzy, P. Langan, J. Am. Chem. Soc.

2003, 125, 14300.

[3] P. Langan, Y. Nishiyama, H. Chanzy, Biomacromolecules 2001, 2, 401.

[4] M. Wada, H. Chanzy, Y. Nishiyama, P. Langan, Macromolecules 2004, 37,

8548.

[5] M. S. Baird, A. C. W. O’Sullivan, B. Banks, Cellulose 1998, 5, 89.

[6] R. J. Vietor, K. Mazeau, M. Lakin, Biopolymers 2000, 54, 342.

[7] J. F. Matthews, C. E. Skopec, P. E. Mason, P. Zuccato, R. W. Torget, J.

Sugiyama, M. E. Himmel, J. W. Brady, Carbohydr. Res. 2006, 341, 138.

[8] H. Miyamoto, M. Ago, C. Yamane, M. Seguchi, K. Ueda, K. Okajima, Carbohydr. Res. 2011, 346, 807.

[9] Y. Ding, M. E. Himmel, J. Agr. Food Chem. 2006, 54, 597.

[10] S. P. S. Chundawat, G. T. Beckham, M. E. Himmel, B. E. Dale, Annu. Rev.

Biom. Eng. 2011, 2, 121.

[11] L. Zhong, J. F. Matthews, M. F. Crowley, T. Rignall, C. Talon, J. M. Cleary, R. C. Walker, G. Chukkapalli, C. McCabe, M. R. Nimlos, C. L. Brooks, III, M. E. Himmel, J. W. Brady, Cellulose 2008, 15, 261.

[12] L. Zhong, J. F. Matthews, P. I. Hansen, M. F. Crowley, J. M. Cleary, R. C. Walker, M. R. Nimlos, C. L. Brooks, III, W. S. Adney, M. E. Himmel, J. W. Brady, Carbohydr. Res. 2009, 344, 1984.

[13] C. M. Payne, M. E. Himmel, M. F. Crowley, G. T. Beckham, J. Phys. Chem.

Lett. 2011, 2, 1546.

[14] L. Petridis, X. Jiancong, M. F. Crowley, J. C. Smith, X. Cheng, In Computational Modeling in Lignocellulosic Biofuel Production. M. R. Nimlos, M. F. Crowley (Eds), ACS Symposium Series, 2010, vol. 1052, Chapter 3, 55.

[15] H. M. Cho, A. S. Gross, J.-W. Chu, J. Am. Chem. Soc. 2011, 133, 14033.

[16] R. P. Swatloski, S. K. Spear, J. D. Holbrey, R. D. Rogers, J. Am. Chem. Soc. 2002, 124, 4974.

[17] S. Zhu, Y. Wu, Q. Chen, Z. Yu, C. Wang, S. Jin, Y. Ding, G. Wu, Green Chem. 2006, 8, 325.

[18] W. Humphrey, A. Dalke, K. Schulten, J. Mol. Graphics 1996, 14, 33.

[19] J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C.

Chipot, R. D. Skeel, L. Kale, K. Schulten, J. Comput. Chem. 2005, 26,

1781.

[20] B. R. Brooks, C. L. Brooks, III, A. D. Mackerell, Jr., L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, W. Yang, D. M. York, M. Karplus, J. Comput. Chem. 2009, 30, 1545.

[21] D. A. Case, T. A. Darden, T. E. Cheatham, III, C. L. Simmerling, J. Wang, R. E. Duke, R. Luo, R. C. Walker, W. Zhang, K. M. Merz, B. P. Roberts, B. Wang, S. Hayik, A. Roitberg, G. Seabra, I. Kolossvai, K. F. Wong, F. Paesani, J. Vanicek, J. Liu, X. Wu, S. R. Brozell, T. Steinbrecher, H. Gohlke, Q. Cai, X. Ye, J. Wang, M.-J. Hsieh, G. Cui, D. R. Roe, D. H. Mathews, M. G. Seetin, C. Sagui, V. Babin, T. Luchko, S. Gusarov, A. Kovalenko, P. A. Kollman, AMBER 11, University of California, San Francisco, 2010.

[22] L. Martínez, R. Andrade, E. G. Birgin, J. M. Martínez, J. Comput. Chem. 2009, 30, 2157.

[23] C. Newham, B. Rosenblatt, Learning the Bash Shell; O’Reilly & Associates: Cambridge, 1998.

[24] C. Albing, J. P. Vossen, C. Newham, Bash Cookbook; O’Reilly Media:

Sebastopol, 2007.

[25] A. Robbins, N. H. F. Beebe, Classic Shell Scripting; O’Reilly Media: Beijing, 2005.

[26] O. Guvench, S. N. Greene, G. Kamath, J. W. Brady, R. M. Venable, R. W. Pastor, A. D. MacKerell Jr., J. Comput. Chem. 2008, 29, 2543.

[27] O. Guvench, E. Hatcher, A. D. MacKerell Jr., J. Chem. Theory Comput.

2009, 5, 2353.

[28] J. P. Glusker, M. Lewis, M. Rossi, Crystal Structure Analysis for Chemists

and Biologists; Wiley-VCH: New York, 1994.

[29] M. F. C. Ladd, R. A. Palmer, Structure Determination by X-ray Crystallography; Plenum Press: New York, 1994.

[30] T. Hahn, International Tables for Crystallography; Springer: Dordrecht,

2005; Volume A.

[31] B. B. Buchanan, W. Gruissem, R. L. Jones, Biochemistry and Molecular

Biology of Plants; American Society of Plant Physiologists: Rockville,

2000.

[32] Q. Li, S. Renneckar, Biomacromolecules 2011, 12, 650.

[33] G. Vaaje-Kolstad, B. Westereng, S. J. Horn, Z. Liu, H. Zhai, M. Sørlie, V.

G. H. Eijsink, Science 2010, 330, 219.

Received: 21 December 2011; Revised: 5 February 2012; Accepted: 7 February 2012; Published online on 15 March 2012
