Recognition of docking sites on a protein using β-shape based on Voronoi diagram of atoms

Deok-Soo Kim a,b,*, Cheol-Hyung Cho b, Donguk Kim b, Youngsong Cho b

a Department of Industrial Engineering, Hanyang University, 17 Haengdang-dong, Seongdong-gu, Seoul 133-791, South Korea

b Voronoi Diagram Research Center, Hanyang University, 17 Haengdang-dong, Seongdong-gu, Seoul 133-791, South Korea

Received 14 June 2005; accepted 22 November 2005

Abstract

A protein consists of atoms. Given a protein, the automatic recognition of depressed regions on the surface of the protein, often called docking sites or pockets, is important for the analysis of the interaction between a protein and a ligand and facilitates the fast development of new drugs. Presented in this paper is a geometric approach for the detection of docking sites using the β-shape, which is based on the Voronoi diagram of atoms in the Euclidean distance metric. We first propose a geometric construct called a β-shape, which represents the proximity among atoms on the surface of a protein. Then, using the β-shape, which takes the size differences among different atoms into account, we present an algorithm to extract the pockets for the possible docking sites on the surface of a protein.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Pocket; Binding sites; Docking; Voronoi diagram of spheres; β-shape; Protein interaction; Drug design

1. Introduction

Molecules such as proteins, DNA, and RNA consist of atoms. Given the atomic complexes of these molecules, analyzing interactions between them is important for understanding their biological functions. The interaction between a protein and a small molecule is also one of the most important issues in designing new drugs.

The study of molecular interactions, such as the docking of a protein with a ligand or protein folding, can be approached from a physicochemical and/or a geometrical point of view [52]. The physicochemical approach seeks regions on the surface of a protein that minimize the potential energy between two molecules, whereas the geometric approach determines whether two molecules have geometrically meaningful features for the interaction.

A docking between a protein, called a receptor, and a small molecule, called a ligand, usually occurs around depressed regions, called docking sites or pockets, on the surface of the receptor. Since designing a new drug requires finding a small chemical that can dock or bind at pockets on a protein, the recognition of pockets on proteins is one of the most fundamental processes in drug design. Considering that chemical databases usually contain millions of entries, manually identifying pockets on the surface of a protein is time-consuming and error-prone. Therefore, the automatic recognition of pockets and the evaluation of the binding of a chemical to a pocket are important in the study of protein-ligand docking for the development of new drugs [37].

0010-4485/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.

doi:10.1016/j.cad.2005.11.008

* Corresponding author. Address: Department of Industrial Engineering, Hanyang University, 17 Haengdang-dong, Seongdong-gu, Seoul 133-791, South Korea. Tel.: +82 2 2220 0472; fax: +82 2 2292 0472.

E-mail addresses: [email protected] (D.-S. Kim), murick@voronoi.hanyang.ac.kr (C.-H. Cho), [email protected] (D. Kim), ycho@voronoi.hanyang.ac.kr (Y. Cho).

While efforts on the physicochemical approach to this issue have been made since the early days of science, efforts to understand biological systems from a geometric perspective have started only very recently [1,18,27,43,48,53,57]. Since geometry is a critical consideration for biological systems in various important aspects, just as in any other discipline, research on geometry in biological systems will provide new challenges as well as opportunities for the CAD and CAGD community.

In this paper, we present the definition of a docking site, also referred to as a pocket, on the surface of a protein from a geometric point of view and present an effective and efficient algorithm to automatically recognize pockets. Most proteins consist of at most six different types of atoms: H, C, N, O, P, and S, which have the corresponding van der Waals radii of 1.2, 1.7, 1.55, 1.52, 1.8, and 1.8 Å, respectively [63]. These atoms with van der Waals radii are usually called van der Waals atoms. The number of atoms in a protein varies from hundreds to hundreds of thousands.

Computer-Aided Design 38 (2006) 431–443

www.elsevier.com/locate/cad

Given a protein, the proposed algorithm first computes a Voronoi diagram of van der Waals atoms. Then, a construct called a β-shape is computed from the Voronoi diagram using a spherical probe. The Voronoi diagram of atoms presented in this paper is similar to the ordinary Voronoi diagram for points in the sense that the Euclidean distance metric is used. However, it differs from the ordinary Voronoi diagram since the distance is measured from the surfaces of atoms, not from the centers of atoms.

The first step in defining a docking site is to define the spatial proximity among the atoms on the surface of the protein. This is done by using a mesh-like construct called a β-shape, which is similar in some ways to the well-known α-shape [19]. Then, pocket primitives are defined on the β-shape, where a pocket primitive is a unit of depressed region on the β-shape. Lastly, the validity of the boundaries between neighboring pocket primitives is evaluated to test whether two neighbors should be considered as belonging to a single pocket. Eventually, a few pockets remain on the surface of a receptor, where each pocket corresponds to an appropriately depressed region.

We want to emphasize here that a β-shape takes the size variation of atoms into account for the computation of proximity among the atoms on the surface of a protein. Recall that the radius difference between atoms, H and P for example, is quite significant.

In Section 2 of this paper, we review the previous work related to the automatic recognition of docking sites on proteins. After introducing the geometric model of a protein as an atomic complex in Section 3, we discuss the representation of topology for a whole protein in Section 4. Section 5 presents the issues related to the topology among the atoms on the surface of proteins and provides a definition of the β-shape. Section 6 discusses how to extract pocket primitives from a β-shape, and Section 7 shows the automatic recognition of pockets via the merging of pocket primitives. Section 8 concludes the paper by showing some experimental results.

2. Literature review

Since it is generally agreed that the functions of a protein are largely determined by its geometric structure, the study of the geometric characteristics of proteins has recently been receiving more attention. In addition, mature technologies in geometry, such as CAD and computational geometry, are providing and will continue to provide a strong driving force for this trend.

The first formal treatment of geometry for a biological atomic complex that we are aware of is the study by Bernal and Finney in 1967, who examined the packing characteristics of the complex [7]. Lee and Richards, in 1971, presented the definition of the solvent accessible surface, which provided a theoretical foundation for analyzing the mass properties of proteins [43]. In 1974, Richards defined a molecular surface using the concept of a Voronoi diagram of atom centers, which became the basis for most structure analyses of proteins, including the extraction of pockets [54]. Connolly later reported how to compute the molecular surface analytically and beautifully visualized the rendered molecular surface [13,14]. Thereafter, the molecular surface has also been referred to as a Connolly surface.

Compared to the above studies, research on algorithmically

extracting cavities and/or pockets on protein began much later.

Geometric approaches to extracting pockets on a protein can be

broadly categorized into three types: a grid-based approach, a

sphere-coating approach, and an approach based on some

representation of surface atoms on the protein.

Being both conceptually and computationally easier than the other two approaches, the grid-based approach was the first method of choice for extracting pockets. The grid-based approach primarily defines a 3D spatial lattice over the space occupied by the protein and uses simple techniques to reason about the relations among the grid points in the lattice. The grid points, associated with some attributes, are then used to extract the exterior boundary of the protein and recognize the depressed regions on the surface. After some efforts to fill small spheres around a protein and separate meaningful chunks of spheres (the sphere-coating approach), the mainstream of research has moved on to mathematically rigorous, computationally efficient, and robust representations of the atomic complex. The following is a literature review on this topic, which is summarized in Fig. 1.

The first research that we are aware of in the geometric approach is by Voorintholt et al. in 1989, employing the grid-based approach [59]. They created a grid over the bounding box of a protein, where each grid point was associated with a distance value to the nearest atom. While the focus of this research was on the fast visualization of proteins, the density map thus obtained was also effective in discriminating the regions of low density where cavities exist. The distance metric used in this work was the squared Euclidean distance, which saves the computation time of the square-root operation. In addition, the concept of a digital differential analyzer was used to speed up the distance computation for each voxel.
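The grid-based distance map can be illustrated with a minimal Python sketch. It is not the implementation of [59]: the function names, the uniform `step`, and the choice to measure distances to atom centers are assumptions made for illustration, and the digital differential analyzer speed-up is omitted. The sketch only shows how comparing squared distances avoids the square-root operation.

```python
from itertools import product

def squared_distance_grid(centers, lo, hi, step):
    """Map each grid index to the squared distance to the nearest atom center.

    `lo` and `hi` are opposite corners of the bounding box (hypothetical inputs).
    """
    counts = [int(round((h - l) / step)) + 1 for l, h in zip(lo, hi)]
    grid = {}
    for idx in product(*(range(n) for n in counts)):
        point = tuple(l + i * step for l, i in zip(lo, idx))
        grid[idx] = min(
            sum((p - c) ** 2 for p, c in zip(point, center))
            for center in centers
        )
    return grid

def low_density_points(grid, threshold):
    """Grid indices whose nearest atom center is farther away than `threshold`."""
    limit = threshold ** 2  # compare squared values, so no square root is needed
    return sorted(idx for idx, d2 in grid.items() if d2 > limit)
```

Grid points returned by `low_density_points` are candidates for cavities in the sense of the low-density regions described above; the threshold itself is a free parameter of this sketch.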

In 1990, Ho and Marshall proposed another grid-based

algorithm consisting of two steps [28]: first, they created a

bounding box of a protein to define a uniform grid, and then

sliced the bounding box. Then, after filling the cavity of the

protein with filler atoms using a flood-filling algorithm, they

isolated cavities using a boolean complement operation.

In 1991, Alard and Wodak reported an elegant approach to detecting internal voids of a protein using the concept of topology [3]. If the intersections among all atoms are computed, the surfaces of all atoms are subdivided into a set of spherical polygons. Some of the polygons are interior to atoms and the others are not, and those not interior to atoms can be easily separated from the others. Reinventing the idea of the B-rep and the related concepts in geometric modeling [39], such as orientations and topology operations, they separately constructed an outer shell and inner shells, if they exist.

Fig. 1. Summary of research on detecting docking sites on a protein.


A year later, in 1992, Levitt and Banaszak reported a rather simple yet, for its time, practical algorithm to detect both internal and external cavities [44]. They defined a fine-grained grid in the bounding box of a protein and isolated the grid cells intersecting the atoms. By scanning the whole grid in the +X direction, starting from the smallest X value, for each Y and Z value of the grid system, they isolated cells contained between cells which intersect the atoms. These cells together define a cavity. To visualize the surface of cavities, they used marching cubes for the approximation.
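The row scan described above can be sketched as follows. This is a simplified illustration, not the algorithm of [44]: the grid is assumed to be a nested list of booleans indexed as `grid[z][y][x]`, where `True` marks a cell intersecting an atom, and every empty cell lying between the first and last occupied cell of a row is reported.

```python
def cavity_cells_in_row(occupied):
    """Indices of empty cells lying between occupied cells in one X-scan line.

    `occupied` is a list of booleans ordered from the smallest X value.
    """
    hits = [i for i, flag in enumerate(occupied) if flag]
    if len(hits) < 2:
        return []  # nothing can be enclosed by fewer than two occupied cells
    first, last = hits[0], hits[-1]
    return [i for i in range(first, last + 1) if not occupied[i]]

def cavity_cells(grid):
    """Apply the row scan to every (Y, Z) line of a grid indexed as grid[z][y][x]."""
    found = []
    for z, plane in enumerate(grid):
        for y, row in enumerate(plane):
            found.extend((x, y, z) for x in cavity_cells_in_row(row))
    return found
```

Scanning along a single axis, as here, is only a sketch of the idea; a cell found this way is enclosed along X but not necessarily along the other axes.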

Kleywegt and Jones presented an algorithm to detect

internal voids and invagination, a depression on protein

whose mouth is relatively narrow [36]. Suppose that atoms in

a protein are fattened, or offset, by a specified scale. Then,

these features are isolated from outside and therefore can be

recognized by collecting the grid points lying interior to the

fattened protein by applying the technique studied previously.

In 1995, Laskowski presented an algorithm consisting of

two steps [38]. He first computed some representative tangent

spheres from surface atoms of a protein. Hence, many of the

tangent spheres intersect each other. Then, by collecting the

tangent spheres intersecting each other in an appropriate

density, he constructed the boundary of a cavity.

After computing exterior spherical polygons as Alard and Wodak did [3], Seidl and Kriegel, in 1995, presented a topology data structure among the spherical polygons, similar to the well-known winged-edge data structure [55]. They also classified the spherical polygons into three categories: convex, concave, and saddle patches. Using the idea of region growing, they segmented the molecular surface while the neighboring patches were approximated by a paraboloid which was considered a cavity.

Recently, researchers have started to use more rigorously defined mathematical and computational tools related to the geometry among the atoms in a protein. The α-shape, reported in 1994, is one of the most powerful of these tools. Since the α-shape can construct the surface of a protein quite efficiently for a fixed-size probe, it has often been used in the extraction of pockets [19].

In 1996, Peters et al. published a novel algorithm using an α-shape to construct the topology among atoms on the surface of a protein [53]. They defined two constructions of surface forms: global and detailed forms, corresponding to larger and smaller values of α, respectively. By investigating the discrepancy between the two forms, they came up with a truly automatic recognition scheme for cavities. We want to mention here that the concept used in our algorithm is similar to this in that we also use two forms.

Starting from the mathematical definition of a pocket by Edelsbrunner et al. [18] based on the well-known α-shape, Liang et al. in 1998 presented a decent algorithm and system for the extraction of pockets from a protein [48]. The algorithm is based on the discrete-flow method, which can be explained in 2D as follows. After an α-shape is constructed, each triangle outside the α-shape is tested to see whether it is obtuse or acute. Then, all obtuse triangles are merged into the neighboring acute triangle, which results in a pocket with a relatively narrow mouth. Note that the pocket recognized in this approach is equivalent to the invagination in [36]. The implementation was later successfully packaged into the popular software CAST.

In 2000, Brady and Stouten improved the sphere coating

approach to repeat coating spheres layer by layer. After a layer

of spheres is coated, some irrelevant spheres are removed from

the coated layer. Then, another layer of spheres is coated [8].

After some iteration, each chunk of spheres deposited around

cavities is identified as a pocket.

It is worth mentioning that there have been other approaches as well. Delaney reported an approach based on a pattern recognition technique using cellular logic operations from image processing, where a logic value is assigned to each grid [17]. In [49], Masuya and Doi described the definition of pockets using the concept of set operations.

3. Geometric models of protein and related terminologies

In order to analyze the geometric characteristics of proteins, it is necessary to have an appropriate geometric model for the proteins. Depending on the application, various models such as a hard-sphere model, a ball-and-stick model, a ribbon model, or a combination of the above have been used. In this research, we have adopted the most popular hard-sphere model with van der Waals atoms, which is sometimes called a CPK model. A protein represented by the CPK model is shown in Fig. 2. The balls in the figure denote the van der Waals atoms constituting a protein.

In most studies analyzing the geometric characteristics of a protein with respect to another molecule, which is usually relatively small, the analysis is done using the concept of a spherical probe which encloses the small molecule. While a probe is an approximation of the small molecule, it represents the molecule well by incorporating its shape, conformation changes, and all possible orientations of the ligand with respect to the protein. Hence, the behavior of a probe is considered to best represent the geometric behavior of the molecule with respect to a protein. In the case of a water molecule, the corresponding probe is a sphere with a radius of 1.4 Å.

The points on the boundary of van der Waals atoms constitute a boundary surface of a protein which is conveniently referred to as the van der Waals surface of the protein. In addition, there are two more important types of surfaces associated with a protein: the solvent accessible surface and the molecular surface. The solvent accessible surface consists of the points in space where the center of the probe is located when the probe is in contact with the protein. The inner-most possible trajectories of points on the probe surface, then, define a molecular surface. A solvent accessible surface usually defines a free-space in which a small molecule can move around without interfering with the protein and therefore plays a fundamental role for folding and/or docking [43]. On the other hand, the molecular surface, often referred to as the Connolly surface after the name of the researcher who first analytically computed the surface, conveniently defines the boundary between the interior and exterior volume of a protein so that the volume or the density of the protein can be calculated [13].

Fig. 2. The geometric model of a protein consisting of five atoms. Shown in the figure are the van der Waals surface (VWS), the solvent accessible surface (SAS), the contact surface (CS), and the reentrant surface (RS) corresponding to a probe p.

A molecular surface consists of two parts: the contact

surface and the reentrant surface. A contact surface consists of

points on the van der Waals surface of atoms which can be

contacted by the probe surface, and a reentrant surface consists

of points in the free-space touched by the probe when the probe

is in contact with nearby atoms in the protein. Note that atoms

contributing to the contact surface define the boundary of the

protein. In this paper, we will refer to such atoms as surface

atoms. Points on the molecular surface are always accessible

by the probe as it rolls over the protein. In the geometric

modelling community, the reentrant surface is called the

blending surface and its computation has been studied quite

extensively in a rather general setting [6,9,26]. The solvent

accessible surface is also known in the geometric modelling

community as the offset surface of a protein using the probe

radius as an offset distance [29]. Note that the definitions of all

of the above-mentioned surfaces depend on a probe.
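The offset-surface view of the solvent accessible surface can be made concrete with a small sketch. The function names and the tolerance `tol` are assumptions for illustration; atoms are given as `(center, radius)` pairs. A probe center is collision-free when its clearance is non-negative, and it lies on the solvent accessible surface when the clearance is zero.

```python
import math

def sas_clearance(point, atoms, probe_radius):
    """Smallest gap between a probe centered at `point` and any atom surface.

    Each atom is offset by the probe radius, so a value of zero means that
    `point` lies on the solvent accessible surface.
    """
    return min(
        math.dist(point, center) - (radius + probe_radius)
        for center, radius in atoms
    )

def probe_center_is_free(point, atoms, probe_radius, tol=1e-9):
    """True if a probe centered at `point` does not penetrate any atom."""
    return sas_clearance(point, atoms, probe_radius) >= -tol
```

For a single carbon atom of radius 1.7 at the origin and a water probe of radius 1.4, a probe center at distance 3.1 from the origin touches the atom, which is the sense in which the solvent accessible surface is the van der Waals surface offset by the probe radius.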

Fig. 3 shows two molecules, a receptor R and a ligand L,

interacting with each other via a pocket defined on the surface

of molecule R. R and L interact with each other since the

protruding region of L is geometrically inserted into the pocket

on the surface of R.

Let $A = \{a_1, a_2, \ldots, a_n\}$ be a protein consisting of atoms $a_i = (c_i, r_i)$, where $c_i = (x_i, y_i, z_i)$ and $r_i$ are the center and the radius of the atom $a_i$, respectively. In addition, suppose that $L = \{l_1, l_2, \ldots, l_m\}$ is a ligand, which also consists of a number of atoms $l_j$ defined similarly to $a_i$, and that $L$ will dock with $A$. Let $C = \{c_1, c_2, \ldots, c_n\}$ be the set of centers of the atoms in $A$. Note that in general $m \ll n$. Let $p = (c_p, r_p)$, called a probe, be the minimum sphere enclosing all atoms in the ligand $L$.

Let $p_j$ be a pocket, where $p_j = \{a_{j1}, a_{j2}, \ldots, a_{jk}\}$ and these atoms together define a depressed region on the surface $S$ of protein $A$. The surface $S = \{s_1, s_2, \ldots, s_l\}$ is defined as the set of atoms of the protein for which some points on the atom surface contribute to the molecular surface. Hence, $S$ is the set of atoms $a_i \in A$ which may be touched by the ligand. Since $p_j \subseteq S \subseteq A$ and $p_1 \cup p_2 \cup \cdots \cup p_p \subseteq S \subseteq A$, there may be some atoms not included in any pocket. Let $P = \{p_1, p_2, \ldots, p_p\}$ be the set of all possible pockets on $S$.

Fig. 3. A docking configuration between a receptor and a ligand.

4. Topology representation for a whole protein

To efficiently respond to queries about the spatial structure

of a protein, it is necessary to have a convenient

representation of the spatial structure among atoms constitut-

ing the protein.

In the study of protein structures, the ordinary Voronoi diagram VD(C) for the set C of atom centers and its dual structure, the Delaunay triangulation, have been frequently used since Bernal and Finney first introduced them in 1967 [7,54]. VD(C) forms a tessellation of space where each region in the tessellation consists of the locations in space closer to the corresponding input point than to any other input point. Since VD(C) is mathematically well-defined and efficient as well as robust codes are available, it has been widely used in previous studies [4,16,20,23,24,50,56,58,62].

Recognizing that VD(C) does not account for the size differences among different atoms, Richards also proposed a scheme, in 1974, to translate bisector edges toward the smaller atoms according to the ratio between the radii of two neighboring atoms [54]. However, this transformation does not necessarily produce a valid tessellation because the vertices are not well-defined. Richards used the term vertex error to describe this situation.

Noting the vertex error, Gellatly and Finney proposed, in

1982, a method using radical planes instead of the translated

Voronoi edges to make sure that no vertex error occurs [22].

This radical plane approach is in fact equivalent to the power

diagram PD(A) for an atom set A, as named by Aurenhammer

in 1987 [5]. Since then, the power diagram has been frequently

used in biology problems since it reflects the size differences

among atoms at a certain level [6,21,22]. Note that the theory

of PD(A) is also well-established and efficient and robust codes

are available [65]. However, PD(A) does not fully reflect the

size difference in the sense that the distance from a location in

space to an atom is the tangential distance rather than the

Euclidean minimum distance.
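The difference between the two distance notions can be checked numerically. The sketch below is illustrative only; the function names and the atom placement are assumptions. For a point outside an atom with center distance d and radius r, the tangential distance used by the power diagram is sqrt(d^2 - r^2), while the Euclidean minimum distance to the atom surface is d - r.

```python
import math

def euclidean_surface_distance(point, center, radius):
    """Euclidean minimum distance from `point` to the surface of an atom."""
    return math.dist(point, center) - radius

def tangential_distance(point, center, radius):
    """Length of a tangent segment from `point` to the atom (power-diagram metric)."""
    power = math.dist(point, center) ** 2 - radius ** 2
    if power < 0:
        raise ValueError("point lies inside the atom")
    return math.sqrt(power)

# Hypothetical placement: an H atom (radius 1.2) and a P atom (radius 1.8)
# whose surfaces are both exactly 2.0 away from the origin.
origin = (0.0, 0.0, 0.0)
h_atom = ((-3.2, 0.0, 0.0), 1.2)
p_atom = ((3.8, 0.0, 0.0), 1.8)

e_h = euclidean_surface_distance(origin, *h_atom)
e_p = euclidean_surface_distance(origin, *p_atom)
t_h = tangential_distance(origin, *h_atom)
t_p = tangential_distance(origin, *p_atom)
```

In this configuration the origin is equidistant from both atom surfaces in the Euclidean sense, yet its tangential distance to the smaller H atom is shorter than to the P atom, so the power diagram would assign it to the H atom. This is the sense in which the power diagram reflects size differences only partially.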

Hence, in our research, we propose to use the Voronoi diagram of atoms where the distance is defined as the Euclidean minimum distance, instead of the tangential distance, from the surfaces of atoms. While the ordinary Voronoi diagram of points and the power diagram have been studied quite extensively and efficient computational codes are available, the Voronoi diagram of spheres has not been studied as much. In many applications for proteins, the ordinary Voronoi diagram and the power diagram are approximations of what is actually needed. It is only very recently that the fast construction of the Voronoi diagram for circles and spheres with different radii became practical [30–34]. With the Voronoi diagram of spheres available, many studies on the geometric perspective of a protein can be done quite efficiently. The constructed Voronoi diagram is then stored in a radial data structure for the efficient processing of various queries [10].

A Voronoi diagram VD(A) for an atom set A is defined as follows. Associated with each atom $a_i = (c_i, r_i) \in A$, where $c_i$ and $r_i$ are the center and radius of $a_i$, there is a corresponding Voronoi region

$$VR_i = \{\, p \mid \mathrm{dist}(p, c_i) - r_i < \mathrm{dist}(p, c_j) - r_j,\ i \neq j \,\}$$

Note that dist(p, q) denotes the ordinary Euclidean distance, i.e.

$$\mathrm{dist}(p, q) = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2 + (z_p - z_q)^2}.$$

Then, $VD(A) = \{VR_1, VR_2, \ldots, VR_n\}$ is the Voronoi diagram for the given atoms and is represented as $G^V = (V^V, E^V, F^V)$, where $V^V = \{v^V_1, v^V_2, \ldots\}$, $E^V = \{e^V_1, e^V_2, \ldots\}$, and $F^V = \{f^V_1, f^V_2, \ldots\}$ are the sets of Voronoi vertices, edges, and faces, respectively. From the definition of a Voronoi diagram, a Voronoi vertex $v^V$ is the center of an empty sphere tangent to four nearby atoms, while a Voronoi edge $e^V$ is defined as a locus of points equidistant from the surfaces of three surrounding atoms. In addition, a Voronoi face $f^V$ is the surface defined by two neighboring atoms. Note that the face is always a hyperbolic surface and any point on the face is equidistant from the surfaces of both atoms. For more details, readers are referred to [33,34].
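The Voronoi region definition above translates directly into a point-location test. The following sketch is a brute-force illustration, not the construction algorithm of [33,34]; the function names are hypothetical, atoms are `(center, radius)` pairs, and ties between atoms are not treated specially.

```python
import math

def owning_atom(point, atoms):
    """Index i of the atom minimizing dist(p, c_i) - r_i, i.e. whose VR_i contains p."""
    return min(
        range(len(atoms)),
        key=lambda i: math.dist(point, atoms[i][0]) - atoms[i][1],
    )

def nearest_center(point, atoms):
    """Index of the atom with the nearest center (ordinary Voronoi diagram VD(C))."""
    return min(range(len(atoms)), key=lambda i: math.dist(point, atoms[i][0]))

# Hypothetical example: a small atom at the origin and a larger atom at x = 4.
atoms = [((0.0, 0.0, 0.0), 1.2), ((4.0, 0.0, 0.0), 1.8)]
query = (1.9, 0.0, 0.0)
```

For this query point the nearest center belongs to the small atom, while the nearest surface belongs to the large atom, so VD(C) and VD(A) assign the point to different atoms.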

It is important to mention the combinatorial complexity of the Voronoi diagram of spheres. While the numbers of vertices, edges, and faces of the Voronoi diagram of general spheres are all $O(n^2)$ in the worst case, the average numbers are all $O(n)$. Halperin found that the upper bound of the combinatorial complexity for all of the vertices, edges, and faces of the Voronoi diagram for atoms in a protein is $O(n)$ in the worst case [25]. This property is due largely to two characteristics of atom distributions in a protein. According to Pauli's exclusion principle, two atoms cannot be located at the same position, meaning that an atom cannot be contained by another atom [51]. In addition, the differences in the atom radii are bounded by a constant since most proteins consist of six different types of atoms, namely H, C, N, O, P, and S, with the corresponding van der Waals radii discussed earlier. Under these conditions, Halperin showed that the number of neighboring atoms, which define Voronoi faces, for a given atom is constant in the worst case.

5. Topology among surface atoms of protein

Since $p_j \subseteq S$, extracting pockets requires queries on the surface shape of the protein. Hence, an appropriate definition of the surface of a protein and an efficient representation of the topological structure among the atoms on the surface are necessary.

5.1. α-shape

In 1994, Edelsbrunner proposed a 3D α-shape from a set of 3D points [19]. The α-shape is defined by carving out the space with an omnipresent open sphere with a radius α wherever the sphere does not contain any input point. When the sphere touches two points, they are connected by an edge. When the sphere touches three points, a triangular face is defined by the points. When the sphere touches only one point, the point is left as a singleton. Then, these points, edges and triangular faces together define the α-shape for the given point set. When $\alpha = \infty$, the α-shape of a point set is the boundary of the convex hull of the set. If $\alpha = 0$, the α-shape is the point set itself. A robust algorithm with $O(n^2)$ worst-case time complexity was also given by the same authors to construct an α-shape from the Delaunay triangulation of the point set.
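The carving rule can be illustrated in 2D, where the sphere becomes a circle and two points are connected when an empty circle of radius α touches both. The sketch below is a brute-force check written for illustration; it is not the Delaunay-based algorithm of [19], and the function name and the numerical tolerance are assumptions.

```python
import math

def alpha_edge(p, q, points, alpha):
    """True if an empty circle of radius alpha touches both p and q (2D sketch)."""
    d = math.dist(p, q)
    if d == 0 or d > 2 * alpha:
        return False  # no circle of radius alpha passes through both points
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    h = math.sqrt(alpha ** 2 - (d / 2) ** 2)  # midpoint-to-center distance
    nx, ny = -(q[1] - p[1]) / d, (q[0] - p[0]) / d  # unit normal of segment pq
    for sign in (1, -1):  # the two circles through p and q
        center = (mx + sign * h * nx, my + sign * h * ny)
        if all(
            math.dist(center, r) >= alpha - 1e-12
            for r in points
            if r != p and r != q
        ):
            return True
    return False

# Corners of a 2 x 2 square used as a small test point set.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
```

With α = 1.5, a side of the square is connected because one of the two candidate circles lies outside the square and is empty, whereas the diagonal is not connected because both candidate circles contain another corner. With α = 0.9 no circle can touch two corners at once, so every point is left as a singleton.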

Since an α-shape defines the concept of shape without any ambiguity and since efficient and robust code is available, there have been several studies based on the α-shape in biology, such as the automatic recognition of pockets [18,48,53], the detection of internal voids of a protein [47], and the calculation of the area and volume of a protein [2,45,46]. There has also been research based on the α-shape in computer graphics.

However, an α-shape suffers from the fact that it does not incorporate the size differences among atoms, since an α-shape is computed from the Delaunay triangulation of atom centers. Fig. 4, for example, illustrates a problematic situation that can be encountered due to the size differences. Shown in Fig. 4(a) is a protein with a pocket-like depression on the surface, where the mouth of the depression is located between two relatively large atoms $a_l$ and $a_r$. Since the probe $p$, with a radius $r_p$, cannot freely enter the depression without colliding with $a_l$ or $a_r$ or both, the depression should not be considered a pocket. Fig. 4(b) shows the ordinary point set Voronoi diagram for the centers of atoms and its corresponding Delaunay triangulation. Fig. 4(c) shows the corresponding α-shape computed from the Delaunay triangulation with the value of α set to $r_p$ plus the average radius of all atoms in the protein. As shown in the figure, the probe freely passes the mouth, represented as a dotted Delaunay edge between $c_l$ and $c_r$, and therefore a false pocket is concluded as shown in Fig. 4(d).

Fig. 4. A false pocket recognized from the α-shape of a protein. (a) A protein and a probe, (b) the ordinary Voronoi diagram for the atom centers and the corresponding Delaunay triangulation, (c) the α-shape corresponding to the inflated probe $\tilde{p}$, and (d) the false pocket recognized from the α-shape.
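The false pocket can be reproduced with simple arithmetic. The sketch below is a simplified illustration, assuming the probe moves straight through the midpoint between the two mouth atoms; the radii and the center distance are made-up numbers, and both function names are hypothetical.

```python
def passes_by_centers(center_gap, r_probe, r_average):
    """Alpha-shape view: atoms are points and alpha = r_probe + r_average."""
    alpha = r_probe + r_average
    # A ball of radius alpha slips between two points only if they are
    # more than 2 * alpha apart.
    return center_gap > 2 * alpha

def passes_by_surfaces(center_gap, r_left, r_right, r_probe):
    """Actual geometry: the probe must fit between the two atom surfaces."""
    return center_gap - r_left - r_right >= 2 * r_probe

# Two large mouth atoms (radius 1.9) in a protein whose average radius is 1.5.
center_gap, r_left, r_right = 6.2, 1.9, 1.9
r_probe, r_average = 1.4, 1.5

alpha_says_open = passes_by_centers(center_gap, r_probe, r_average)
really_open = passes_by_surfaces(center_gap, r_left, r_right, r_probe)
```

Here the center-based test reports an open mouth, although the real clearance between the two atom surfaces (2.4) is smaller than the probe diameter (2.8). This is the discrepancy that Fig. 4 depicts.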

5.2. β-shape

Despite its many virtues, an α-shape is unable to account for the size differences among different types of atoms. Hence, we have recently proposed a geometric construct called a β-shape based on the Voronoi diagram of spheres [35]. We first introduce the concept of β-hulls and then extend it to β-shapes. Conceptually, a β-hull is a generalization of an α-hull and can be similarly described. The point set from which an α-hull is defined is now replaced by a set of three-dimensional spherical balls.

Consider $\mathbb{R}^3$ filled with Styrofoam and some spherical rocks scattered around inside the Styrofoam. The radii of the spherical rocks vary. Now imagine a spherical eraser with radius β. Then, carving out the Styrofoam with an omnipresent and empty spherical eraser of radius β results in a β-hull. Since the eraser is omnipresent, there can be interior voids as well. Recall that an α-shape is obtained by straightening the curved geometry in the corresponding α-hull. A β-shape can be similarly explained with a slight, yet fundamental, difference. In the β-family, therefore, the relationship between a β-hull and the corresponding β-shape is slightly different from that of their counterparts in the α-family.

Suppose that we are given a β-hull for an atomic structure A. Then, the β-shape for A corresponding to the β-hull is obtained by connecting the centers of the appropriate atoms with an edge or a triangle whenever a β-ball at a particular position in space simultaneously touches two or three nearby atoms, respectively. The details of the definition, the properties, and the algorithms for the β-shape are presented in [35].
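The touching rule can be sketched in a 2D analogue with disks in place of atoms. This is a brute-force illustration only, not the Voronoi-diagram-based construction of [35]; the function name `beta_shape_edges` and the `(center, radius)` input layout are assumptions. Two disks are joined when a β-disk can be placed tangent to both without intersecting any other disk.

```python
import math
from itertools import combinations

def beta_shape_edges(disks, beta):
    """Brute-force 2D sketch; disks is a list of ((x, y), radius)."""
    edges = []
    for i, j in combinations(range(len(disks)), 2):
        (xi, yi), ri = disks[i]
        (xj, yj), rj = disks[j]
        # The center of a tangent beta-disk lies at distance r + beta from
        # each atom center, i.e. on the intersection of two inflated circles.
        big_i, big_j = ri + beta, rj + beta
        d = math.hypot(xj - xi, yj - yi)
        if d == 0 or d > big_i + big_j or d < abs(big_i - big_j):
            continue  # the inflated circles do not intersect
        a = (big_i ** 2 - big_j ** 2 + d ** 2) / (2 * d)
        h = math.sqrt(max(0.0, big_i ** 2 - a ** 2))
        ux, uy = (xj - xi) / d, (yj - yi) / d
        for sign in (1, -1):
            qx = xi + a * ux - sign * h * uy
            qy = yi + a * uy + sign * h * ux
            # The beta-disk must not intersect any other atom.
            if all(math.hypot(qx - cx, qy - cy) >= r + beta - 1e-9
                   for k, ((cx, cy), r) in enumerate(disks) if k not in (i, j)):
                edges.append((i, j))
                break
    return edges

mouth = [((0, 0), 2.0), ((6, 0), 2.0)]          # surface gap of 2.0
closed = beta_shape_edges(mouth, 1.1)           # probe diameter 2.2: blocked
opened = beta_shape_edges(mouth, 0.9)           # probe diameter 1.8: passes
crowded = beta_shape_edges(mouth + [((3, 1.5), 0.5), ((3, -1.5), 0.5)], 1.1)
```

Unlike the center-based test, the edge between the two mouth atoms now appears exactly when the probe is too large to pass between their surfaces, because the atom radii enter the tangency condition directly.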

6. Extraction of pocket primitives from β-shape

Let $p_L$ and $p_\infty$ be a probe for a ligand $L$ and a hypothetical probe with infinite radius, respectively. Let $B_L$ and $B_\infty$ be the β-shapes of a protein corresponding to $p_L$ and $p_\infty$, respectively. Then, $B_\infty$ is a β-shape bounded by faces defined by the centers of atoms with unbounded Voronoi regions. Unlike an α-shape, however, a β-shape $B_\infty$ may contain some isolated vertices. Let $B_I$ and $B_O$ denote $B_L$ and $B_\infty$, that is, the inner and outer β-shapes of a given model, respectively. Suppose that $B_I = (V^{B_I}, E^{B_I}, F^{B_I})$ and $B_O = (V^{B_O}, E^{B_O}, F^{B_O})$. Let $V^{B_O} = \{v^O_1, v^O_2, \ldots\}$. $E^{B_O}$, $F^{B_O}$, $V^{B_I}$, $E^{B_I}$, and $F^{B_I}$ are similarly defined.

Fig. 5 shows a 2D analogy for the inner and outer β-shapes of a protein. From this figure, we can make a simple observation: for each edge of $B_O$, there is zero or one depression on $B_I$ of the protein. For example, in Fig. 5(b), an edge $e^O_1$ of $B_O$ corresponds to a depressed region formed by edges $e^I_1$, $e^I_2$, $e^I_3$, and $e^I_4$. When an edge of $B_O$ corresponds to a depression, the depression can be regarded as a pocket. Obviously, no pocket is defined when an edge of $B_O$ coincides with an edge of the inner β-shape, as shown by $e^I_5$ and $e^O_2$ in the figure.

Fig. 5. Inner and outer β-shape in 2D: (a) the solid line and dotted line are inner and outer molecular surfaces, respectively and (b) corresponding inner and outer β-shapes.

Fig. 6. The example of pocket primitives: (a) two faces (dashed line) of outer mesh on inner mesh and (b) corresponding pocket primitives.

A similar observation can be made for the 3D counterpart. For a face $f^O \in F^{B_O}$ of a 3D protein, there is a corresponding depressed region on $B_I$ unless $f^O$ coincides with a face $f^I \in F^{B_I}$. However, a pocket on $B_I$ may or may not correspond to a single face $f^O \in F^{B_O}$. A large pocket, for example, may correspond to two faces in $F^{B_O}$ when the depressed regions from the two faces do not have a clear boundary between them. In such a case, the depressed region on $B_I$ corresponding to one face $f^O \in F^{B_O}$ does not form a complete pocket by itself. Instead, both depressed regions together define a single pocket. Hence, we first introduce the concept of a pocket primitive $\phi$ as a unit depressed region on $B_I$ corresponding to each face $f^O \in F^{B_O}$.

A face $f^O_i \in F^{B_O}$ has three associated vertices $v^O_{i_1}$, $v^O_{i_2}$, and $v^O_{i_3}$ in $V^{B_O}$, and there are always three vertices $v^I_{i_1}$, $v^I_{i_2}$, and $v^I_{i_3}$ in $V^{B_I}$ which coincide with $v^O_{i_1}$, $v^O_{i_2}$, and $v^O_{i_3}$, respectively. Let $\gamma_{(i_1,i_2)}$ be the geodesic, i.e. the shortest path, on the inner β-shape $B_I$ between $v^I_{i_1}$ and $v^I_{i_2}$. The path from a vertex follows incident edges, and the distance between two neighboring vertices is defined as the length of the edge between them. Hence, the distance between two arbitrary vertices is the sum of the edge lengths along the shortest path connecting the two vertices. We call the geodesic $\gamma_{(i_1,i_2)}$ a ridge between two pocket primitives.
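The geodesic defined above is a shortest path in the edge graph of the inner β-shape, weighted by Euclidean edge length. A minimal sketch follows, assuming the mesh is given as a dictionary of vertex coordinates and a list of vertex-id pairs (a hypothetical layout). It uses Dijkstra's algorithm with Python's binary heap rather than a Fibonacci heap, so its bound is $O((|E| + |V|) \log |V|)$.

```python
import heapq
import math

def geodesic(vertices, edges, source, target):
    """Shortest edge path on a mesh; returns (length, [vertex ids])."""
    adjacency = {v: [] for v in vertices}
    for u, v in edges:
        w = math.dist(vertices[u], vertices[v])  # Euclidean edge length
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adjacency[u]:
            candidate = d + w
            if candidate < best.get(v, math.inf):
                best[v] = candidate
                previous[v] = u
                heapq.heappush(heap, (candidate, v))
    if target not in best:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]

# Two routes from vertex 0 to vertex 2: a deep one via 1, a shallow one via 3.
mesh_vertices = {0: (0, 0, 0), 1: (1, 0, -1), 2: (2, 0, 0), 3: (1, 0.5, 0)}
mesh_edges = [(0, 1), (1, 2), (0, 3), (3, 2)]
ridge_length, ridge_path = geodesic(mesh_vertices, mesh_edges, 0, 2)
```

In this toy mesh the ridge follows the shallow route through vertex 3, since that path has the smaller total edge length.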

The geometric meaning of $\gamma_{(i_1,i_2)}$ is as follows. While the extreme vertices $v^I_{i_1}$ and $v^I_{i_2}$ are on both $B_O$ and $B_I$, the other vertices on the path define depressions on $B_I$ from the corresponding face of $B_O$. Hence, the geodesic $\gamma_{(i_1,i_2)}$ can be interpreted as the highest wall separating two relatively deep depressions on $B_I$. The other geodesics $\gamma_{(i_2,i_3)}$ and $\gamma_{(i_3,i_1)}$ can be similarly interpreted.

Let $\tilde{F}^I_i$ be the set of faces $f^I_h \in F^{B_I}$ that are interior to the three geodesics $\gamma_{(i_1,i_2)}$, $\gamma_{(i_2,i_3)}$, and $\gamma_{(i_3,i_1)}$. Then, $\tilde{F}^I_i$ forms a topologically triangular depression on $B_I$ from the corresponding face of $B_O$. This depression is called a pocket primitive $\phi_i$ corresponding to a face $f^O_i \in F^{B_O}$ and is also represented by another graph $\phi_i = (\tilde{V}^I_i, \tilde{E}^I_i, \tilde{F}^I_i)$. Hence, the following properties should hold.

Property 1. $|F^{B_O}| \le |F^{B_I}|$, where $|X|$ is the cardinality of set $X$.

Property 2. $\tilde{F}^I_i \subset F^{B_I}$.

Algorithm Extraction of pocket primitives

Input: $B_I$, $F^{B_O}$

Output: the set of pocket primitives Q

Step 1. For each $f^O_i \in F^{B_O}$:

Step 1.1. Identify the three vertices $v_{i_1}$, $v_{i_2}$, and $v_{i_3}$ in $V^{B_I}$ corresponding to the vertices of $f^O_i$.

Step 1.2. Find the geodesics $\gamma_{(i_1,i_2)}$, $\gamma_{(i_2,i_3)}$, and $\gamma_{(i_3,i_1)}$ corresponding to $v_{i_1}$, $v_{i_2}$, and $v_{i_3}$.

Step 1.3. Find $\phi_i = (\tilde{V}^I_i, \tilde{E}^I_i, \tilde{F}^I_i)$ surrounded by $\gamma_{(i_1,i_2)}$, $\gamma_{(i_2,i_3)}$, and $\gamma_{(i_3,i_1)}$.

Step 1.4. Add $\phi_i$ to Q.

End-for

Step 2. Terminate.
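Step 1.3 can be sketched as a flood fill over the faces of the inner mesh that never crosses a ridge edge. This is one possible realization under stated assumptions, not necessarily the authors' implementation: faces are vertex-id triples, ridges are vertex-id paths, and a seed face known to lie inside the three ridges is supplied by the caller. The function name and the input layout are hypothetical.

```python
from collections import defaultdict, deque

def faces_inside_ridges(faces, ridge_paths, seed):
    """Collect the faces reachable from `seed` without crossing a ridge edge."""
    # Every consecutive vertex pair on a ridge path is a barrier edge.
    barrier = {frozenset(edge)
               for path in ridge_paths
               for edge in zip(path, path[1:])}
    # Map each mesh edge to the faces incident to it.
    edge_faces = defaultdict(list)
    for index, (a, b, c) in enumerate(faces):
        for edge in ((a, b), (b, c), (c, a)):
            edge_faces[frozenset(edge)].append(index)
    inside = {seed}
    queue = deque([seed])
    while queue:
        a, b, c = faces[queue.popleft()]
        for edge in ((a, b), (b, c), (c, a)):
            key = frozenset(edge)
            if key in barrier:
                continue  # never step across a ridge
            for neighbor in edge_faces[key]:
                if neighbor not in inside:
                    inside.add(neighbor)
                    queue.append(neighbor)
    return inside

# A square fan of four triangles around center vertex 4, cut by ridge 0-4-2.
fan = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
one_side = faces_inside_ridges(fan, [[0, 4, 2]], seed=0)
everything = faces_inside_ridges(fan, [], seed=0)
```

Each face is visited once, which is consistent with a cost proportional to the number of faces in the pocket primitive.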


In Fig. 6(a), $B_I$ is shown as a mesh of solid lines and $B_O$ is shown as two large triangles $f^O_A$ and $f^O_B$ bounded by broken lines. Four geodesics on $B_I$ corresponding to the four edges (shown as broken lines) on $B_O$ are shown in Fig. 6(b) as thick lines. Inside the four paths, the two corresponding pocket primitives $\phi_A$ and $\phi_B$ are shown.

When a face $f^O_i \in F^{B_O}$ coincides with a face $f^I_i \in F^{B_I}$, $\tilde{F}^I_i$ consists of the single face $f^I_i \in F^{B_I}$ and is not considered a pocket primitive. Note that $\tilde{F}^I_i$ can even be an empty set when the three geodesics degenerate to three curve segments that do not enclose any face. In such a case, no pocket primitive corresponds to the face. From this, we can draw a few properties.

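Property 3. If $\phi_i$ and $\phi_j$ are not topological neighbors, then $\tilde{F}^I_i \cap \tilde{F}^I_j = \emptyset$, where $i \ne j$.

Property 4. If $\phi_i$ and $\phi_j$ are topological neighbors, $\tilde{E}^I_i \cap \tilde{E}^I_j = \gamma_{(ij)} \ne \emptyset$. The geodesic $\gamma_{(ij)}$ is called a ridge between $\phi_i$ and $\phi_j$.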

Next we present an algorithm for extracting pocket primitives from $B_O$.

Suppose that a β-shape is stored in an appropriate data structure supporting non-manifold models. In the CAD community, such data structures have been studied quite extensively [11,12,39–42,60,61]. In particular, we refer readers to [11,39] for a thorough explanation of the non-manifold data structure.

The loop in Step 1 iterates $|F^{B_O}|$ times in the worst case. While Step 1.1 takes $O(1)$, Step 1.2 takes $O(|E^{B_I}| + |V^{B_I}| \log |V^{B_I}|)$ if the Dijkstra algorithm based on a Fibonacci heap is used [15]. Step 1.3 requires $O(|\tilde{F}^I_i|)$ for each face of $F^{B_O}$. It can be shown that the worst-case time complexity for the whole algorithm is bounded by either $O(|E^{B_I}| + |V^{B_I}| \log |V^{B_I}|)$ when $|F^{B_O}| = O(1)$ or $O(|F^{B_I}|)$ when $|F^{B_O}| = O(|F^{B_I}|)$.

7. Merging pocket primitives to form pockets

Given pocket primitives, we consider that one or more neighboring pocket primitives may form a pocket. Hence, we check whether two neighboring pocket primitives can be merged together to form a more meaningful depression based on an appropriate criterion. Recall that a ridge $\gamma_{(i,j)}$ exists between two incident pocket primitives $\phi_i$ and $\phi_j$. It is also an edge chain on $B_I$ corresponding to the geodesic between two extreme vertices of a pocket primitive. Hence, a ridge plays the role of a boundary between two incident pocket primitives.

Let a mountain be the edge chain on $B_I$ separating two pockets. If a ridge is sufficiently high, it can be regarded as a mountain. Note that a pocket primitive always has three ridges and a pocket is surrounded by three or more mountains. Therefore, the boundary of a pocket primitive may or may not be the boundary of a pocket.

Suppose that a path $\gamma_k$, which is indeed a ridge, exists between two incident pocket primitives $\phi_i$ and $\phi_j$ corresponding to an edge $e^O_k$ of $E^{B_O}$. Note that there always exists a geodesic on $B_I$ for an edge of $B_O$. Then, we can define a certain measure to determine the discrepancy between the two chains, $e^O_k$ and $\gamma_k$. Depending on the measure and its prescribed threshold value, the two pocket primitives sharing the chain can be considered to form one larger pocket.

Even though there are various ways to define such a measure for the merge, we use the concept of average distance between two chains. Let $\delta_k$ be the average distance between $e^O_k$ and $\gamma_k$. If $\delta_k$ is larger than a prescribed value, we merge the two neighboring pocket primitives sharing the chains. Otherwise, we regard $\gamma_k$ as a mountain chain. As the threshold value, in this paper, we have chosen the average of all $\delta$ values. Finally, the atoms corresponding to the vertices in the merged pocket primitives define the pocket $p_k$. Note that there may be various other measures that can be used for the merge of pocket primitives, and these measures can be easily computed once they are well defined. Examples include the internal angles at the edges of a ridge and the intrinsic shape of a pocket primitive.
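The merge rule can be sketched as follows. The text does not spell out how the average distance between the two chains is evaluated, so this sketch assumes it is the mean distance from the ridge vertices to the outer edge segment; that definition, the function names, and the input layout are all assumptions. Merged groups are tracked with a small union-find structure.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from 3D point p to the segment from a to b."""
    ab = [b[k] - a[k] for k in range(3)]
    ap = [p[k] - a[k] for k in range(3)]
    denom = sum(x * x for x in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [a[k] + t * ab[k] for k in range(3)]
    return math.dist(p, closest)

def average_ridge_distance(outer_edge, ridge_points):
    """Assumed measure: mean distance of ridge vertices to the outer edge."""
    a, b = outer_edge
    return sum(point_segment_distance(p, a, b) for p in ridge_points) / len(ridge_points)

def merge_primitives(count, ridges):
    """ridges: list of (i, j, outer_edge, ridge_points); returns merged groups."""
    deltas = [average_ridge_distance(edge, pts) for _, _, edge, pts in ridges]
    threshold = sum(deltas) / len(deltas)  # average of all delta values
    parent = list(range(count))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j, _, _), delta in zip(ridges, deltas):
        if delta > threshold:  # ridge lies far below the outer edge: merge
            parent[find(i)] = find(j)
    groups = {}
    for k in range(count):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())

outer = ((0, 0, 0), (4, 0, 0))
low_ridge = [(0, 0, 0), (2, 0, -3.0), (4, 0, 0)]    # average distance 1.0
high_ridge = [(0, 0, 0), (2, 0, -0.3), (4, 0, 0)]   # average distance 0.1
pockets = merge_primitives(3, [(0, 1, outer, low_ridge), (1, 2, outer, high_ridge)])
```

In this example the low ridge between primitives 0 and 1 exceeds the threshold and is merged away, while the high ridge between primitives 1 and 2 is kept as a mountain.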

Once pockets are recognized, it is often necessary to evaluate their significance. In other words, some recognized pockets might not be regarded as significant pockets. There are different criteria for such an evaluation, for example, the average or maximum depth of a pocket measured from its entrance, or the volume of the pocket measured from the entrance. Note that these can also be easily computed.

Fig. 7. Atomic structure of the protein 1BH8 [67] downloaded from PDB database: (a) chain A (darker atoms) and B (lighter atoms) and (b) atoms in the chain A.

We want to mention that $B_O$ can also be defined by a probe $p$ with a finite radius $r_p \ll \infty$. Then, the pockets resulting from such a $B_O$ are of a finer grain than those from $B_\infty$. Often, the pockets extracted from such a finer-grain $B_O$ can be more meaningful. We also want to point out that the proposed β-shape can easily be used to accommodate the concept proposed by Liang et al. [48] more effectively.

8. Experiments and discussions

Shown in Fig. 7(a) is a dimer, a protein consisting of two separate groups of atoms, the transcription regulation complex downloaded from PDB [66,67] with the entry code 1bh8. The darker and lighter atoms in Fig. 7(a) denote groups A and B, respectively. Fig. 7(b) illustrates the atoms in group A only. From the figure, it can easily be seen that group B binds with group A in a large depressed region of group A.

Fig. 8. Group A of the protein 1BH8 and its use for the pocket extraction: (a) the convex hull, (b) the molecular surface, (c) the β-shape, (d) the pocket primitives, (e) the boundary of pocket after merges, and (f) the largest pocket on the molecular surface.

Fig. 9. β-shape and the largest pocket (blue surface) on the molecular surface for group A of 1BH8 according to the probe size of the outer β-shape: (a) 60 Å, (b) 40 Å, (c) 30 Å, and (d) 20 Å. The probe radius of all inner β-shapes is 8 Å.


Fig. 10. Example of pocket for a protein used in CAPRI: (a) A and C chains of Lactobacillus HPr kinase, (b) Bacillus subtilis HPr binds to pocket of Lactobacillus HPr, (c) the pocket on β-shape, and (d) the pocket on molecular surface.


Fig. 8 visualizes the process of pocket recognition for the model in Fig. 7(b). After computing the Euclidean Voronoi diagram of the protein, $B_O$ of the protein is computed as shown in Fig. 8(a). Shown in Fig. 8(b) is the molecular surface of the protein after blending using a predefined probe, and Fig. 8(c) shows the β-shape. Then, the pocket primitive corresponding to each face of $B_O$ is shown in Fig. 8(d). In this example, the outer β-shape is $B_O = B_\infty$ and the inner β-shape $B_I$ is obtained by blending the protein with a probe of radius 8 Å. In this figure, the yellow faces on $B_I$ denote faces coinciding with faces on $B_O$, and therefore the yellow faces do not contribute to any pocket primitive.

After the pocket primitives are properly extracted, we evaluate the ridges around all pocket primitives and merge the appropriate pocket primitive pairs if necessary to form pockets. Fig. 8(e) and (f) show the β-shape $B_I$ and the corresponding blended molecular surface of the largest pocket recognized on the protein.

While Fig. 8 shows the process of pocket recognition where $B_O = B_\infty$, Fig. 9 illustrates the same process for four different $B_O$’s with different sizes of probes. Fig. 9(a) is the case when $B_O$ is produced from a probe with a radius of 60 Å rather than ∞. The first and second columns show the β-shape and molecular surface in the same orientation. The third and fourth columns show the same models from a different view. The pocket shown in blue is the largest one among the recognized pockets. Fig. 9(b)–(d) are the cases where the radii of the probe for $B_O$ are 40, 30, and 20 Å, respectively. When the radius of the probe is 50 Å, the pocket is identical to the case of 40 Å.

From these figures, one can easily observe that a different $B_O$, while the other conditions remain identical, produces a different set of recognized pockets. For example, compare the pockets in Fig. 9 with the one in Fig. 8(f).

From this experiment, it can be observed that the number of pockets increases as the radius of the probe decreases. In addition, increasing the probe radius strongly tends to increase the area of a pocket on the molecular surface. Fig. 9(c) and (d) show that even small changes in probe size can cause a drastic change in pocket area. In Fig. 9(c) and (d), the largest pockets are even located at completely different places on the protein.

Fig. 10(a) shows a trimer, a receptor protein consisting of

three disconnected groups of atoms, called Lactobacillus HPr

kinase. This protein was downloaded from CAPRI (Critical

Assessment of PRediction of Interactions) [64]. The dark

portion in Fig. 10(b) is B. subtilis HPr which plays the role of

a ligand to bind with the receptor. Fig. 10(c) and (d) illustrate

the pocket recognized by the proposed algorithm.


9. Conclusions

The recognition of docking sites, called pockets, on the molecular surface of a protein is one of the most important starting points for structure-based rational drug design. In this

paper, we have provided the definition of pockets on a protein

from a geometric point of view. We have also presented an

algorithm to automatically recognize pockets on the surface of

proteins.

In the proposed algorithm, we first compute a Euclidean Voronoi diagram of atoms and construct β-shapes corresponding to given probes from the Voronoi diagram. We compute two β-shapes: one for the inner and the other for the outer definition of the surface atom sets. Then, we compute pocket primitives on the inner β-shape corresponding to each face of the outer β-shape. After extracting pocket primitives, we evaluate the quality of the boundaries between neighboring pocket primitives to test whether two neighbors should be merged into a single pocket. Eventually, a few pockets remain on the surface of a receptor, where each pocket corresponds to an appropriately depressed region. The algorithm in this paper has been fully implemented using Microsoft C++ on Windows XP and has been tested on various protein models.

Opening a new research area for the CAD community, this

research creates more challenges than solutions in the process

of the rational drug design. For example, a better definition of a

pocket, more appropriate criteria for the merge of pocket

primitives using other meaningful measures such as the

volume, depth, or the morphological shape of pocket

primitives, and so on, are left for future research.

Acknowledgements

This research was supported by Creative Research

Initiatives from the Ministry of Science and Technology,

Korea. The authors thank Dr Jong Bhak for the helpful discussions.

References

[1] Agarwal PK, Edelsbrunner H, Harer NJ, Wang NY. Extreme elevation on

a 2-manifold. Proceedings of the twentieth annual symposium on

Computational geometry. New York, USA: Brooklyn; 2004. p. 357–65.

[2] Akkiraju N, Edelsbrunner H, Fu P, Qian J. Viewing geometric protein

structures from inside a CAVE. IEEE Comput Graph Appl 1996;16:

58–61.

[3] Alrad P, Wodak SJ. Detection of cavities in a set of interpenetrating

spheres. J Comput Chem 1991;12(8):918–22.

[4] Angelov B, Sadoc J-F, Jullien R, Soyer A, Mornon J-P, Chomilier J.

Nonatomic solvent-driven Voronoi tessellation of proteins: an open tool

to analyze protein folds. Proteins Struct Funct Genet 2002;49:446–56.

[5] Aurenhammer F. Power diagrams: properties, algorithms and appli-

cations. SIAM J Comput 1987;16:78–96.

[6] Bajaj CL, Pascucci V, Shamir A, Holt RJ, Netravali AN. Dynamic

maintenance and visualization of molecular surfaces. Discrete Appl Math

2003;127:23–51.

[7] Bernal JD, Finney JL. Random close-packed hard-sphere model II. Geometry of random packing of hard spheres. Discuss Faraday Soc 1967;43:62–9.

[8] Brady Jr GP, Stouten PFW. Fast prediction and visualization of protein

binding pockets with PASS. J Comput Aided Mol Des 2000;14:383–401.

[9] Chen C, Chen F, Feng Y. Blending quadric surfaces with piecewise

algebraic surfaces. Graph Mod 2001;63:212–27.

[10] Cho Y, Kim D, Kim D-S. Topology representation for euclidean voronoi

diagram of spheres in 3D. Digital engineering workshop and fifth Japan–

Korea CAD/CAM workshop. Tokyo, Japan: RCAST, University of

Tokyo; 2005. p. 121–6.

[11] Choi Y. Vertex-based boundary representation of non-manifold geo-

metric models. PhD Thesis, Carnegie-Mellon University, USA; 1989.

[12] Choi G-H, Han S-H, Lee H-C. Optional storage of non-manifold information for solid models. Trans Soc CAD CAM Eng 1997;2(3):150–60.

[13] Connolly ML. Solvent-accessible surfaces of proteins and nucleic acids.

Science 1983;221:709–13.

[14] Connolly ML. Analytical molecular surface calculation. J Appl Crystal-

logr 1983;16:548–58.

[15] Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to

algorithms. 2nd ed. Cambridge: MIT press; 2001.

[16] David C. Voronoi polyhedra as structure probes in large molecular

systems. Biopolymers 1988;27:339–44.

[17] Delaney JS. Finding and filling protein cavities using cellular logic

operations. J Mol Graph 1992;10:174–7.

[18] Edelsbrunner H, Facello M, Liang J. On the definition and the construction of pockets in macromolecules. Discrete Appl Math 1998;88:83–102.

[19] Edelsbrunner H, Mucke EP. Three-dimensional alpha shapes. ACM Trans

Graph 1994;13(1):43–72.

[20] Finney J. Volume occupation, environment and accessibility in proteins.

The problem of the protein surface. J Mol Biol 1975;96:721–32.

[21] Fischer W, Koch E. Geometrical packing analysis of molecular

compounds. Z fur Kristallographie 1979;150:245–60.

[22] Gellatly BJ, Finney JL. Calculation of protein volumes: an alternative to

the voronoi procedure. J Mol Biol 1982;161(2):305–22.

[23] Gerstein M, Tsai J, Levitt M. The volume of atoms on the protein surface:

calculated from simulation, using Voronoi polyhedra. J Mol Biol 1995;

249:955–66.

[24] Goede A, Preissner R, Frommel C. Voronoi cell: new method for allocation of space among atoms: elimination of avoidable errors in calculation of atomic volume and density. J Comput Chem 1997;18:1113–23.

[25] Halperin D, Overmars MH. Spheres, molecules, and hidden surface removal. Proceedings of 10th ACM symposium on computational geometry; 1994. p. 113–22.

[26] Hartmann E. Parametric Gn blending of curves and surfaces. Vis Comput

2001;17:1–13.

[27] Heifets A, Eisenstein M. Effect of local shape modifications of molecular

surfaces on rigid-body protein–protein docking. Protein Eng 2003;16(3):

179–85.

[28] Ho CMW, Marshall GR. Cavity search: an algorithm for the isolation and

display of cavity-like binding regions. J Comput Aided Mol Des 1990;4:

337–54.

[29] Kim D-S. Polygon offsetting using a voronoi diagram and two stacks.

Comput Aided Des 1998;30(14):1069–76.

[30] Kim D-S, Kim D, Sugihara K. Voronoi diagram of a circle set from

Voronoi diagram of a point set: I. Topology. Comput Aided Geom Des

2001;18:541–62.

[31] Kim D-S, Kim D, Sugihara K. Voronoi diagram of a circle set from Voronoi diagram of a point set: II. Geometry. Comput Aided Geom Des 2001;18:563–85.

[32] Kim D-S, Cho Y, Kim D, Kim S, Bhak J. Euclidean voronoi diagram of 3D spheres and applications to protein structure analysis. International symposium on voronoi diagrams in science and engineering. Tokyo, Japan: University of Tokyo; 2004. p. 13–5.

[33] Kim D-S, Cho Y, Kim D. Edge-tracing algorithm for Euclidean Voronoi

diagram of 3D spheres. Proceedings of 16th Canadian conference on

computational geometry; 2004. p. 176–9.

[34] Kim D-S, Cho Y, Kim D. Euclidean voronoi diagram of 3D balls and its

computation via tracing edges. Comput Aided Des, 2005; 13:1412–24.

[35] Kim D-S, Cho C-H, Ryu J-H, Kim D. Three dimensional beta shapes.

Comput Aided Des, submitted.


[36] Kleywegt GJ, Jones TA. Detection, delineation, measurement and display of cavities in macromolecular structures. Acta Crystallogr Sect D 1994;D50:178–85.

[37] Kuntz ID. Structure-based strategies for drug design and discovery. Science 1992;257:1078–82.

[38] Laskowski RA. SURFNET: a program for visualizing molecular surfaces,

cavities, and intermolecular interactions. J Mol Graph 1995;13:323–30.

[39] Lee K. Principles of CAD/CAM/CAE systems. Reading, MA: Addison

Wesley; 1999.

[40] Lee SH, Lee K. Compact boundary representation and generalized euler

operators for non-manifold geometric modeling. Trans Soc CAD CAM

Eng 1996;1(1):1–19.

[41] Lee SH, Lee K. Partial entity structure: a compact boundary

representation for non-manifold geometric modeling. ASME J Comput

Inf Sci Eng 2001;1(4):356–65.

[42] Lee SH, Lee K. Partial entity structure: a compact non-manifold boundary

representation based on partial topological entities. Proceedings of the

sixth ACM symposium on solid modeling and applications June 6–8.

Michigan, USA: Sheraton Inn, Ann Arbor; 2001.

[43] Lee B, Richards FM. The interpretation of protein structures: estimation

of static accessibility. J Mol Biol 1971;55:379–400.

[44] Levitt DG, Banaszak LJ. POCKET: a computer graphics method for

identifying and displaying protein cavities and their surrounding amino

acids. J Mol Graph 1992;10:229–34.

[45] Liang J, Dill KA. Are proteins well-packed? Biophys J 2001;81:751–66.

[46] Liang J, Edelsbrunner H, Fu P, Sudharkar PV, Subramaniam S. Analytic

shape computation of macromolecules I: molecular area and volume

through alpha shape. Proteins Struct Funct Genet 1998;33:1–17.

[47] Liang J, Edelsbrunner H, Fu P, Sudharkar PV, Subramaniam S. Analytic

shape computation of macromolecules II: inaccessible cavities in

proteins. Proteins Struct Funct Genet 1998;33:18–29.

[48] Liang J, Edelsbrunner H, Woodward C. Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design. Protein Sci 1998;7:1884–97.

[49] Masuya M, Doi J. Detection and geometric modeling of molecular

surfaces and cavities using digital mathematical morphological oper-

ations. J Mol Graph 1995;13:331–6.

[50] Montoro JCG, Abascal JLF. The voronoi polyhedra as tools for structure

determination in simple disordered systems. J Phys Chem 1993;97(16):

4211–5.

[51] Noggle JH. Physical chemistry. 3rd ed.: Freedom Academy; 1996.

[52] Parsons D, Canny J. Geometric problems in molecular biology and

robotics. Second international conference on intelligent systems for

molecular biology, Palo Alto, CA; 1994. p. 322–30.

[53] Peters KP, Fauck J, Frommel C. The automatic search for ligand binding sites in protein of known three-dimensional structure using only geometric criteria. J Mol Biol 1996;256:201–13.

[54] Richards FM. The interpretation of protein structures: total volume, group

volume distributions and packing density. J Mol Biol 1974;82:1–14.

[55] Seidl T, Kriegel H-P. Solvent accessible surface representation in a

database system for protein docking. Third international conference on

intelligent systems for molecular biology, vol. 3, Cambridge, UK; 1995. p.

350–8.

[56] Shih J-P, Sheu S-Y, Mou C-Y. A voronoi polyhedra analysis of structures

of liquid water. J Chem Phys 1994;100:2202–12.

[57] Shoichet BK, Kuntz ID. Protein docking and complementarity. J Mol Biol 1991;221:327–46.

[58] Voloshin VP, Beaufils S, Medvedev NN. Void space analysis of the

structure of liquids. J Mol Liq 2002;96–97:101–12.

[59] Voorintholt R, Kosters MT, Vegter G, Vriend G, Hol WGJ. A very fast program for visualizing protein surfaces, channels and cavities. J Mol Graph 1989;7:243–5.

[60] Weiler K. The radial edge structure: a topological representation for non-manifold geometric boundary modeling. In: Wozny MJ, McLaughlin HW, Encarnacao JL, editors. Geometric modeling for CAD applications. New York: North Holland/Elsevier; 1988. p. 3–36.

[61] Weiler K. Boundary graph operators for nonmanifold geometric modeling topology representations. In: Wozny MJ, McLaughlin HW, Encarnacao JL, editors. Geometric modeling for CAD applications. New York: North Holland/Elsevier; 1988. p. 37–66.

[62] Zimmer R, Wohler M, Thiele R. New scoring schemes for protein fold

recognition based on voronoi contacts. Bioinformatics 1998;14:295–308.

[63] Cambridge crystallographic data centre, 2005; http://www.ccdc.cam.ac.uk/;

2005.

[64] Critical assessment of PRediction of interactions (CAPRI), 2005; homepage. http://capri.ebi.ac.uk/.

[65] Computational geometry algorithms library (CGAL), 2005; homepage.

http://www.cgal.org/.

[66] PDB Sum. 2005; http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/.

[67] RCSB protein data bank. 2005; http://www.rcsb.org/pdb/.

Deok-Soo Kim Deok-Soo Kim is a professor in

Department of Industrial Engineering, Hanyang

University, Korea. Before he joined the university

in 1995, he worked at Applicon, USA, and

Samsung Advanced Institute of Technology,

Korea. He received a B.S. from Hanyang Univer-

sity, Korea, an M.S. from the New Jersey Institute

of Technology, USA, and a Ph.D. from the

University of Michigan, USA, in 1982, 1985 and

1990, respectively. His current research interests

mainly lie in the theory and applications of

Voronoi diagram while he has been interested in various geometric problems.

He is currently the director of Voronoi Diagram Research Center supported by

the Ministry of Science and Technology, Korea.

Cheol-Hyung Cho Cheol-Hyung Cho is a senior

researcher in Voronoi Diagram Research Center at

Hanyang University, Seoul, Korea. He received his

B.S., M.S. and Ph.D. degrees from Hanyang

University in 1996, 1998 and 2005, respectively.

His main research interests lie in the area of

computer graphics, geometric algorithms and their

applications in the molecular biology.

Donguk Kim Donguk Kim is a senior researcher in

Voronoi Diagram Research Center at Hanyang

University, Seoul, Korea. He received his B.S.,

M.S. and Ph.D. degrees from Hanyang University

in 1999, 2001 and 2004, respectively. His research

interests include computational geometry, geo-

metric modeling and their applications in the

molecular biology.

Youngsong Cho Youngsong Cho is a senior

researcher in Voronoi Diagram Research Center at

Hanyang University, Seoul, Korea. He received his

B.S., M.S. and Ph.D. degrees from Hanyang

University in 1995, 1997 and 2003, respectively.

His research interests include computational geo-

metry, geometric modeling and their applications in

the molecular biology.