Basic Terminology of Genetics for Archaeologists


Shlomo Guil

A concise compilation of terms in Vade Mecum format

AMINO ACIDS: Biologically important organic compounds that contain an amine (-NH2) and a carboxylic acid (-COOH) functional group, along with a side chain specific to each amino acid. The key elements of an amino acid are carbon, hydrogen, oxygen and nitrogen, though other elements are found in the side chains of certain amino acids. In the form of proteins, amino acids comprise the second-largest component (water is the largest) of human muscles, cells and other tissues.

PROTEINS: Molecules made up of amino acids that are needed for the body to function properly. Proteins perform a vast array of functions within living organisms, including catalysing metabolic reactions and replicating DNA. DNA replication is the process of producing two identical replicas from one original DNA molecule.

GENES: A gene is the molecular unit of heredity of a living organism. The name is widely applied to certain segments of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Living beings depend on genes, as they specify all proteins and functional RNA chains. Genes hold the information to build and maintain an organism's cells and pass genetic traits to offspring. All organisms have genes corresponding to various biological traits. The word gene is derived from the Greek word genesis meaning "birth", or genos meaning "origin". There are an estimated 20,000-25,000 human protein-coding genes.

DNA: Deoxyribonucleic acid (DNA) is a molecule that encodes the genetic instructions used in the development and functioning of all known living organisms and many viruses. DNA is a nucleic acid. Nucleic acids are biopolymers, or large biomolecules, essential for all known forms of life. Nucleic acids, which include DNA and RNA (ribonucleic acid), are made from subunits known as nucleotides. Most DNA molecules consist of two biopolymer strands coiled around each other to form a double helix (spiral). The two strands of DNA run in opposite directions to each other and are therefore anti-parallel. DNA is well suited for biological information storage.

Nucleobases, or simply bases, are key building blocks of DNA and RNA. Their ability to form base pairs and to stack upon one another leads directly to the helical structure.

The nucleobases of DNA are cytosine (C), guanine (G), adenine (A) and thymine (T). In the normal double helix the bases form base pairs between the two strands: A binds only with T, and C binds only with G. Therefore, if we know the sequence of nucleobases on one strand, we can reconstruct the sequence of the other strand.
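The pairing rule can be sketched in a few lines of Python. This is only an illustration: the function name complement_strand and the example sequence are assumptions made here, not part of any particular library. Because the two strands are anti-parallel, the partner strand is written in reverse order.

```python
# Base-pairing rule: A pairs only with T, C pairs only with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the partner strand, read in its own direction."""
    # Pair each base, walking the input backwards because the
    # two strands of the helix run in opposite directions.
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


print(complement_strand("AAGCCTA"))  # TAGGCTT
```

Applying the function twice returns the original sequence, which is another way of saying that each strand fully determines the other.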

Each nucleotide consists of a nucleobase, a five-carbon sugar called deoxyribose, and one or more phosphate groups.

Within cells, DNA is organized into long structures called chromosomes. Before a cell divides, these chromosomes are duplicated in the process of DNA replication, providing each daughter cell with its own complete set of chromosomes.

RNA: Ribonucleic acid (RNA) is a polymeric molecule involved in the coding, decoding, regulation and expression of genes. DNA and RNA are nucleic acids which, together with proteins and carbohydrates, make up three major classes of macromolecules essential for all known forms of life.

Like DNA, RNA is assembled as a chain of nucleotides, but unlike DNA it is more often found in nature as a single strand folded onto itself, rather than a paired double strand.

DNA transcription is the process in which DNA is copied into RNA. The difference between replication and transcription is that DNA replication copies an entire helix, whereas transcription copies only specific regions of one strand of the helix.

The Differences Between DNA and RNA: Structurally, DNA and RNA are nearly identical; however, three fundamental differences account for the very different functions of the two molecules.

1. RNA is usually a single-stranded nucleic acid.

2. RNA contains the sugar ribose instead of the deoxyribose found in DNA.

3. RNA nucleotides carry the base uracil (U) in place of thymine (T).
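The third difference is easy to show in code. The sketch below is a deliberate simplification: it assumes we are given the DNA strand whose sequence the RNA copy will match (often called the coding strand) and simply swaps T for U. The function name transcribe is an assumption made for illustration.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence matching a DNA coding strand.

    Simplified model: the only change is that every thymine (T)
    is written as uracil (U).
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("AAGCTTA"))  # AAGCUUA
```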

[Figure: DNA and RNA]

GENOMES: In modern molecular biology and genetics, the genome is the genetic material of an organism. It is encoded either in DNA or, for RNA viruses, in RNA. The genome includes both the genes and the non-coding sequences of the DNA or RNA. The human genome is the complete set of genetic information for humans (Homo sapiens). This information is encoded as DNA sequences within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within the mitochondria. The nuclear genome is organized into 22 pairs of autosomes plus the sex chromosomes: the X chromosome (one in males, two in females) and, in males only, one Y chromosome. All of these are large linear DNA molecules.

CHROMOSOMES: A chromosome is a packaged and organized structure containing most of the DNA of a living organism. It is a thread-like structure located inside the nucleus of animal and plant cells. After DNA replication, each chromosome consists of two identical halves called chromatids, joined together at a region called the centromere. Chromosomes are normally visible under a light microscope only when the cell is undergoing mitosis. Mitosis is the part of the cell cycle in which the chromosomes in a cell nucleus are separated into two identical sets, each in its own nucleus (division of the nucleus). Human cells are diploid (having two sets of chromosomes): they have 22 different types of autosomes (chromosomes not related to sex), each present as two copies, and two sex chromosomes. This gives 46 chromosomes in total.

The chromosome counts of some animals are: dog 78, chicken 78, gorilla 48, chimpanzee 48, orang-utan 48, lion 38, koala 16, mosquito 6. The approximate number of genes on individual human chromosomes is, for example: chromosome 1, 2,000; chromosome 5, 900; chromosome 13, 300; chromosome 18, 200.

Each person has one pair of sex chromosomes in each cell. Females have two X chromosomes, while males have one X and one Y chromosome. Possession of a Y chromosome determines that an individual is male, and a male always inherits it from his father.

The X chromosome likely contains 800 to 900 genes that provide instructions for making proteins. The Y chromosome likely contains 50 to 60 genes.

A gene, being a segment of DNA on a chromosome, is located at a specific locus (location) on that chromosome.

Genes are identified by the following parameters (insulin is given as an example):

Symbol: INS

Name: Insulin

HGNC ID: HGNC:6081 (HGNC is the HUGO Gene Nomenclature Committee; HUGO is the Human Genome Organisation)

Chromosomal location (locus): 11p15.5 (chromosome 11, short arm "p", band 15.5)
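A locus written in this notation can be taken apart mechanically. The following Python sketch is illustrative only: the function name parse_locus and the pattern are assumptions that cover simple cases such as 11p15.5, not the full cytogenetic nomenclature. It relies on the standard convention that "p" denotes the short arm and "q" the long arm of a chromosome.

```python
import re

# Chromosome (1-22, X or Y), arm (p or q), then the band designation.
LOCUS = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)$")
ARMS = {"p": "short arm", "q": "long arm"}


def parse_locus(locus: str) -> dict:
    """Split a simple locus such as '11p15.5' into its parts."""
    match = LOCUS.match(locus.strip())
    if match is None:
        raise ValueError(f"unrecognised locus: {locus!r}")
    chromosome, arm, band = match.groups()
    return {"chromosome": chromosome, "arm": ARMS[arm], "band": band}


print(parse_locus("11p15.5"))
```

For the insulin gene this yields chromosome 11, short arm, band 15.5.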

The following internet site contains a search engine for genes and their respective names:

http://www.genenames.org/cgi-bin/search

Information about specific chromosomes (1-22, X, Y) can be obtained at

http://ghr.nlm.nih.gov/chromosomes

[Figure: The Human Chromosomes]

GENE MUTATION: A permanent alteration in the DNA sequence that makes up a gene, such that the sequence differs from what is found in most people. Mutations range in size; they can affect anywhere from a single DNA building block (base pair) to a large segment of a chromosome that includes multiple genes. A mutation is an error in the genetic code, caused most often by an incorrect substitution, insertion or deletion of a nucleotide.

Mutations can be changes in an individual base, such as a change from A to C (see SNP further on), or they can be insertions or deletions of one or more bases. A common form of mutation is a “stutter” during DNA replication, which produces short repeated sequences (see STR further on).

Gene mutations fall into two classes:

Hereditary mutations, which are inherited from a parent and are present throughout a person’s life in virtually every cell in the body.

Acquired mutations, which occur at some time during a person’s life and are present only in certain cells, not in every cell in the body. These changes can be caused by environmental factors such as ultraviolet radiation from the sun, or can occur if a mistake is made as DNA copies itself during cell division.

Genetic alterations that occur in more than 1 percent of the population are called polymorphisms. They are

common enough to be considered a normal variation in the DNA. Polymorphisms are responsible for many of the

normal differences between people such as eye colour, hair colour, and blood type. Although many

polymorphisms have no negative effects on a person’s health, some of these variations may influence the risk of

developing certain disorders.

THE MOLECULAR CLOCK: The molecular clock is the concept that mutations accumulate in DNA at a roughly constant rate because they occur by chance. This means that the accumulation of mutations serves as a "molecular clock". The clock can be used to estimate the time since the evolutionary divergence of two species. Two organisms with very few DNA sequence differences between them diverged more recently than two that display more accumulated differences. So if we know the rate at which those mutations occur, we can take a descendant of line A and a descendant of line B, count the differences between their mitochondrial DNAs or their nuclear DNAs (the DNA of the chromosomes), and the number of differences will be roughly proportional to the time since they shared a common ancestor. Thus, by sequencing the mitochondrial DNA of a large number of people around the world, we are able to estimate the relative amount of time since each of them shared a common maternal ancestor.
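The arithmetic behind the clock can be sketched as follows. This is a minimal illustration under stated assumptions: the two sequences are already aligned and of equal length, the mutation rate is constant and known, and both lineages have been accumulating changes since the split (hence the factor of 2). The function names, the toy sequences and the rate used in the example are all hypothetical.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def divergence_time(seq_a: str, seq_b: str, rate_per_site_per_year: float) -> float:
    """Estimate the years since two lineages shared a common ancestor."""
    differences = count_differences(seq_a, seq_b)
    # Both lineages mutate independently after the split, so the
    # differences build up at twice the single-lineage rate.
    return differences / (2 * rate_per_site_per_year * len(seq_a))


# Toy example: 2 differences in 10 bases at a made-up rate of 1e-6.
years = divergence_time("AAGCCTAGGT", "AAGCTTAGGA", 1e-6)
print(round(years))  # 100000
```

Real estimates use much longer sequences and calibrated rates, and they correct for positions that have mutated more than once.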

HUMAN GENETIC DISORDERS: Most aspects of human biology involve both genetic (inherited) and non-genetic (environmental) factors. Some inherited variation influences aspects of our biology that are not medical in nature (height, eye colour, ability to taste or smell certain compounds, etc.). Moreover, some genetic disorders cause disease only in combination with particular environmental factors (such as diet). With these caveats, genetic disorders may be described as clinically defined diseases caused by genomic DNA sequence variation. In the most straightforward cases, the disorder can be associated with variation in a single gene. For example, cystic fibrosis is caused by mutations in the CFTR gene (locus 7q31.2) and is the most common recessive disorder in Caucasian populations, with over 1,300 different mutations known. Recessive diseases are single-gene disorders that occur only when an individual carries two malfunctioning copies (mutant alleles) of the relevant gene.
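The consequence of the "two malfunctioning copies" rule can be worked out by enumerating the possible combinations. The sketch below assumes the usual simple model in which each parent passes on one of his or her two alleles with equal probability. The labels "A" (working copy) and "a" (malfunctioning copy) and the function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent_1: str, parent_2: str) -> dict:
    """Return the probability of each child genotype for two parents."""
    # Every pairing of one allele from each parent is equally likely.
    combos = Counter("".join(sorted(pair)) for pair in product(parent_1, parent_2))
    total = sum(combos.values())
    return {genotype: Fraction(n, total) for genotype, n in combos.items()}


# Two healthy carriers, each with one working and one faulty copy.
odds = offspring_genotypes("Aa", "Aa")
print(odds["aa"])  # 1/4
```

Under this model a child of two carriers has a one-in-four chance of inheriting two faulty copies and a one-in-two chance of being a carrier.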

The gain or loss of DNA from chromosomes can lead to a variety of genetic disorders. For example, Down syndrome is usually caused by an extra copy of chromosome 21. Characteristics include decreased muscle tone, stockier build, asymmetrical skull, slanting eyes and mild to moderate developmental disability.

EVOLUTION: Comparative genomics studies of mammalian genomes suggest that approximately 5% of the human genome has been conserved by evolution since the divergence of extant lineages approximately 200 million years ago, and that this conserved portion contains the vast majority of genes.

The published chimpanzee genome differs from that of the human genome by 1.23% in direct sequence

comparisons. Humans have undergone an extraordinary loss of olfactory receptor genes during our recent

evolution, which explains our relatively crude sense of smell compared to most other mammals. Evolutionary

evidence suggests that the emergence of colour vision in humans and several other primate species has

diminished the need for the sense of smell.

The “Out of Africa” theory is the most widely accepted explanation of the origin and early dispersal of anatomically modern humans, Homo sapiens sapiens. The theory states that archaic Homo sapiens evolved into modern humans solely in Africa, 200,000 to 100,000 years ago. They left Africa about 60,000 years ago and over time replaced earlier human populations such as Neanderthals and Homo erectus.

HUMAN GENETIC VARIATION: The genetic differences both within and among populations. There may be multiple variants of any given gene in the human population, leading to polymorphism. An allele (pronounced "a-LEEL") is one of the alternative forms of the same gene, occupying the same locus. Many genes are not polymorphic: only a single allele is present in the population, and the gene is then said to be fixed. On average, any two humans are about 99.5% similar in their DNA sequence, yet no two humans are genetically identical. Natural selection may favour an allele that gives individuals a competitive advantage in a specific environment.

Apart from mutations, many genes that may have aided humans in ancient times plague humans today. For

example, it is suspected that genes that allow humans to more efficiently process food are those that make

people susceptible to obesity and diabetes today.

CATEGORIZATION OF THE WORLD POPULATION: New data on human genetic variation has

reignited the debate about a possible biological basis for categorization of humans into races. Most of the

controversy surrounds the question of how to interpret the genetic data and whether conclusions based on it are

sound. Some researchers argue that self-identified race can be used as an indicator of geographic ancestry for

certain health risks and medications.

Some commentators have argued that these patterns of variation provide a biological justification for the use of traditional racial categories. They argue that the continental clusters correspond roughly with the division of human beings into sub-Saharan Africans; Europeans, Western Asians, Asians, Southern and Northern Africans; Eastern Asians, Southeast Asians, Polynesians and Native Americans; and other inhabitants of Oceania. Other observers disagree, saying that the same data undercut traditional notions of racial groups. They point out, for example, that major populations considered races, or subgroups within races, do not necessarily form their own clusters.

HAPLOTYPE: A group of alleles (alternative forms of the same gene) of different genes on a single chromosome that are linked closely enough to be inherited as a unit. In other words, a haplotype is a combination of DNA segments that are inherited together from one parent.

The following tools of DNA analysis serve to identify haplotypes and are discussed below.

1. Mitochondrial DNA (mtDNA)

2. Single Nucleotide Polymorphisms (SNP)

3. Short Tandem Repeat (STR)

mtDNA, SNPs and STRs are commonly used types of genetic markers. A genetic marker is a gene or DNA sequence with a known location on a chromosome that can be used to identify individuals or species. Genetic markers are employed in genealogical DNA testing to determine the genetic distance between individuals or populations.

HAPLOGROUP: The term is used to describe individual branches on genetic family trees. Haplogroups are defined by particular genetic mutations that are shared by all the people who belong to them. All members of a haplogroup trace their ancestry back to the single individual in whom that defining mutation arose. A haplogroup can also be defined as a number of haplotypes that share a common ancestor. Simply put, a group of related haplotypes makes up a haplogroup.

Haplogroups are usually assigned letters of the alphabet, and refinements consist of additional number and letter combinations.

For example, haplogroup J2 is a Y-chromosome haplogroup. It is thought to have originated in the region between the Caucasus Mountains, Mesopotamia and the Levant. The present-day populations with the highest frequencies of J2 include Mesopotamian and Levantine peoples, Mediterranean and Aegean peoples, Greco-Anatolians, peoples of the Caucasus, and South and Central Asians. The parent haplogroup of J2 is estimated to have arisen 31,700 ± 12,800 years before present. Initial consensus suggested that J2 be identified with the Canaanite-Phoenician (Northwest Semitic) population.

Maternal haplogroups are families of mitochondrial DNA types that all trace back to a single mutation at a specific place and time. By looking at the geographic distribution of mtDNA (mitochondrial DNA) types, we learn how our ancient female ancestors migrated throughout the world.

Paternal haplogroups are families of Y chromosomes that all trace back to a single mutation at a specific

place and time. By looking at the geographic distribution of these related lineages, we learn how our ancient male

ancestors migrated throughout the world.

NUCLEAR DNA (nDNA): The DNA of the chromosomes contained within the cell nucleus of an organism. Nuclear DNA makes up the great majority of the genome, with the DNA located in the mitochondria accounting for the rest.

MITOCHONDRIAL DNA (mtDNA): Each of our cells contains structures called mitochondria, membrane-bound organelles found in most eukaryotic cells (cells which contain a nucleus) that convert chemical energy from food into a form that cells can use. Mitochondrial DNA (mtDNA) is the DNA located in the mitochondria. A single cell may contain up to about 2,000 mitochondria, depending on the cell type. Mitochondria contain their own small genome (37 genes), and this mitochondrial DNA is handed down from mother to child. With mtDNA we can therefore trace maternal ancestry. Both males and females have mtDNA, but males do not pass their mtDNA on to their offspring. The DNA sequence of mtDNA permits an examination of the relatedness of populations, and so has become important in anthropology. Lineages of mtDNA fall into the category of haplotypes.

By virtue of its abundance, mtDNA is easier to analyse than nuclear DNA. Furthermore, it is more likely than nuclear DNA to be preserved in archaeological remains.

SINGLE NUCLEOTIDE POLYMORPHISMS (SNP): Positions in the genome where the DNA sequences of many individuals differ by only a single base are called single nucleotide polymorphisms (SNPs, pronounced "snips"). For example, two sequenced DNA fragments from different individuals, AAGCCTA and AAGCTTA, differ in a single nucleotide.
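Finding such a difference is a simple position-by-position comparison. The sketch below is illustrative only: it assumes the two fragments are already aligned and of equal length, and the function name find_snps is hypothetical. Positions are counted from 1.

```python
def find_snps(seq_a: str, seq_b: str) -> list:
    """List (position, base_in_a, base_in_b) wherever the sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


print(find_snps("AAGCCTA", "AAGCTTA"))  # [(5, 'C', 'T')]
```

For the two fragments in the text, the single difference is at the fifth base, where one individual carries C and the other T.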

About 10 million SNPs exist in human populations. Alleles (alternative forms) of SNPs that are close together tend to be inherited together. A set of associated SNP alleles in a specific region of a chromosome constitutes a haplotype.

The International HapMap Project was set up in order to develop a haplotype map of the human genome. Its internet site can be viewed at www.hapmap.org

SHORT TANDEM REPEATS (STR): Repetitive segments of DNA, two to five base pairs (bp) in length, which are repeated several times in a row (in tandem) on a DNA segment. The number of repeats of each STR at each genetic site varies within human populations, and this variability makes STRs extremely valuable as human markers for linkage analysis, particularly on the Y chromosome. STRs are commonly referred to as microsatellites.

The Y chromosome is paternally inherited (from father to son). The polymorphisms of Y-chromosomal short tandem repeats (Y-STRs) are a powerful tool for identifying and confirming shared paternal descent.

For this test, common markers of the Y chromosome are analysed. Each marker typically yields one piece of numerical information: the number of times a short sequence of DNA bases is repeated at a specific location on the Y chromosome. For example, a short sequence of 4 base pairs (e.g. A G G T) repeated at position DYSxxx might give the following sequence:

CCAGGTAGGTAGGTAGGTAGGTCTT

Here the 4-base-pair sequence AGGT is repeated 5 times at this location. Therefore the information we get at marker DYSxxx is '5'.
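The count in this example can be reproduced with a short Python sketch. It is an illustration only: the function name count_tandem_repeats is hypothetical, and the code simply looks for the longest uninterrupted run of the repeat unit, which is far simpler than what laboratory typing software does.

```python
import re


def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the number of repeat units in the longest run of motif."""
    # Find every stretch made of one or more back-to-back copies.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


print(count_tandem_repeats("CCAGGTAGGTAGGTAGGTAGGTCTT", "AGGT"))  # 5
```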

In the prefix DYS, D stands for DNA, Y for Y chromosome and S for a unique segment. For example, DYS393 is a DNA segment on the Y chromosome; the S indicates that it is a unique segment, and 393 is the number given to it as an identifier.

In contrast to autosomal STRs, the Y-chromosomal STRs are inherited together and so represent a haplotype. Y-chromosomal STR typing makes it possible to follow paternal lineages, just as mtDNA supports the recognition of maternal lineages. Because Y-chromosomal haplotype frequencies vary from region to region, Y-STR typing is a valuable tool for migration research.
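Two Y-STR haplotypes can be compared marker by marker. The sketch below uses one simple convention, chosen here as an assumption: the distance is the sum of the absolute differences in repeat counts over the markers both men were tested for. The repeat counts are invented for illustration, and the function name genetic_distance is hypothetical.

```python
def genetic_distance(haplotype_a: dict, haplotype_b: dict) -> int:
    """Sum the repeat-count differences over the markers both share."""
    shared_markers = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared_markers)


# Invented repeat counts at three Y-STR markers.
man_1 = {"DYS393": 13, "DYS390": 24, "DYS391": 11}
man_2 = {"DYS393": 13, "DYS390": 23, "DYS391": 10}

print(genetic_distance(man_1, man_2))  # 2
```

A distance of zero over many markers suggests a recent shared paternal ancestor; larger distances suggest a more remote one.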

POPULATION ORIGINS AND MIGRATIONS: Analysis of the Y chromosome and of mtDNA serves in reconstructing population origins and migrations by tracing the movement of males (Y chromosome) and females (mtDNA). Interestingly, in many cases mtDNA and Y-chromosome studies show quite different patterns of population relationships. The most likely cause is that the mobility of men and women differs, depending on kinship practices that govern whether the man or the woman moves when a couple marries. A difference in mutation rate between mtDNA and the Y chromosome can also contribute to the difference in patterns. An additional factor influencing the geographical patterning of Y-chromosome and mtDNA diversity is demography, because a man can father many children in his lifetime, whereas a woman can give birth to only a limited number.

An alternative approach to tracking human migrations is to study genetic variation in organisms that were transported by humans as they moved across the globe. These include domesticated animals such as dogs, cattle, sheep and pigs, as well as plants.

ARCHAEOGENETICS AND ANCIENT DNA: Archaeogenetics is a term coined by British archaeologist Colin Renfrew, referring to the application of the techniques of molecular population genetics to the study of the human past. This can involve: 1) the analysis of DNA from modern populations (including humans and domestic plant and animal species) in order to study the human past and the genetic legacy of human interaction with the biosphere; and 2) the application of statistical methods developed by molecular geneticists to archaeological data.

In ancient DNA (aDNA) studies on human remains, many studies have used mummified tissue as a source of ancient human DNA. Examples include naturally preserved specimens, such as those preserved in ice (Ötzi the Iceman) or by rapid desiccation, that is, extreme dryness (high-altitude mummies from the Andes), as well as various sources of artificially preserved tissue (such as the chemically treated mummies of ancient Egypt). However, mummified remains are a limited resource, and the majority of human aDNA studies have focused on extracting DNA from two sources that are much more common in the archaeological record: bone and teeth (or more precisely, dentine). Recently, several other sources have also yielded DNA, including paleofaeces and hair.

As soon as an organism dies, its DNA begins to decompose. Consequently, aDNA research suffers from two

major problems: 1) the gross concentration of DNA in an ancient sample is far less than that in fresh tissue; and

2) the fragments into which the surviving DNA has degraded are short, making them difficult to analyse.

Most of the current ancient DNA studies are focused on the mtDNA (mitochondrial DNA) for the following

reasons. Firstly, mtDNA is especially useful for tracking historic relationships. Secondly, each cell contains

multiple mitochondria and each mitochondrion contains multiple copies of the mitochondrial genome. For every

copy of a nuclear gene in a cell, there are hundreds or even thousands of copies of the mitochondrial genome.

This numerical advantage makes mtDNA much more likely than nuclear DNA to survive and to be recovered. It

should also be noted that ancient nuclear DNA degrades at least twice as fast as ancient mitochondrial DNA.

The most important environmental factor for DNA preservation is temperature, and more specifically the temperature a sample has been exposed to during its “lifetime”, i.e. the thermal history of the sample. Fewer temperature fluctuations and low mean annual temperatures favour DNA survival and preservation. This correlation is well illustrated by the fact that the oldest reliable aDNA results are mainly from permafrost or cave environments. To date, the oldest skeletal remains from which aDNA has been recovered are dated to the tens of thousands of years before present.

Contamination remains a major problem when working on ancient human material. Unlike modern genetic analyses, ancient DNA studies are characterized by low-quality DNA. It should be noted that aDNA may contain a large number of post-mortem mutations, which accumulate with time.

Before it reaches the laboratory, a specimen can be affected by microbial degradation after death, by burial in proximity to other organisms, and/or by boiling or cooking after death (mostly an issue with animal bones). Samples can also be contaminated by human handling during excavation, during cleaning and storage in museums, and during morphological investigations.

Ancient DNA (aDNA) also serves in pathogen and microorganism analyses. Pathogens are defined as anything that can produce disease. The use of degraded human samples in aDNA analyses has not been limited to the amplification of human DNA. It is reasonable to assume that, for a period of time post-mortem, DNA may survive from any microorganisms present in the specimen at death. These include pathogens present at the time of death (either the cause of death or long-term infections) and other associated microbes.

POLYMERASE CHAIN REACTION (PCR): A technique that makes it possible to replicate DNA fragments present in minimal amounts. The term actually used is “amplify” rather than “replicate”. PCR generates a practically unlimited number of copies of nucleic acid fragments from a few, or even from a single, usually short and often damaged fragment. By using enzymes known as polymerases, which copy genetic material in living organisms (the enzymes used are isolated from bacteria), PCR replicates specific fragments of nucleic acids extracted from the tissue remains of ancient animals, plants or microorganisms, regardless of their age. It is a cyclic reaction repeated again and again, simulating the way nature copies DNA. In practical terms, the PCR procedure involves repeatedly heating and cooling a fragment (or fragments) of DNA through a cycle of well-established temperatures.
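The power of a cyclic reaction lies in its arithmetic. The sketch below assumes an idealised model that is not stated in the text above: in each cycle every fragment is copied once, so the number of copies doubles. The optional efficiency parameter (a hypothetical name, as is pcr_copies) allows for cycles in which only a fraction of the fragments is copied.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate the number of copies after a given number of cycles.

    efficiency is the fraction of fragments copied in each cycle:
    1.0 means perfect doubling, 0.0 means no amplification.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# A single surviving fragment after 30 ideal cycles.
print(pcr_copies(1, 30))  # 1073741824.0
```

Under this idealised model a single fragment yields over a thousand million copies after 30 cycles, which is why even traces of contaminating modern DNA are such a concern in ancient DNA work.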

Since the development of PCR, studies of ancient nucleic acids, particularly DNA, have been providing clues for solving quite a wide range of anthropological and archaeological issues important for understanding the evolution of modern humans. The relationships, or lack thereof, between ancient human populations, for example, have been elucidated through the study of DNA extracted from ancient bone.

GENE PREDICTION OR GENE FINDING: In computational biology gene prediction or gene finding

refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding

genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory

regions. Gene finding is one of the first and most important steps in understanding the genome of a species once

it has been sequenced. Today, with comprehensive genome sequence and powerful computational resources at

the disposal of the research community, gene finding has been redefined as a largely computational problem.
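The simplest form of gene finding can be sketched as a search for open reading frames: stretches that begin with the start codon ATG and run, three bases at a time, to one of the stop codons TAA, TAG or TGA. The Python below is a toy illustration under stated assumptions: it scans only the given strand, and the function name find_orfs, the min_codons parameter and the example sequence are hypothetical. Real gene-prediction programs rely on statistical models and additional evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list:
    """Return open reading frames (start codon through stop codon).

    min_codons is the minimum number of codons before the stop,
    counting the start codon itself.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three possible reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append(dna[start:i + 3])
                start = None  # close it and keep scanning
    return orfs


print(find_orfs("CCATGGCTTGATAAGG"))  # ['ATGGCTTGA']
```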

CLONING: Cloning refers to processes used to create copies of DNA fragments (molecular cloning), cells or organisms. Cloning of any DNA fragment essentially involves four steps:

Fragmentation - breaking apart a strand of DNA

Ligation - gluing together pieces of DNA in a desired sequence

Transfection - inserting the newly formed pieces of DNA into cells

Screening/Selection - selecting out the cells that were successfully transfected with the new DNA

Organism cloning (also called reproductive cloning) refers to the procedure of creating a new multicellular organism, genetically identical to another.

Dolly the sheep was the first mammal to have been successfully cloned from an adult cell. Dolly was formed by taking a cell from the udder of her biological mother (sheep 1). Dolly's embryo was created by inserting the nucleus of that cell into an ovum (egg cell) of sheep 2, from which the ovum's own nucleus had first been removed. The embryo was then placed inside a female sheep (sheep 3) that went through a normal pregnancy.

Human cloning is the creation of a genetically identical copy of a human. The term is generally used to refer to

artificial human cloning, which is the reproduction of human cells and tissues. The possibility of human cloning

has raised ethical controversies. These concerns have prompted several nations to pass legislation regarding

human cloning and its legality.

Elements of Genetic Engineering

VIRUS: A virus is a small infectious agent that replicates only inside the living cells of other organisms. Viruses

can infect all types of life forms, from animals and plants to microorganisms, including bacteria and archaea. Virus particles (known

as virions) consist of: I) The genetic material made from either DNA or RNA; II) A protein coat that protects these

genes.

In evolution, viruses are an important means of horizontal gene transfer, which increases diversity. Viral

populations do not grow through cell division, because they are acellular. Instead, they use the machinery and

metabolism of a host cell to produce multiple copies of themselves, and they assemble in the cell.

[Figure: Virus]

EUKARYOTE: Any organism whose cells contain a nucleus and other organelles enclosed

within membranes.

PROKARYOTE: A single-celled organism that lacks a membrane-bound nucleus,

mitochondria, or any other membrane-bound organelles.

BACTERIA: A large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria

have a number of shapes, ranging from spheres to rods and spirals.

[Figure: Bacterium]

RECOMBINANT DNA (rDNA): DNA molecules formed by laboratory methods of genetic recombination

(such as molecular cloning) to bring together genetic material from multiple sources, creating sequences that

would not otherwise be found in biological organisms. Recombinant DNA is possible because DNA

molecules from all organisms share the same chemical structure. They differ only in the nucleotide sequence

within that identical overall structure. Nucleotides are organic molecules that serve as subunits of nucleic acids

like DNA and RNA (see discussion above).

Formation of recombinant DNA requires a cloning vector, a DNA molecule that replicates within a living cell.

Vectors are generally derived from plasmids or viruses. Plasmids are circular DNA molecules found in

bacteria.

GENETIC ENGINEERING: The direct manipulation of an organism's genome using biotechnology. New

DNA may be inserted in the host genome by first isolating and copying the genetic material of interest using

molecular cloning methods to generate a DNA sequence, or by synthesizing the DNA, and then inserting this

construct into the host organism. An organism that is generated through genetic engineering is considered to

be a genetically modified organism (GMO).

Genetic engineering alters the genetic make-up of an organism using techniques that introduce DNA prepared

outside the organism or that remove heritable material. Removing genetic material from the target organism

creates a gene knockout organism.

VIRAL VECTOR: Viral vectors are a tool commonly used by molecular biologists to deliver genetic material

into cells. This process can be performed inside a living organism (in vivo) or in cell culture (in vitro). Viruses

have developed specialized molecular mechanisms to efficiently transport their genomes inside the cells

they infect. Delivery of genes by a virus is termed transduction and the infected cells are described as

transduced. Transduction is the process by which DNA is transferred from one bacterium to another by a

virus or whereby foreign DNA is introduced into another cell via a viral vector. Viruses are excellent vectors,

because they have gained, through long periods of evolution, the ability to avoid destruction by the human

immune system, and have the capacity to get their own genetic material inside specific cells.

Non-viral substances such as Ormosil have been used as DNA vectors and can deliver DNA loads to specifically

targeted cells in living animals. Ormosil, which stands for organically modified silica or silicate, is a

nanoengineered substance.

PLASMIDS AS VECTORS: Plasmids are small circular DNA molecules found in bacteria and other cells.

They are physically separated from chromosomal DNA and can replicate independently. They generally carry only

a small number of genes, notably some associated with antibiotic resistance. Artificial plasmids are widely used

as vectors in molecular cloning. Plasmids can be transmitted from one bacterium to another, even of another

species.

LIPOSOMES AS VECTORS: When extra DNA is added to a cell by means of a virus, that extra

DNA usually includes unwanted viral DNA, which may cause cancer. Hence, viral delivery poses a safety risk for human

patients. Wrapping the DNA in liposomes is safer, since only the desired segment of DNA goes into the

cell. A liposome is an artificially-prepared tiny bubble (vesicle), made out of the same material as a cell

membrane. However, the liposome method remains inefficient.

GENE THERAPY: Gene therapy is a technique for correcting defective genes responsible for disease

development. In the future, gene therapy may provide a way to cure human genetic disorders. Because these

diseases result from mutations in the DNA sequence for specific genes, gene therapy trials have used viruses to

deliver un-mutated copies of these genes to the cells of the patient's body.

TRANSGENIC ANIMALS: A transgenic animal is an animal that carries a specific and deliberate

modification of its genome. To establish a transgenic animal, foreign DNA constructs need to be introduced into

the animal’s genome, using recombinant DNA technology, so that the construct is stably maintained, expressed

and passed on to subsequent generations.

ENZYMES: Macromolecular proteins that catalyse (accelerate) chemical reactions in living organisms without

themselves being permanently altered.

DNA-CUTTING TOOLS: The DNA in a chromosome can be digested by a DNA-cutting enzyme (nuclease) into

many small fragments of regular size, such as 200, 400, 600, 800 etc. base-pairs, because the DNA is wound around

regularly spaced protein particles called nucleosomes and is cut between them. Restriction (cutting) enzymes are

essentially molecular scissors, which can cut DNA at precisely defined sequences. These enzymes are found in

bacterial cells, where they function as part of a protective mechanism. (See Ligase or joining enzyme further on).
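The idea of cutting DNA at precisely defined sequences can be sketched in a few lines of code. The example below assumes the well-known restriction enzyme EcoRI, which recognises the sequence GAATTC and cuts after the first G; the function name digest, its parameters and the example sequence are hypothetical. Only one strand is modelled, so the staggered ("sticky") ends left on the complementary strand are ignored.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut dna at every occurrence of site and return the list of fragments.

    cut_offset is the position inside the recognition site where the cut
    falls (1 means the cut lies between the first and second base).
    """
    dna = dna.upper()
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])  # piece up to the cut point
        start = cut
        pos = dna.find(site, pos + 1)     # look for the next site
    fragments.append(dna[start:])         # remainder after the last cut
    return fragments


# Hypothetical sequence carrying two EcoRI sites.
for fragment in digest("TTGAATTCAAGGGAATTCCC"):
    print(len(fragment), fragment)
```

Joining the fragments back together in order reproduces the original sequence, which is a convenient sanity check on any digest routine.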

The CRISPR-Cas9 system was developed very recently as a precision “DNA scissors” tool. It is based upon a

defence mechanism known as CRISPR (pronounced "crisper"), which is used by bacteria and archaea for the

degradation (cutting) of foreign genetic material resulting from a viral infection (invasion of the cell by a virus).

Cas9 is the name of the enzyme. Archaea constitute a domain of single-celled microorganisms. These microbes

are prokaryotes, meaning that they lack a membrane-bound nucleus.

The announcement of CRISPR/Cas9 has generated considerable interest even beyond the

professional circle of geneticists. The Economist (May 2nd 2015) described the importance of CRISPR/Cas9

as follows:

"CRISPR/Cas9 is a large bit of molecular machinery (see picture) derived from a bacterial defence system that

chops up the DNA of invading viruses. In nature, it recognizes DNA sequences that are foreign to the bacterium,

but the recognition mechanism can be modified to search for any given sequence and cut the DNA there. If

this is done to a gene in an animal or plant cell, the cell will try to repair itself using the other copy of the gene

present (for there is one from each parent) as a template. That process can be subverted by injecting an artificial

template of the desired DNA sequence, which is then used as a model for repair."

[Figure: CRISPR/Cas9]

JOINING DNA MOLECULES: DNA ligase is a cellular enzyme which is used to repair broken DNA bonds

(joining of DNA strands together) that may occur at random or as a consequence of DNA replication or

recombination. It can therefore be considered as a molecular glue, which is used to stick pieces of DNA

together. The enzyme used most often is T4 DNA ligase. The ability to cut, modify and join DNA molecules gives

the genetic engineer the freedom to create recombinant DNA molecules.
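The cut-and-join logic behind a recombinant molecule can be illustrated with plain strings. The sketch below assumes the restriction enzyme EcoRI (recognition sequence GAATTC, cut after the first G) as the cutting tool and treats ligation as simple concatenation. The names cut_once, excise and ligate, and the vector and donor sequences, are hypothetical. Only one strand is modelled, so the pairing of complementary sticky ends that a real ligase depends on is not represented.

```python
SITE = "GAATTC"  # EcoRI recognition sequence
CUT = 1          # the cut falls between G and AATTC on the strand shown


def cut_once(dna):
    """Split dna at its first EcoRI site and return (left, right)."""
    pos = dna.index(SITE) + CUT
    return dna[:pos], dna[pos:]


def excise(dna):
    """Return the fragment lying between the first two EcoRI cuts in dna."""
    _, rest = cut_once(dna)    # discard everything before the first cut
    insert, _ = cut_once(rest) # keep everything up to the second cut
    return insert


def ligate(*pieces):
    """Join DNA pieces end to end, standing in for the action of DNA ligase."""
    return "".join(pieces)


vector = "CCCCGAATTCGGGG"             # hypothetical vector with one site
donor = "TTGAATTCATGAAATAGGAATTCTT"   # hypothetical donor with two sites

left, right = cut_once(vector)        # open the vector
insert = excise(donor)                # cut the fragment of interest out
recombinant = ligate(left, insert, right)
print(recombinant)
```

In this model the recognition site is regenerated on both sides of the insert, so the same enzyme could later cut the fragment back out.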

MEANS OF DETERMINING THE FUNCTION OF SPECIFIC GENES: It is possible to determine the

function of any normal gene, or else see if a mutant gene is defective, by injecting the DNA for such a gene into

an egg of a mouse. The injected DNA may then be incorporated into a mouse chromosome by joining of new

DNA onto old DNA. When the mouse with altered DNA grows to an adult, it may show differences in physiology

(study of normal function in living systems) or behavior relative to a normal mouse.

Thus, wherever the chromosomal DNA of a mouse is cut in an egg cell, it has to be repaired by a Ligase (joining

enzyme) which connects the broken ends together, or else the cell would die.

DNA IN CRIME SCENES: Some of the DNA within chromosomes is highly specific to certain individuals, just

as are fingerprints and hence may be used to identify suspects. This Forensic Analysis

(application of scientific knowledge and methodology to legal problems) is also applied in paternity disputes. The

use of DNA profiling is now accepted as an important way of generating evidence in legal cases. DNA profiling

can only examine a small part of the genome. Thus the odds of a chance match need to be calculated. The more

bands present in a DNA profile, the less likely a chance match between unrelated individuals becomes. For example, with 4 bands the odds

of a chance match are 1:250, while with 10 bands they are 1:1,000,000.
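The two figures quoted are roughly what a very simple model predicts. Assume, purely for illustration, that each band matches by chance with a probability of about 1 in 4, independently of the other bands; this per-band value is an assumption and is not stated in the sources. The probability that every band matches is then the per-band probability raised to the number of bands. The function name chance_match_odds is hypothetical.

```python
def chance_match_odds(bands, p_band=0.25):
    """Return N such that the odds of a chance match are about 1:N.

    p_band is the assumed probability that a single band matches by chance;
    the bands are treated as independent of one another.
    """
    return round(1 / p_band ** bands)


for bands in (4, 10):
    print(f"{bands} bands -> about 1:{chance_match_odds(bands):,}")
```

Under this assumption, 4 bands give 1:256 and 10 bands give 1:1,048,576, close to the rounded values of 1:250 and 1:1,000,000 given above. Real forensic calculations use measured population frequencies for each band rather than a single fixed value.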

Selected Bibliography and References

This compilation is based upon text and information provided by various relevant internet sites in addition to the

following publications.

Calladine C., Drew H. 1997 Understanding DNA: The Molecule and How It Works, Academic Press.

Hummel S. 2003 Ancient DNA Typing, Springer.

Matisoo-Smith E., Horsburgh K. 2012 DNA for Archaeologists, Left Coast Press.

Nicholl D. 2002 An Introduction to Genetic Engineering, Cambridge University Press.