Introduction to DBMS (pengantar DBMS), 8/9/2019

CHAPTER 3

Introduction to SQL

There are a number of database query languages in use, either commercially or experimentally. In this chapter, as well as in Chapters 4 and 5, we study the most widely used query language, SQL.

Although we refer to the SQL language as a "query language," it can do much more than just query a database. It can define the structure of the data, modify data in the database, and specify security constraints.

It is not our intention to provide a complete user's guide for SQL. Rather, we present SQL's fundamental constructs and concepts. Individual implementations of SQL may differ in details, or may support only a subset of the full language.

3.1 Overview of the SQL Query Language

IBM developed the original version of SQL, originally called Sequel, as part of the System R project in the early 1970s. The Sequel language has evolved since then, and its name has changed to SQL (Structured Query Language). Many products now support the SQL language. SQL has clearly established itself as the standard relational database language.

In 1986, the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) published an SQL standard, called SQL-86. ANSI published an extended standard for SQL, SQL-89, in 1989. The next version of the standard was SQL-92, followed by SQL:1999, SQL:2003, SQL:2006, and most recently SQL:2008. The bibliographic notes provide references to these standards.

The SQL language has several parts:

Data-definition language (DDL). The SQL DDL provides commands for defining relation schemas, deleting relations, and modifying relation schemas.

Data-manipulation language (DML). The SQL DML provides the ability to query information from the database and to insert tuples into, delete tuples from, and modify tuples in the database.



Integrity. The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed.

View definition. The SQL DDL includes commands for defining views.

Transaction control. SQL includes commands for specifying the beginning and ending of transactions.

Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages, such as C, C++, and Java.

Authorization. The SQL DDL includes commands for specifying access rights to relations and views.

In this chapter, we present a survey of basic DML and the DDL features of SQL. Features described here have been part of the SQL standard since SQL-92.

In Chapter 4, we provide a more detailed coverage of the SQL query language, including (a) various join expressions; (b) views; (c) transactions; (d) integrity constraints; (e) type system; and (f) authorization.

In Chapter 5, we cover more advanced features of the SQL language, including (a) mechanisms to allow accessing SQL from a programming language; (b) SQL functions and procedures; (c) triggers; (d) recursive queries; (e) advanced aggregation features; and (f) several features designed for data analysis, which were introduced in SQL:1999, and subsequent versions of SQL. Later, in Chapter 22, we outline object-oriented extensions to SQL, which were introduced in SQL:1999.

Although most SQL implementations support the standard features we describe here, you should be aware that there are differences between implementations. Most implementations support some nonstandard features, while omitting support for some of the more advanced features. In case you find that some language features described here do not work on the database system that you use, consult the user manuals for your database system to find exactly what features it supports.

3.2 SQL Data Definition

The set of relations in a database must be specified to the system by means of a data-definition language (DDL). The SQL DDL allows specification of not only a set of relations, but also information about each relation, including:

The schema for each relation.

The types of values associated with each attribute.

The integrity constraints.

The set of indices to be maintained for each relation.



The security and authorization information for each relation.

The physical storage structure of each relation on disk.

We discuss here basic schema definition and basic types; we defer discussion of the other SQL DDL features to Chapters 4 and 5.

3.2.1 Basic Types

The SQL standard supports a variety of built-in types, including:

char(n): A fixed-length character string with user-specified length n. The full form, character, can be used instead.

varchar(n): A variable-length character string with user-specified maximum length n. The full form, character varying, is equivalent.

int: An integer (a finite subset of the integers that is machine dependent). The full form, integer, is equivalent.

smallint: A small integer (a machine-dependent subset of the integer type).

numeric(p, d): A fixed-point number with user-specified precision. The number consists of p digits (plus a sign), and d of the p digits are to the right of the decimal point. Thus, numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 nor 0.32 can be stored exactly in a field of this type.

real, double precision: Floating-point and double-precision floating-point numbers with machine-dependent precision.

float(n): A floating-point number, with precision of at least n digits.

Additional types are covered in Section 4.5.

Each type may include a special value called the null value. A null value indicates an absent value that may exist but be unknown or that may not exist at all. In certain cases, we may wish to prohibit null values from being entered, as we shall see shortly.
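As a minimal sketch of how these types appear in a declaration, the statement below uses one attribute of each kind. The relation name reading and all of its attribute names are hypothetical, chosen only for illustration. How a system reacts to a value with too many digits for a numeric field (an error or rounding) differs between implementations, so no such case is shown.

```sql
-- Hypothetical relation; the names are illustrative and not part of the university schema.
create table reading
    (sensor_code  char(4),          -- always stored as exactly 4 characters
     label        varchar(30),      -- up to 30 characters, no padding
     sample_count int,
     channel      smallint,
     temperature  numeric(3,1),     -- 3 digits in total, 1 after the decimal point
     voltage      real,
     energy       double precision,
     ratio        float(10));

-- 44.5 fits numeric(3,1) exactly. The attributes that are not listed
-- (label, voltage, energy, ratio) receive the null value.
insert into reading (sensor_code, sample_count, channel, temperature)
    values ('A001', 12, 2, 44.5);
```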

The char data type stores fixed-length strings. Consider, for example, an attribute A of type char(10). If we store a string "Avi" in this attribute, 7 spaces are appended to the string to make it 10 characters long. In contrast, if attribute B were of type varchar(10), and we store "Avi" in attribute B, no spaces would be added. When comparing two values of type char, if they are of different lengths, extra spaces are automatically added to the shorter one to make them the same size, before comparison.

When comparing a char type with a varchar type, one may expect extra spaces to be added to the varchar type to make the lengths equal, before comparison; however, this may or may not be done, depending on the database system. As a result, even if the same value "Avi" is stored in the attributes A and B above, a comparison A = B may return false. We recommend you always use the varchar type instead of the char type to avoid these problems.

SQL also provides the nvarchar type to store multilingual data using the Unicode representation. However, many databases allow Unicode (in the UTF-8 representation) to be stored even in varchar types.
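The padding behavior of char described above can be tried out with a small experiment. The sketch below assumes a hypothetical relation pad_demo with the attributes A and B from the discussion; the outcome of the final comparison is deliberately not stated, since it depends on the database system.

```sql
-- Hypothetical relation used only to illustrate padding.
create table pad_demo
    (A char(10),
     B varchar(10));

insert into pad_demo values ('Avi', 'Avi');

-- A holds 'Avi' followed by 7 spaces; B holds just 'Avi'.
-- Depending on the database system, this query may return the tuple or nothing.
select *
from pad_demo
where A = B;
```

Declaring both attributes as varchar(10) removes the ambiguity, which is the reason for the recommendation above.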

3.2.2 Basic Schema Definition

We define an SQL relation by using the create table command. The following command creates a relation department in the database.

create table department
    (dept_name varchar(20),
     building varchar(15),
     budget numeric(12,2),
     primary key (dept_name));

The relation created above has three attributes: dept_name, which is a character string of maximum length 20; building, which is a character string of maximum length 15; and budget, which is a number with 12 digits in total, 2 of which are after the decimal point. The create table command also specifies that the dept_name attribute is the primary key of the department relation.

The general form of the create table command is:

create table r
    (A1 D1,
     A2 D2,
     . . . ,
     An Dn,
     <integrity-constraint1>,
     . . . ,
     <integrity-constraintk>);

where r is the name of the relation, each Ai is the name of an attribute in the schema of relation r, and Di is the domain of attribute Ai; that is, Di specifies the type of attribute Ai along with optional constraints that restrict the set of allowed values for Ai.

The semicolon shown at the end of the create table statements, as well as at the end of other SQL statements later in this chapter, is optional in many SQL implementations.

SQL supports a number of different integrity constraints. In this section, we discuss only a few of them:

primary key (Aj1, Aj2, . . . , Ajm): The primary-key specification says that attributes Aj1, Aj2, . . . , Ajm form the primary key for the relation. The primary-key attributes are required to be nonnull and unique; that is, no tuple can have a null value for a primary-key attribute, and no two tuples in the relation can be equal on all the primary-key attributes. Although the primary-key specification is optional, it is generally a good idea to specify a primary key for each relation.

foreign key (Ak1, Ak2, . . . , Akn) references s: The foreign key specification says that the values of attributes (Ak1, Ak2, . . . , Akn) for any tuple in the relation must correspond to values of the primary key attributes of some tuple in relation s.

Figure 3.1 presents a partial SQL DDL definition of the university database we use in the text. The definition of the course table has a declaration "foreign key (dept_name) references department". This foreign-key declaration specifies that for each course tuple, the department name specified in the tuple must exist in the primary key attribute (dept_name) of the department relation. Without this constraint, it is possible for a course to specify a nonexistent department name. Figure 3.1 also shows foreign key constraints on tables section, instructor and teaches.

not null: The not null constraint on an attribute specifies that the null value is not allowed for that attribute; in other words, the constraint excludes the null value from the domain of that attribute. For example, in Figure 3.1, the not null constraint on the name attribute of the instructor relation ensures that the name of an instructor cannot be null.

More details on the foreign-key constraint, as well as on other integrity constraints that the create table command may include, are provided later, in Section 4.4.
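The three constraints above can be exercised with a few insertions. This is a sketch that assumes the relations of Figure 3.1 have been created, are initially empty, and live in a system that enforces all declared constraints. The instructors Jones and Lee, the department Astronomy, and the budget figure are made up for illustration.

```sql
-- Assumes the relations of Figure 3.1 exist and are initially empty.
insert into department values ('Biology', 'Watson', 90000);

-- Accepted: 'Biology' exists in department, ID is new, name is not null.
insert into instructor values ('10211', 'Smith', 'Biology', 66000);

-- Rejected: the primary-key value '10211' is already present.
insert into instructor values ('10211', 'Jones', 'Biology', 70000);

-- Rejected: name is declared not null.
insert into instructor values ('10212', null, 'Biology', 70000);

-- Rejected: no department tuple has dept_name 'Astronomy' (foreign key).
insert into instructor values ('10213', 'Lee', 'Astronomy', 70000);
```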

SQL prevents any update to the database that violates an integrity constraint. For example, if a newly inserted or modified tuple in a relation has null values for any primary-key attribute, or if the tuple has the same value on the primary-key attributes as does another tuple in the relation, SQL flags an error and prevents the update. Similarly, an insertion of a course tuple with a dept_name value that does not appear in the department relation would violate the foreign-key constraint on course, and SQL prevents such an insertion from taking place.

A newly created relation is empty initially. We can use the insert command to load data into the relation. For example, if we wish to insert the fact that there is an instructor named Smith in the Biology department with instructor_id 10211 and a salary of $66,000, we write:

insert into instructor
    values (10211, 'Smith', 'Biology', 66000);

The values are specified in the order in which the corresponding attributes are listed in the relation schema. The insert command has a number of useful features, and is covered in more detail later, in Section 3.9.2.

We can use the delete command to delete tuples from a relation. The command

delete from student;

would delete all tuples from the student relation. Other forms of the delete command allow specific tuples to be deleted; the delete command is covered in more detail later, in Section 3.9.1.

create table department
    (dept_name varchar(20),
     building varchar(15),
     budget numeric(12,2),
     primary key (dept_name));

create table course
    (course_id varchar(7),
     title varchar(50),
     dept_name varchar(20),
     credits numeric(2,0),
     primary key (course_id),
     foreign key (dept_name) references department);

create table instructor
    (ID varchar(5),
     name varchar(20) not null,
     dept_name varchar(20),
     salary numeric(8,2),
     primary key (ID),
     foreign key (dept_name) references department);

create table section
    (course_id varchar(8),
     sec_id varchar(8),
     semester varchar(6),
     year numeric(4,0),
     building varchar(15),
     room_number varchar(7),
     time_slot_id varchar(4),
     primary key (course_id, sec_id, semester, year),
     foreign key (course_id) references course);

create table teaches
    (ID varchar(5),
     course_id varchar(8),
     sec_id varchar(8),
     semester varchar(6),
     year numeric(4,0),
     primary key (ID, course_id, sec_id, semester, year),
     foreign key (course_id, sec_id, semester, year) references section,
     foreign key (ID) references instructor);

Figure 3.1 SQL data definition for part of the university database.

To remove a relation from an SQL database, we use the drop table command. The drop table command deletes all information about the dropped relation from the database. The command

drop table r;

is a more drastic action than

delete from r;

The latter retains relation r, but deletes all tuples in r. The former deletes not only all tuples of r, but also the schema for r. After r is dropped, no tuples can be inserted into r unless it is re-created with the create table command.
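The difference between the two commands can be seen in a short sequence of statements. The sketch below uses a hypothetical relation scratch so that it does not disturb the university schema; the exact error message for the last statement depends on the database system.

```sql
-- Hypothetical scratch relation.
create table scratch
    (k int,
     primary key (k));

insert into scratch values (1);

delete from scratch;              -- scratch still exists, now with no tuples
insert into scratch values (2);   -- accepted: the schema is intact

drop table scratch;               -- tuples and schema are both gone
insert into scratch values (3);   -- error: relation scratch no longer exists
```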

We use the alter table command to add attributes to an existing relation. All tuples in the relation are assigned null as the value for the new attribute. The form of the alter table command is

alter table r add A D;

where r is the name of an existing relation, A is the name of the attribute to be added, and D is the type of the added attribute. We can drop attributes from a relation by the command

alter table r drop A;

where r is the name of an existing relation, and A is the name of an attribute of the relation. Many database systems do not support dropping of attributes, although they will allow an entire table to be dropped.
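A concrete instance of both forms may help. The sketch below assumes the instructor relation of Figure 3.1; the attribute office is hypothetical and is not part of the university schema.

```sql
-- Assumes the instructor relation of Figure 3.1; office is a hypothetical attribute.
alter table instructor add office varchar(10);

-- Every existing tuple now has null as its office value.
select ID, name, office
from instructor;

-- The form given in the text. Some systems require "drop column office"
-- instead, and some do not support dropping attributes at all.
alter table instructor drop office;
```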

3.3 Basic Structure of SQL Queries

The basic structure of an SQL query consists of three clauses: select, from, and where. The query takes as its input the relations listed in the from clause, operates on them as specified in the where and select clauses, and then produces a relation as the result. We introduce the SQL syntax through examples, and describe the general structure of SQL queries later.

3.3.1 Queries on a Single Relation

Let us consider a simple query using our university example, "Find the names of all instructors." Instructor names are found in the instructor relation, so we put that relation in the from clause. The instructor's name appears in the name attribute, so we put that in the select clause.

select name
from instructor;

The result is a relation consisting of a single attribute with the heading name. If the instructor relation is as shown in Figure 2.1, then the relation that results from the preceding query is shown in Figure 3.2.

name
----------
Srinivasan
Wu
Mozart
Einstein
El Said
Gold
Katz
Califieri
Singh
Crick
Brandt
Kim

Figure 3.2 Result of "select name from instructor".

Now consider another query, "Find the department names of all instructors," which can be written as:

select dept_name
from instructor;

Since more than one instructor can belong to a department, a department name could appear more than once in the instructor relation. The result of the above query is a relation containing the department names, shown in Figure 3.3.

dept_name
----------
Comp. Sci.
Finance
Music
Physics
History
Physics
Comp. Sci.
History
Finance
Biology
Comp. Sci.
Elec. Eng.

Figure 3.3 Result of "select dept_name from instructor".

In the formal, mathematical definition of the relational model, a relation is a set. Thus, duplicate tuples would never appear in relations. In practice, duplicate elimination is time-consuming. Therefore, SQL allows duplicates in relations as well as in the results of SQL expressions. Thus, the preceding SQL query lists each department name once for every tuple in which it appears in the instructor relation.

In those cases where we want to force the elimination of duplicates, we insert the keyword distinct after select. We can rewrite the preceding query as:

select distinct dept_name
from instructor;

if we want duplicates removed. The result of the above query would contain each department name at most once.

SQL allows us to use the keyword all to specify explicitly that duplicates are not removed:

select all dept_name
from instructor;

Since duplicate retention is the default, we shall not use all in our examples. To ensure the elimination of duplicates in the results of our example queries, we shall use distinct whenever it is necessary.

The select clause may also contain arithmetic expressions involving the operators +, -, *, and / operating on constants or attributes of tuples. For example, the query:

select ID, name, dept_name, salary * 1.1
from instructor;

returns a relation that is the same as the instructor relation, except that the attribute salary is multiplied by 1.1. This shows what would result if we gave a 10% raise to each instructor; note, however, that it does not result in any change to the instructor relation.

SQL also provides special data types, such as various forms of the date type, and allows several arithmetic functions to operate on these types. We discuss this further in Section 4.5.1.

The where clause allows us to select only those rows in the result relation of the from clause that satisfy a specified predicate. Consider the query "Find the names of all instructors in the Computer Science department who have salary greater than $70,000." This query can be written in SQL as:

select name
from instructor
where dept_name = 'Comp. Sci.' and salary > 70000;

If the instructor relation is as shown in Figure 2.1, then the relation that results from the preceding query is shown in Figure 3.4.

name
------
Katz
Brandt

Figure 3.4 Result of "Find the names of all instructors in the Computer Science department who have salary greater than $70,000."

SQL allows the use of the logical connectives and, or, and not in the where clause. The operands of the logical connectives can be expressions involving the comparison operators <, <=, >, >=, =, and <>. SQL allows us to use the comparison operators to compare strings and arithmetic expressions, as well as special types, such as date types.

We shall explore other features of where clause predicates later in this chapter.
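To illustrate how the connectives and comparison operators combine, here are two further queries. They are only a sketch: the salary bounds are arbitrary values chosen for illustration, and the queries assume the instructor relation of Figure 3.1.

```sql
-- Assumes the instructor relation of Figure 3.1.
-- Instructors outside Comp. Sci. whose salary lies between 60000 and 80000.
select name
from instructor
where not (dept_name = 'Comp. Sci.')
  and salary >= 60000
  and salary <= 80000;

-- Instructors who are outside Comp. Sci. (written with <>) or earn more than 90000.
select name, dept_name
from instructor
where dept_name <> 'Comp. Sci.' or salary > 90000;
```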

3.3.2 Queries on Multiple Relations

So far our example queries were on a single relation. Queries often need to access information from multiple relations. We now study how to write such queries.

As an example, suppose we want to answer the query "Retrieve the names of all instructors, along with their department names and department building name."

Looking at the schema of the relation instructor, we realize that we can get the department name from the attribute dept_name, but the department building name is present in the attribute building of the relation department. To answer the query, each tuple in the instructor relation must be matched with the tuple in the department relation whose dept_name value matches the dept_name value of the instructor tuple.

In SQL, to answer the above query, we list the relations that need to be accessed in the from clause, and specify the matching condition in the where clause. The above query can be written in SQL as

select name, instructor.dept_name, building
from instructor, department
where instructor.dept_name = department.dept_name;

If the instructor and department relations are as shown in Figures 2.1 and 2.5 respectively, then the result of this query is shown in Figure 3.5.

name        dept_name   building
----------  ----------  --------
Srinivasan  Comp. Sci.  Taylor
Wu          Finance     Painter
Mozart      Music       Packard
Einstein    Physics     Watson
El Said     History     Painter
Gold        Physics     Watson
Katz        Comp. Sci.  Taylor
Califieri   History     Painter
Singh       Finance     Painter
Crick       Biology     Watson
Brandt      Comp. Sci.  Taylor
Kim         Elec. Eng.  Taylor

Figure 3.5 The result of "Retrieve the names of all instructors, along with their department names and department building name."

Note that the attribute dept_name occurs in both the relations instructor and department, and the relation name is used as a prefix (in instructor.dept_name, and department.dept_name) to make clear to which attribute we are referring. In contrast, the attributes name and building appear in only one of the relations, and therefore do not need to be prefixed by the relation name.

This naming convention requires that the relations that are present in the from clause have distinct names. This requirement causes problems in some cases, such as when information from two different tuples in the same relation needs to be combined. In Section 3.4.1, we see how to avoid these problems by using the rename operation.

We now consider the general case of SQL queries involving multiple relations. As we have seen earlier, an SQL query can contain three types of clauses, the select clause, the from clause, and the where clause. The role of each clause is as follows:

The select clause is used to list the attributes desired in the result of a query.

The from clause is a list of the relations to be accessed in the evaluation of the query.

The where clause is a predicate involving attributes of the relation in the from clause.

A typical SQL query has the form

select A1, A2, . . . , An
from r1, r2, . . . , rm
where P;

Each Ai represents an attribute, and each ri a relation. P is a predicate. If the where clause is omitted, the predicate P is true.

Although the clauses must be written in the order select, from, where, the easiest way to understand the operations specified by the query is to consider the clauses in operational order: first from, then where, and then select.¹

The from clause by itself defines a Cartesian product of the relations listed in the clause. It is defined formally in terms of set theory, but is perhaps best understood as an iterative process that generates tuples for the result relation of the from clause.

for each tuple t1 in relation r1
    for each tuple t2 in relation r2
        . . .
            for each tuple tm in relation rm
                Concatenate t1, t2, . . . , tm into a single tuple t
                Add t into the result relation

The result relation has all attributes from all the relations in the from clause. Since the same attribute name may appear in both ri and rj, as we saw earlier, we prefix the name of the relation from which the attribute originally came, before the attribute name.

For example, the relation schema for the Cartesian product of relations instructor and teaches is:

(instructor.ID, instructor.name, instructor.dept_name, instructor.salary,
 teaches.ID, teaches.course_id, teaches.sec_id, teaches.semester, teaches.year)

With this schema, we can distinguish instructor.ID from teaches.ID. For those attributes that appear in only one of the two schemas, we shall usually drop the relation-name prefix. This simplification does not lead to any ambiguity. We can then write the relation schema as:

(instructor.ID, name, dept_name, salary,
 teaches.ID, course_id, sec_id, semester, year)
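The Cartesian product and its attribute naming can be written out directly in SQL. The sketch below assumes the instructor and teaches relations of Figure 3.1. The cross join keyword in the second query is not introduced in this chapter; it is shown only as an equivalent spelling that many systems accept.

```sql
-- Assumes the instructor and teaches relations of Figure 3.1.
-- Every instructor tuple is paired with every teaches tuple.
select *
from instructor, teaches;

-- An equivalent, explicit spelling of the same Cartesian product.
select *
from instructor cross join teaches;

-- ID appears in both relations, so it must be qualified;
-- name and course_id each appear in only one relation and need no prefix.
select instructor.ID, teaches.ID, name, course_id
from instructor, teaches;
```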

To illustrate, consider the instructor relation in Figure 2.1 and the teaches relation in Figure 2.7. Their Cartesian product is shown in Figure 3.6, which includes only a portion of the tuples that make up the Cartesian product result.²

inst.ID  name        dept_name   salary  teaches.ID  course_id  sec_id  semester  year
-------  ----------  ----------  ------  ----------  ---------  ------  --------  ----
10101    Srinivasan  Comp. Sci.  65000   10101       CS-101     1       Fall      2009
10101    Srinivasan  Comp. Sci.  65000   10101       CS-315     1       Spring    2010
10101    Srinivasan  Comp. Sci.  65000   10101       CS-347     1       Fall      2009
10101    Srinivasan  Comp. Sci.  65000   12121       FIN-201    1       Spring    2010
10101    Srinivasan  Comp. Sci.  65000   15151       MU-199     1       Spring    2010
10101    Srinivasan  Comp. Sci.  65000   22222       PHY-101    1       Fall      2009
...      ...         ...         ...     ...         ...        ...     ...       ...
12121    Wu          Finance     90000   10101       CS-101     1       Fall      2009
12121    Wu          Finance     90000   10101       CS-315     1       Spring    2010
12121    Wu          Finance     90000   10101       CS-347     1       Fall      2009
12121    Wu          Finance     90000   12121       FIN-201    1       Spring    2010
12121    Wu          Finance     90000   15151       MU-199     1       Spring    2010
12121    Wu          Finance     90000   22222       PHY-101    1       Fall      2009
...      ...         ...         ...     ...         ...        ...     ...       ...
15151    Mozart      Music       40000   10101       CS-101     1       Fall      2009
15151    Mozart      Music       40000   10101       CS-315     1       Spring    2010
15151    Mozart      Music       40000   10101       CS-347     1       Fall      2009
15151    Mozart      Music       40000   12121       FIN-201    1       Spring    2010
15151    Mozart      Music       40000   15151       MU-199     1       Spring    2010
...      ...         ...         ...     ...         ...        ...     ...       ...
22222    Einstein    Physics     95000   10101       CS-101     1       Fall      2009
22222    Einstein    Physics     95000   10101       CS-315     1       Spring    2010
22222    Einstein    Physics     95000   10101       CS-347     1       Fall      2009
22222    Einstein    Physics     95000   12121       FIN-201    1       Spring    2010
22222    Einstein    Physics     95000   15151       MU-199     1       Spring    2010
22222    Einstein    Physics     95000   22222       PHY-101    1       Fall      2009
...      ...         ...         ...     ...         ...        ...     ...       ...

Figure 3.6 The Cartesian product of the instructor relation with the teaches relation.

The Cartesian product by itself combines tuples from instructor and teaches that are unrelated to each other. Each tuple in instructor is combined with every tuple in teaches, even those that refer to a different instructor. The result can be an extremely large relation, and it rarely makes sense to create such a Cartesian product.

¹ In practice, SQL may convert the expression into an equivalent form that can be processed more efficiently. However, we shall defer concerns about efficiency to Chapters 12 and 13.
² Note that we renamed instructor.ID as inst.ID to reduce the width of the table in Figure 3.6.

Instead, the predicate in the where clause is used to restrict the combinations created by the Cartesian product to those that are meaningful for the desired answer. We would expect a query involving instructor and teaches to combine a particular tuple t in instructor with only those tuples in teaches that refer to the same instructor to which t refers. That is, we wish only to match teaches tuples with instructor tuples that have the same ID value. The following SQL query ensures this condition, and outputs the instructor name and course identifiers from such matching tuples.

select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID;

Note that the above query outputs only instructors who have taught some course. Instructors who have not taught any course are not output; if we wish to output such tuples, we could use an operation called the outer join, which is described in Section 4.1.2.

If the instructor relation is as shown in Figure 2.1 and the teaches relation is as shown in Figure 2.7, then the relation that results from the preceding query is shown in Figure 3.7. Observe that instructors Gold, Califieri, and Singh, who have not taught any course, do not appear in the result.

name        course_id
----------  ---------
Srinivasan  CS-101
Srinivasan  CS-315
Srinivasan  CS-347
Wu          FIN-201
Mozart      MU-199
Einstein    PHY-101
El Said     HIS-351
Katz        CS-101
Katz        CS-319
Crick       BIO-101
Crick       BIO-301
Brandt      CS-190
Brandt      CS-190
Brandt      CS-319
Kim         EE-181

Figure 3.7 Result of "For all instructors in the university who have taught some course, find their names and the course ID of all courses they taught."

If we only wished to find instructor names and course identifiers for instructors in the Computer Science department, we could add an extra predicate to the where clause, as shown below.

select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID and instructor.dept_name = 'Comp. Sci.';

Note that since the dept_name attribute occurs only in the instructor relation, we could have used just dept_name, instead of instructor.dept_name in the above query.

In general, the meaning of an SQL query can be understood as follows:

1. Generate a Cartesian product of the relations listed in the from clause.

2. Apply the predicates specified in the where clause on the result of Step 1.

3. For each tuple in the result of Step 2, output the attributes (or results of expressions) specified in the select clause.
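The three steps can be mapped onto the clauses of a concrete query. The annotated query below is a sketch that assumes the instructor and teaches relations of Figure 3.1; the comments describe the conceptual order of evaluation, not how a database system actually runs the query.

```sql
-- Assumes the instructor and teaches relations of Figure 3.1.
select name, course_id                       -- Step 3: keep only these attributes
from instructor, teaches                     -- Step 1: Cartesian product
where instructor.ID = teaches.ID             -- Step 2: keep matching pairs only
  and instructor.dept_name = 'Comp. Sci.';   -- Step 2: and only Comp. Sci. instructors
```

Reading the clauses in the order from, where, select reproduces the three steps, even though the clauses must be written in the order select, from, where.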

The above sequence of steps helps make clear what the result of an SQL query should be, not how it should be executed. A real implementation of SQL would not execute the query in this fashion; it would instead optimize evaluation by generating (as far as possible) only elements of the Cartesian product that satisfy the where clause predicates. We study such implementation techniques later, in Chapters 12 and 13.

When writing queries, you should be careful to include appropriate where clause conditions. If you omit the where clause condition in the preceding SQL query, it would output the Cartesian product, which could be a huge relation. For the example instructor relation in Figure 2.1 and the example teaches relation in Figure 2.7, their Cartesian product has 12 * 13 = 156 tuples, more than we can show in the text! To make matters worse, suppose we have a more realistic number of instructors than we show in our sample relations in the figures, say 200 instructors. Let's assume each instructor teaches 3 courses, so we have 600 tuples in the teaches relation. Then the above iterative process generates 200 * 600 = 120,000 tuples in the result.

3.3.3 The Natural Join

In our example query that combined information from the instructor and teaches table, the matching condition required instructor.ID to be equal to teaches.ID. These are the only attributes in the two relations that have the same name. In fact this is a common case; that is, the matching condition in the from clause most often requires all attributes with matching names to be equated.

To make the life of an SQL programmer easier for this common case, SQL supports an operation called the natural join, which we describe below. In fact SQL supports several other ways in which information from two or more relations can be joined together. We have already seen how a Cartesian product along with a where clause predicate can be used to join information from multiple relations. Other ways of joining information from multiple relations are discussed in Section 4.1.

The natural join operation operates on two relations and produces a relation as the result. Unlike the Cartesian product of two relations, which concatenates each tuple of the first relation with every tuple of the second, natural join considers only those pairs of tuples with the same value on those attributes that appear in the schemas of both relations. So, going back to the example of the relations instructor and teaches, computing instructor natural join teaches considers only those pairs of tuples where both the tuple from instructor and the tuple from teaches have the same value on the common attribute, ID.


ID     name        dept_name   salary  course_id  sec_id  semester  year
-----  ----------  ----------  ------  ---------  ------  --------  ----
10101  Srinivasan  Comp. Sci.  65000   CS-101     1       Fall      2...
10101  Srinivasan  Comp. Sci.  65000   CS-315     1       Spring    2...
10101  Srinivasan  Comp. Sci.  65000   CS-347     1       Fall      2...
12121  Wu          Finance     90000   FIN-201    1       Spring    2...
15151  Mozart      Music       40000   MU-199     1       Spring    2...
22222  Einstein    Physics     95000   PHY-101    1       Fall      2...
32343  El Said     History     60000   HIS-351    1       Spring    2...
45565  Katz        Comp. Sci.  75000   CS-101     1       Spring    2...
45565  Katz        Comp. Sci.  75000   CS-319     1       Spring    2...
76766  Crick       Biology     72000   BIO-101    1       Summer    2...
76766  Crick       Biology     72000   BIO-301    1       Summer    2...
83821  Brandt      Comp. Sci.  92000   CS-190     1       Spring    2...
83821  Brandt      Comp. Sci.  92000   CS-190     2       Spring    2...
83821  Brandt      Comp. Sci.  92000   CS-319     2       Spring    2...
98345  Kim         Elec. Eng.  80000   EE-181     1       Spring    2...

Figure 3.8 The natural join of the instructor relation with the teaches relation.

The result relation, shown in Figure 3.8, has only 13 tuples, the ones that give information about an instructor and a course that that instructor actually teaches. Notice that we do not repeat those attributes that appear in the schemas of both relations; rather they appear only once. Notice also the order in which the attributes are listed: first the attributes common to the schemas of both relations, second those attributes unique to the schema of the first relation, and finally, those attributes unique to the schema of the second relation.

Consider the query "For all instructors in the university who have taught some course, find their names and the course ID of all courses they taught", which we wrote earlier as:

select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID;

This query can be written more concisely using the natural-join operation in SQL as:

select name, course_id
from instructor natural join teaches;

Both of the above queries generate the same result.
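The two formulations agree when only name and course_id are selected, but their full result schemas differ, as a select * makes visible. The sketch below assumes the instructor and teaches relations of Figure 3.1 and a system that supports natural join; the schemas in the comments follow the attribute-ordering rule stated above.

```sql
-- Assumes the instructor and teaches relations of Figure 3.1.
-- Result schema: ID, name, dept_name, salary, course_id, sec_id, semester, year
-- (the common attribute ID appears once, and first).
select *
from instructor natural join teaches;

-- Result schema: instructor.ID, name, dept_name, salary,
--                teaches.ID, course_id, sec_id, semester, year
-- (ID appears twice, once from each relation).
select *
from instructor, teaches
where instructor.ID = teaches.ID;
```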

As we sawearlier,the result of the natural joinoperationis arelation.Concep-tually,expressioninstructornatural jointeachesin thefromclause is replaced


21/39

by the relation obtained by evaluating the natural join.3Thewhereandselect

clauses are then evaluated on this relation, as we saw earlier in Section 3.3.2.Afromclause in anSQLquery can have multiple relations combined using

natural join, as shown here:

    select A1, A2, ..., An
    from r1 natural join r2 natural join ... natural join rm
    where P;

More generally, a from clause can be of the form

    from E1, E2, ..., En

where each Ei can be a single relation or an expression involving natural joins. For example, suppose we wish to answer the query "List the names of instructors along with the titles of courses that they teach." The query can be written in SQL as follows:

    select name, title
    from instructor natural join teaches, course
    where teaches.course_id = course.course_id;

The natural join of instructor and teaches is first computed, as we saw earlier, and a Cartesian product of this result with course is computed, from which the where clause extracts only those tuples where the course identifier from the join result matches the course identifier from the course relation. Note that teaches.course_id in the where clause refers to the course_id field of the natural join result, since this field in turn came from the teaches relation.

In contrast the following SQL query does not compute the same result:

    select name, title
    from instructor natural join teaches natural join course;

To see why, note that the natural join of instructor and teaches contains the attributes (ID, name, dept_name, salary, course_id, sec_id, semester, year), while the course relation contains the attributes (course_id, title, dept_name, credits). As a result, the natural join of these two would require that the dept_name attribute values from the two inputs be the same, in addition to requiring that the course_id values be the same. This query would then omit all (instructor name, course title) pairs where the instructor teaches a course in a department other than the instructor's own department. The previous query, on the other hand, correctly outputs such pairs.

³ As a consequence, it is not possible to use attribute names containing the original relation names, for instance instructor.name or teaches.course_id, to refer to attributes in the natural join result; we can, however, use attribute names such as name and course_id, without the relation names.

To provide the benefit of natural join while avoiding the danger of equating attributes erroneously, SQL provides a form of the natural join construct that allows you to specify exactly which columns should be equated. This feature is illustrated by the following query:

    select name, title
    from (instructor natural join teaches) join course using (course_id);

The operation join ... using requires a list of attribute names to be specified. Both inputs must have attributes with the specified names. Consider the operation r1 join r2 using (A1, A2). The operation is similar to r1 natural join r2, except that a pair of tuples t1 from r1 and t2 from r2 match if t1.A1 = t2.A1 and t1.A2 = t2.A2; even if r1 and r2 both have an attribute named A3, it is not required that t1.A3 = t2.A3.

Thus, in the preceding SQL query, the join construct permits instructor.dept_name and course.dept_name to differ, and the SQL query gives the correct answer.
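The difference between the two forms can be seen on a small example. The sketch below uses Python's sqlite3 module with illustrative rows; the section in which Einstein teaches CS-101 is a hypothetical row, added only so that one instructor teaches a course outside his own department. The parentheses of the query in the text are omitted here on the assumption that the joins are grouped left to right, which yields the same grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary numeric);
    create table teaches (ID text, course_id text, sec_id text,
                          semester text, year integer);
    create table course (course_id text, title text, dept_name text,
                         credits integer);
    insert into instructor values
        ('10101', 'Srinivasan', 'Comp. Sci.', 65000),
        ('22222', 'Einstein', 'Physics', 95000);
    insert into teaches values
        ('10101', 'CS-101', '1', 'Fall', 2009),
        ('22222', 'PHY-101', '1', 'Fall', 2009),
        ('22222', 'CS-101', '2', 'Fall', 2009);  -- hypothetical cross-department row
    insert into course values
        ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.', 4),
        ('PHY-101', 'Physical Principles', 'Physics', 4);
""")

# Natural join with course equates course_id AND dept_name.
all_natural = conn.execute("""
    select name, title
    from instructor natural join teaches natural join course
    order by name, title
""").fetchall()

# join ... using equates only the listed attribute, course_id.
join_using = conn.execute("""
    select name, title
    from instructor natural join teaches join course using (course_id)
    order by name, title
""").fetchall()

print(all_natural)   # the (Einstein, Intro. to Computer Science) pair is missing
print(join_using)    # the pair is present
```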

3.4 Additional Basic Operations

There are a number of additional basic operations that are supported in SQL.

3.4.1 The Rename Operation

Consider again the query that we used earlier:

    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID;

The result of this query is a relation with the following attributes:

    name, course_id

The names of the attributes in the result are derived from the names of the attributes in the relations in the from clause.

We cannot, however, always derive names in this way, for several reasons: First, two relations in the from clause may have attributes with the same name, in which case an attribute name is duplicated in the result. Second, if we used an arithmetic expression in the select clause, the resultant attribute does not have a name. Third, even if an attribute name can be derived from the base relations as in the preceding example, we may want to change the attribute name in the result. Hence, SQL provides a way of renaming the attributes of a result relation. It uses the as clause, taking the form:

    old-name as new-name

The as clause can appear in both the select and from clauses.⁴

For example, if we want the attribute name name to be replaced with the name instructor_name, we can rewrite the preceding query as:

    select name as instructor_name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID;
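To see the effect of the as clause on the result schema, the following sketch runs the renamed query through Python's sqlite3 module and reads the attribute names of the result from the cursor. The single instructor and section are illustrative sample rows, and SQLite is an assumed stand-in for your own system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary numeric);
    create table teaches (ID text, course_id text, sec_id text,
                          semester text, year integer);
    insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000);
    insert into teaches values ('10101', 'CS-101', '1', 'Fall', 2009);
""")

cursor = conn.execute("""
    select name as instructor_name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID
""")

# cursor.description has one entry per result attribute; the name comes first.
column_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(column_names)
print(rows)
```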

The as clause is particularly useful in renaming relations. One reason to rename a relation is to replace a long relation name with a shortened version that is more convenient to use elsewhere in the query. To illustrate, we rewrite the query "For all instructors in the university who have taught some course, find their names and the course ID of all courses they taught."

    select T.name, S.course_id
    from instructor as T, teaches as S
    where T.ID = S.ID;

Another reason to rename a relation is a case where we wish to compare tuples in the same relation. We then need to take the Cartesian product of a relation with itself and, without renaming, it becomes impossible to distinguish one tuple from the other. Suppose that we want to write the query "Find the names of all instructors whose salary is greater than at least one instructor in the Biology department." We can write the SQL expression:

    select distinct T.name
    from instructor as T, instructor as S
    where T.salary > S.salary and S.dept_name = 'Biology';

Observe that we could not use the notation instructor.salary, since it would not be clear which reference to instructor is intended.

In the above query, T and S can be thought of as copies of the relation instructor, but more precisely, they are declared as aliases, that is as alternative names, for the relation instructor. An identifier, such as T and S, that is used to rename a relation is referred to as a correlation name in the SQL standard, but is also commonly referred to as a table alias, or a correlation variable, or a tuple variable.

Note that a better way to phrase the previous query in English would be "Find the names of all instructors who earn more than the lowest paid instructor in the Biology department." Our original wording fits more closely with the SQL that we wrote, but the latter wording is more intuitive, and can in fact be expressed directly in SQL as we shall see in Section 3.8.2.
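The self-join with the correlation names T and S can be tried out as follows. This is a sketch using Python's sqlite3 module on a handful of illustrative instructor rows; the order by clause is an addition made here only to fix the order of the printed names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary numeric);
    insert into instructor values
        ('10101', 'Srinivasan', 'Comp. Sci.', 65000),
        ('12121', 'Wu', 'Finance', 90000),
        ('22222', 'Einstein', 'Physics', 95000),
        ('45565', 'Katz', 'Comp. Sci.', 75000),
        ('76766', 'Crick', 'Biology', 72000);
""")

# T ranges over all instructors, S over the same relation; the where clause
# keeps a T tuple when some Biology tuple S has a lower salary.
rows = conn.execute("""
    select distinct T.name
    from instructor as T, instructor as S
    where T.salary > S.salary and S.dept_name = 'Biology'
    order by T.name
""").fetchall()

names = [row[0] for row in rows]
print(names)
```

With Crick (salary 72000) as the only Biology instructor in this sample, the result contains exactly the instructors who earn more than 72000.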

⁴ Early versions of SQL did not include the keyword as. As a result, some implementations of SQL, notably Oracle, do not permit the keyword as in the from clause. In Oracle, "old-name as new-name" is written instead as "old-name new-name" in the from clause. The keyword as is permitted for renaming attributes in the select clause, but it is optional and may be omitted in Oracle.

3.4.2 String Operations

SQL specifies strings by enclosing them in single quotes, for example, 'Computer'. A single quote character that is part of a string can be specified by using two single quote characters; for example, the string "It's right" can be specified by 'It''s right'.

The SQL standard specifies that the equality operation on strings is case sensitive; as a result the expression "'comp. sci.' = 'Comp. Sci.'" evaluates to false. However, some database systems, such as MySQL and SQL Server, do not distinguish uppercase from lowercase when matching strings; as a result "'comp. sci.' = 'Comp. Sci.'" would evaluate to true on these databases. This default behavior can, however, be changed, either at the database level or at the level of specific attributes.

SQL also permits a variety of functions on character strings, such as concatenating (using "||"), extracting substrings, finding the length of strings, converting strings to uppercase (using the function upper(s) where s is a string) and lowercase (using the function lower(s)), removing spaces at the end of the string (using trim(s)) and so on. There are variations on the exact set of string functions supported by different database systems. See your database system's manual for more details on exactly what string functions it supports.

Pattern matching can be performed on strings, using the operator like. We describe patterns by using two special characters:

Percent (%): The % character matches any substring.

Underscore (_): The _ character matches any character.

Patterns are case sensitive; that is, uppercase characters do not match lowercase characters, or vice versa. To illustrate pattern matching, we consider the following examples:

'Intro%' matches any string beginning with "Intro".

'%Comp%' matches any string containing "Comp" as a substring, for example, 'Intro. to Computer Science', and 'Computational Biology'.

'___' matches any string of exactly three characters.

'___%' matches any string of at least three characters.

SQL expresses patterns by using the like comparison operator. Consider the query "Find the names of all departments whose building name includes the substring 'Watson'." This query can be written as:

    select dept_name
    from department
    where building like '%Watson%';

For patterns to include the special pattern characters (that is, % and _), SQL allows the specification of an escape character. The escape character is used immediately before a special pattern character to indicate that the special pattern character is to be treated like a normal character. We define the escape character for a like comparison using the escape keyword. To illustrate, consider the following patterns, which use a backslash (\) as the escape character:

like 'ab\%cd%' escape '\' matches all strings beginning with "ab%cd".

like 'ab\\cd%' escape '\' matches all strings beginning with "ab\cd".

SQL allows us to search for mismatches instead of matches by using the not like comparison operator. Some databases provide variants of the like operation which do not distinguish lower and upper case.

SQL:1999 also offers a similar to operation, which provides more powerful pattern matching than the like operation; the syntax for specifying patterns is similar to that used in Unix regular expressions.
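Several of the string operations above can be exercised with Python's sqlite3 module, as in the sketch below. The department rows are illustrative, and scalar is a hypothetical helper defined here for brevity. Note that SQLite is one of the systems that deviate from the text in one respect: its like operator ignores case for ASCII letters by default, so the examples are chosen so that case does not affect the outcome.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (dept_name text, building text, budget numeric);
    insert into department values
        ('Comp. Sci.', 'Taylor', 100000),
        ('Physics', 'Watson', 70000),
        ('Biology', 'Watson', 90000);
""")

def scalar(sql):
    """Hypothetical helper: evaluate a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]

# A quote inside a string literal is written as two single quotes.
quoted = scalar("select 'It''s right'")

# String equality is case sensitive here; SQLite reports false as 0.
same = scalar("select 'comp. sci.' = 'Comp. Sci.'")

# Concatenation with || and conversion to uppercase with upper().
shout = scalar("select upper('comp.' || ' sci.')")

# like with %: departments whose building name contains 'Watson'.
watson = [row[0] for row in conn.execute(
    "select dept_name from department "
    "where building like '%Watson%' order by dept_name")]

# With escape '\', the sequence \% stands for a literal percent sign.
literal = scalar(r"select 'ab%cdef' like 'ab\%cd%' escape '\'")
no_match = scalar(r"select 'abXcdef' like 'ab\%cd%' escape '\'")

print(quoted, same, shout, watson, literal, no_match)
```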

3.4.3 Attribute Specification in Select Clause

The asterisk symbol "*" can be used in the select clause to denote "all attributes." Thus, the use of instructor.* in the select clause of the query:

    select instructor.*
    from instructor, teaches
    where instructor.ID = teaches.ID;

indicates that all attributes of instructor are to be selected. A select clause of the form select * indicates that all attributes of the result relation of the from clause are selected.

3.4.4 Ordering the Display of Tuples

SQL offers the user some control over the order in which tuples in a relation are displayed. The order by clause causes the tuples in the result of a query to appear in sorted order. To list in alphabetic order all instructors in the Physics department, we write:

    select name
    from instructor
    where dept_name = 'Physics'
    order by name;

By default, the order by clause lists items in ascending order. To specify the sort order, we may specify desc for descending order or asc for ascending order. Furthermore, ordering can be performed on multiple attributes. Suppose that we wish to list the entire instructor relation in descending order of salary. If several instructors have the same salary, we order them in ascending order by name. We express this query in SQL as follows:

    select *
    from instructor
    order by salary desc, name asc;
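The two-level ordering can be observed on a small sample in which two instructors share a salary. The sketch below uses Python's sqlite3 module; the rows are illustrative and are deliberately inserted out of order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary numeric);
    insert into instructor values
        ('76543', 'Singh', 'Finance', 80000),
        ('15151', 'Mozart', 'Music', 40000),
        ('98345', 'Kim', 'Elec. Eng.', 80000),
        ('22222', 'Einstein', 'Physics', 95000);
""")

rows = conn.execute("""
    select *
    from instructor
    order by salary desc, name asc
""").fetchall()

# Keep only (name, salary) to make the ordering easy to read.
ordered = [(name, salary) for (_id, name, _dept, salary) in rows]
print(ordered)
```

Kim and Singh both earn 80000, so the second sort key, name in ascending order, decides that Kim is listed before Singh.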

3.4.5 Where Clause Predicates

SQL includes a between comparison operator to simplify where clauses that specify that a value be less than or equal to some value and greater than or equal to some other value. If we wish to find the names of instructors with salary amounts between $90,000 and $100,000, we can use the between comparison to write:

    select name
    from instructor
    where salary between 90000 and 100000;

instead of:

    select name
    from instructor
    where salary <= 100000 and salary >= 90000;
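A quick check that the two formulations agree, including at the boundary value 90000, can be written with Python's sqlite3 module. The instructor rows are an illustrative sample, and the order by clause is added here only so that the two results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary numeric);
    insert into instructor values
        ('10101', 'Srinivasan', 'Comp. Sci.', 65000),
        ('12121', 'Wu', 'Finance', 90000),
        ('22222', 'Einstein', 'Physics', 95000),
        ('33456', 'Gold', 'Physics', 87000),
        ('83821', 'Brandt', 'Comp. Sci.', 92000);
""")

with_between = [row[0] for row in conn.execute("""
    select name
    from instructor
    where salary between 90000 and 100000
    order by name
""")]

with_comparisons = [row[0] for row in conn.execute("""
    select name
    from instructor
    where salary <= 100000 and salary >= 90000
    order by name
""")]

print(with_between)
print(with_between == with_comparisons)
```

Wu, whose salary is exactly 90000, is included, which shows that both bounds of between are inclusive.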

Similarly, we can use the not between comparison operator.

We can extend the preceding query that finds instructor names along with course identifiers, which we saw earlier, and consider a more complicated case in which we require also that the instructors be from the Biology department: "Find the instructor names and the courses they taught for all instructors in the Biology department who have taught some course." To write this query, we can modify either of the SQL queries we saw earlier, by adding an extra condition in the where clause. We show below the modified form of the SQL query that does not use natural join.

    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID and dept_name = 'Biology';

SQL permits us to use the notation (v1, v2, ..., vn) to denote a tuple of arity n containing values v1, v2, ..., vn. The comparison operators can be used on tuples, and the ordering is defined lexicographically. For example, (a1, a2)


course_id
CS-101
CS-347
PHY-101

Figure 3.9 The c1 relation, listing courses taught in Fall 2009.

is true if a1


course_id
CS-101
CS-315
CS-319
CS-319
FIN-201
HIS-351
MU-199

Figure 3.10 The c2 relation, listing courses taught in Spring 2010.

3.5.1 The Union Operation

To find the set of all courses taught either in Fall 2009 or in Spring 2010, or both, we write:⁶

    (select course_id
     from section
     where semester = 'Fall' and year = 2009)
    union
    (select course_id
     from section
     where semester = 'Spring' and year = 2010);

The union operation automatically eliminates duplicates, unlike the select clause. Thus, using the section relation of Figure 2.6, where two sections of CS-319 are offered in Spring 2010, and a section of CS-101 is offered in the Fall 2009 as well as in the Spring 2010 semester, CS-101 and CS-319 appear only once in the result, shown in Figure 3.11.

If we want to retain all duplicates, we must write union all in place of union:

    (select course_id
     from section
     where semester = 'Fall' and year = 2009)
    union all
    (select course_id
     from section
     where semester = 'Spring' and year = 2010);

The number of duplicate tuples in the result is equal to the total number of duplicates that appear in both c1 and c2. So, in the above query, each of CS-319 and CS-101 would be listed twice. As a further example, if it were the case that 4 sections of ECE-101 were taught in the Fall 2009 semester and 2 sections of ECE-101 were taught in the Spring 2010 semester, then there would be 6 tuples with ECE-101 in the result.

⁶ The parentheses we include around each select-from-where statement are optional, but useful for ease of reading.

course_id
CS-101
CS-315
CS-319
CS-347
FIN-201
HIS-351
MU-199
PHY-101

Figure 3.11 The result relation for c1 union c2.
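The difference between union and union all can be reproduced with Python's sqlite3 module. In the sketch below the section rows are limited to the courses listed in Figures 3.9 and 3.10; the section numbers are illustrative assumptions. The optional parentheses are left out because SQLite, the engine assumed here, does not accept them around the operands of a set operation.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id text, sec_id text, "
             "semester text, year integer)")
conn.executemany("insert into section values (?, ?, ?, ?)", [
    ("CS-101", "1", "Fall", 2009), ("CS-347", "1", "Fall", 2009),
    ("PHY-101", "1", "Fall", 2009),
    ("CS-101", "1", "Spring", 2010), ("CS-315", "1", "Spring", 2010),
    ("CS-319", "1", "Spring", 2010), ("CS-319", "2", "Spring", 2010),
    ("FIN-201", "1", "Spring", 2010), ("HIS-351", "1", "Spring", 2010),
    ("MU-199", "1", "Spring", 2010),
])

FALL = "select course_id from section where semester = 'Fall' and year = 2009"
SPRING = "select course_id from section where semester = 'Spring' and year = 2010"

# union eliminates duplicates; union all keeps every input tuple.
union = [r[0] for r in conn.execute(f"{FALL} union {SPRING} order by course_id")]
union_all = [r[0] for r in conn.execute(f"{FALL} union all {SPRING}")]

# Count how often each course_id appears in the union all result.
copies = Counter(union_all)
print(union)
print(copies["CS-101"], copies["CS-319"], len(union_all))
```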

3.5.2 The Intersect Operation

To find the set of all courses taught in the Fall 2009 as well as in Spring 2010 we write:

    (select course_id
     from section
     where semester = 'Fall' and year = 2009)
    intersect
    (select course_id
     from section
     where semester = 'Spring' and year = 2010);

The result relation, shown in Figure 3.12, contains only one tuple with CS-101. The intersect operation automatically eliminates duplicates. For example, if it were the case that 4 sections of ECE-101 were taught in the Fall 2009 semester and 2 sections of ECE-101 were taught in the Spring 2010 semester, then there would be only 1 tuple with ECE-101 in the result.

course_id
CS-101

Figure 3.12 The result relation for c1 intersect c2.

If we want to retain all duplicates, we must write intersect all in place of intersect:

    (select course_id
     from section
     where semester = 'Fall' and year = 2009)
    intersect all
    (select course_id
     from section
     where semester = 'Spring' and year = 2010);

The number of duplicate tuples that appear in the result is equal to the minimum number of duplicates in both c1 and c2. For example, if 4 sections of ECE-101 were taught in the Fall 2009 semester and 2 sections of ECE-101 were taught in the Spring 2010 semester, then there would be 2 tuples with ECE-101 in the result.
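The ECE-101 example can be worked through in Python. SQLite, the engine assumed in this sketch, implements intersect but not intersect all, so the multiset version is only emulated here with collections.Counter, whose & operator keeps the minimum count of each element. The relations c1 and c2 are reduced to the hypothetical ECE-101 tuples of the example.

```python
import sqlite3
from collections import Counter

# Hypothetical inputs: 4 sections of ECE-101 in Fall 2009 (c1)
# and 2 sections in Spring 2010 (c2).
c1 = ["ECE-101"] * 4
c2 = ["ECE-101"] * 2

conn = sqlite3.connect(":memory:")
conn.execute("create table c1 (course_id text)")
conn.execute("create table c2 (course_id text)")
conn.executemany("insert into c1 values (?)", [(c,) for c in c1])
conn.executemany("insert into c2 values (?)", [(c,) for c in c2])

# Plain intersect: duplicates are eliminated, leaving a single tuple.
set_result = conn.execute(
    "select course_id from c1 intersect select course_id from c2").fetchall()

# intersect all keeps min(copies in c1, copies in c2) of each tuple;
# Counter's & operator computes that per-element minimum.
bag_result = sorted((Counter(c1) & Counter(c2)).elements())

print(set_result)
print(bag_result)
```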

3.5.3 The Except Operation

To find all courses taught in the Fall 2009 semester but not in the Spring 2010 semester, we write:

    (select course_id
     from section
     where semester = 'Fall' and year = 2009)
    except
    (select course_id
     from section
     where semester = 'Spring' and year = 2010);

The result of this query is shown in Figure 3.13. Note that this is exactly relation c1 of Figure 3.9 except that the tuple for CS-101 does not appear. The except operation⁷ outputs all tuples from its first input that do not occur in the second input; that is, it performs set difference. The operation automatically eliminates duplicates in the inputs before performing set difference. For example, if 4 sections of ECE-101 were taught in the Fall 2009 semester and 2 sections of ECE-101 were taught in the Spring 2010 semester, the result of the except operation would not have any copy of ECE-101.

If we want to retain duplicates, we must write except all in place of except:

    (select course_id
     from section
     where semester = 'Fall' and year = 2009)
    except all
    (select course_id
     from section
     where semester = 'Spring' and year = 2010);

⁷ Some SQL implementations, notably Oracle, use the keyword minus in place of except.

course_id
CS-347
PHY-101

Figure 3.13 The result relation for c1 except c2.

The number of duplicate copies of a tuple in the result is equal to the number of duplicate copies in c1 minus the number of duplicate copies in c2, provided that the difference is positive. Thus, if 4 sections of ECE-101 were taught in the Fall 2009 semester and 2 sections of ECE-101 were taught in Spring 2010, then there are 2 tuples with ECE-101 in the result. If, however, there were two or fewer sections of ECE-101 in the Fall 2009 semester, and two sections of ECE-101 in the Spring 2010 semester, there is no tuple with ECE-101 in the result.
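Both forms of set difference can be sketched in Python. The except query runs in sqlite3 on the c1 and c2 relations of Figures 3.9 and 3.10. Since SQLite, the engine assumed here, does not implement except all, the multiset rule is emulated with collections.Counter, whose subtraction drops every element whose count would not be positive. The helper except_all is a hypothetical name introduced for this illustration.

```python
import sqlite3
from collections import Counter

c1 = ["CS-101", "CS-347", "PHY-101"]
c2 = ["CS-101", "CS-315", "CS-319", "CS-319", "FIN-201", "HIS-351", "MU-199"]

conn = sqlite3.connect(":memory:")
conn.execute("create table c1 (course_id text)")
conn.execute("create table c2 (course_id text)")
conn.executemany("insert into c1 values (?)", [(c,) for c in c1])
conn.executemany("insert into c2 values (?)", [(c,) for c in c2])

# Set difference: tuples of c1 that do not occur in c2.
difference = [row[0] for row in conn.execute(
    "select course_id from c1 except select course_id from c2 "
    "order by course_id")]

def except_all(left, right):
    """Hypothetical helper: copies in left minus copies in right, if positive."""
    return sorted((Counter(left) - Counter(right)).elements())

# The ECE-101 cases from the text: 4 minus 2 copies, then 2 minus 2 copies.
four_minus_two = except_all(["ECE-101"] * 4, ["ECE-101"] * 2)
two_minus_two = except_all(["ECE-101"] * 2, ["ECE-101"] * 2)

print(difference)
print(four_minus_two, two_minus_two)
```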

3.6 Null Values

Null values present special problems in relational operations, including arithmetic operations, comparison operations, and set operations.

The result of an arithmetic expression (involving, for example, +, -, *, or /) is null if any of the input values is null. For example, if a query has an expression r.A + 5, and r.A is null for a particular tuple, then the expression result must also be null for that tuple.

Comparisons involving nulls are more of a problem. For example, consider the comparison "1 < null". It would be wrong to say this is true since we do not know what the null value represents. But it would likewise be wrong to claim this expression is false; if we did, "not (1 < null)" would evaluate to true, which does not make sense. SQL therefore treats as unknown the result of any comparison involving a null value (other than predicates is null and is not null, which are described later in this section). This creates a third logical value in addition to true and false.

Since the predicate in a where clause can involve Boolean operations such as and, or, and not on the results of comparisons, the definitions of the Boolean operations are extended to deal with the value unknown.

and: The result of true and unknown is unknown, false and unknown is false, while unknown and unknown is unknown.

or: The result of true or unknown is true, false or unknown is unknown, while unknown or unknown is unknown.

not: The result of not unknown is unknown.
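These rules can be checked directly in Python's sqlite3 module, as sketched below. The encoding is specific to SQLite, the engine assumed here: it reports true as 1, false as 0, and both null and unknown as a SQL null, which Python shows as None. The helper value is a hypothetical name introduced for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def value(expr):
    """Hypothetical helper: evaluate a scalar SQL expression."""
    return conn.execute(f"select {expr}").fetchone()[0]

# Arithmetic on null yields null.
arithmetic = value("null + 5")

# A comparison with null is unknown, and so is its negation.
comparison = value("1 < null")
negated = value("not (1 < null)")

# and / or with unknown, written with 1 for true and 0 for false.
and_table = [value("1 and null"), value("0 and null"), value("null and null")]
or_table = [value("1 or null"), value("0 or null"), value("null or null")]

print(arithmetic, comparison, negated)
print(and_table)
print(or_table)
```

Only "false and unknown" and "true or unknown" produce a definite answer, because in those two cases the outcome does not depend on what the unknown operand would turn out to be.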

You can verify that if r.A is null, then 1
