
UNIT-V

CLIENT SERVER AND INTERNET


CONTENTS

5.1 Introduction about Internet

5.1.1 WWW

5.1.2 History of Web

5.1.3 Hypertext

5.2 Web Client/Server

5.2.1 Web Client/Server Topology

5.2.2 Languages of Web

5.2.3 HTTP

5.2.4 HTML

5.3 3-Tier Client/Server Web Style

5.3.1 Introduction

5.3.2 3-Tier TP monitors

5.3.3 3-Tier Applications

5.4 CGI (Common Gateway Interface)

5.5 Server Side of Web

5.5.1 History

5.5.2 Explanation

5.6 CGI and State

5.6.1 Introduction of CGI

5.6.2 Applications

5.6.3 Forms


5.6.4 Gateways

5.6.5 Virtual Documents

5.7 SQL Database Server

5.7.1 History

5.7.2 SQL Server 2005

5.7.3 SQL Server 2008

5.7.4 SQL Server 2008 R2

5.7.5 SQL Server 2012

5.7.6 Architecture

5.8 Middleware and Federated Database

5.8.1 Technology

5.8.2 Characteristics of Federated Solution

5.8.3 Architecture

5.9 Query Processing

5.10 Data Warehouse Concepts

5.10.1 Characteristics of data warehouses

5.10.2 Distributed data warehouses

5.10.3 Architecture

5.10.4 Parallel data warehouse (PDW)

5.11 EIS/DSS

5.11.1 EIS (executive information system)

5.11.2 DSS (decision support system)

5.12 Data mining


5.12.1 Overview

5.12.2 Data, information & knowledge

5.12.3 What can data mining do?

5.12.4 How does data mining work?

5.13 Groupware server

5.13.1 Definition

5.13.2 Features of groupware server

5.13.3 How groupware works?

5.13.4 Groupware in action

5.14 Question Bank


TECHNICAL TERMS

1. World Wide Web

The WWW project has the potential to do for the Internet what Graphical User

Interfaces (GUIs) have done for personal computers -- make the Net useful to end users.

2. Hypertext

Hypertext provides the links between different documents and different document types.

If you have used the Microsoft Windows WinHelp system or the Macintosh HyperCard

application, you likely know how to use hypertext. In a hypertext document, links from

one place in the document to another are included with the text.

3. Uniform Resource Locators (URLs)

URLs provide the hypertext links between one document and another. These links can

access a variety of protocols (e.g., ftp, gopher, or http) on different machines (or your

own machine).

4. Common Gateway Interfaces (CGI)

Servers use the CGI interface to execute local programs. CGIs provide a gateway

between the HTTP server software and the host machine.

5. Hypertext Markup Language (HTML)


In a markup language, the text is mixed with marks that indicate how formatting is to take place. Each browser renders the marks in its own way; for example, Lynx and Mosaic do not insert a blank line before unnumbered lists, but Netscape does.

6. Forms

One of the most prominent uses of CGI is in processing forms. Forms are a subset of

HTML that allows the user to supply information. The forms interface makes Web

browsing an interactive process for the user and the provider.

7. Gateways

Web gateways are programs or scripts used to access information that is not directly

readable by the client. CGI provides a solution to the problem in the form of a gateway.

8. Communication threads

Communication threads are used to handle parts of the communication between the applications and the database server.

9. Request threads

Request threads perform the SQL operations requested by the applications. When the Database Server is requested to perform a SQL operation, it allocates one of its Request threads to perform the task.


CLIENT/SERVER AND INTERNET

5.1 Introduction about Internet

The WWW is a new way of viewing information -- and a rather different one. If, for

example, you are viewing this paper as a WWW document, you will view it with a browser, in

which case you can immediately access hypertext links. If you are reading this on paper, you will

see the links indicated in parentheses and in a different font. Keep in mind that the WWW is

constantly evolving. We have tried to pick stable links, but sites reorganize and sometimes they

even move. By the time you read the printed version of this paper, some WWW links may have

changed.

5.1.1 World Wide Web

The WWW project has the potential to do for the Internet what Graphical User Interfaces

(GUIs) have done for personal computers -- make the Net useful to end users. The Internet

contains vast resources in many fields of study (not just in computer and technical

information). In the past, finding and using these resources has been difficult.

The Web provides consistency: Servers provide information in a consistent way and clients

show information in a consistent way. To add a further thread of consistency, many users

view the Web through graphical browsers which are like other windows (Microsoft

Windows, Macintosh windows, or X-Windows) applications that they use.


A principal feature of the Web is its links between one document and another. These links,

described in the section on hypertext, allow you to move from one document to another.

Hypertext links can point to any server connected to the Internet and to any type of file.

These links are what transform the Internet into a web.

5.1.2 History of the Web

The Web project was started by Tim Berners-Lee at the European Particle Physics Laboratory

(CERN) in Geneva, Switzerland. Tim wanted to find a way for scientists doing projects at CERN

to collaborate with each other on-line. He thought of hypertext as one possible method for this

collaboration.

Tim started the WWW project at CERN in March 1989. In January 1992, the first

versions of WWW software, based on the Hypertext Transfer Protocol (HTTP), appeared on

the Internet.

By October 1993, 500 known HTTP servers were active.

When Robelle joined the Internet in June 1994, we were about the 80,000th registered

HTTP server.

By the end of 1994, it was estimated that there were over 500,000 HTTP servers.

Attempts to keep track of the number of HTTP servers on the Internet have not been

successful. Programs that try to automatically count HTTP servers never stop -- new

servers are being added constantly.

5.1.3 Hypertext

Hypertext provides the links between different documents and different document types.

If you have used the Microsoft Windows WinHelp system or the Macintosh HyperCard application,

you likely know how to use hypertext.

In a hypertext document, links from one place in the document to another are included

with the text. By selecting a link, you are able to jump immediately to another part of the

document or even to a different document. In the WWW, links can go not only from one

document to another, but from one computer to another.


5.2 Web Client/server

Client/server describes the relationship between two computer programs in which one

program, the client, makes a service request from another program, the server, which fulfills the

request. Although the client/server idea can be used by programs within a single computer, it is a

more important idea in a network. In a network, the client/server model provides a convenient

way to interconnect programs that are distributed efficiently across different locations. Computer

transactions using the client/server model are very common.

For example, to check your bank account from your computer,

A client program in your computer forwards your request to a server program at

the bank.

That program may in turn forward the request to its own client program that sends

a request to a database server at another bank computer to retrieve your account

balance.

The balance is returned back to the bank data client, which in turn serves it back

to the client in your personal computer, which displays the information for you.

The client/server model has become one of the central ideas of network computing. Most

business applications being written today use the client/server model. So does the Internet's main protocol suite, TCP/IP.

smaller dispersed computers from the "monolithic" centralized computing of mainframe

computers. But this distinction has largely disappeared as mainframes and their applications have

also turned to the client/server model and become part of network computing.

In the usual client/server model, one server, sometimes called a daemon, is activated and awaits

client requests. Typically, multiple client programs share the services of a common server

program. Both client programs and server programs are often part of a larger program or

application.
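The request/reply pattern described above can be sketched in a few lines of Python. The sketch below is illustrative only (the host, port, and the one-request "balance" service are assumptions, not part of any real product): a daemon-style server awaits client requests on a socket, and a client program forwards a request and displays the reply.

    import socket
    import threading

    HOST, PORT = "127.0.0.1", 9000   # assumed address/port for this sketch

    # The "daemon" server: bind, listen, and await client requests.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))
    srv.listen()

    def serve_one_request():
        conn, _ = srv.accept()            # a client request arrives...
        with conn:
            request = conn.recv(1024)     # ...read the request...
            conn.sendall(b"balance=100")  # ...and fulfill it with a reply

    threading.Thread(target=serve_one_request, daemon=True).start()

    # The client program: forward a request and display the server's answer.
    with socket.create_connection((HOST, PORT)) as cli:
        cli.sendall(b"GET balance FOR account 42")
        print(cli.recv(1024).decode())    # prints: balance=100

    srv.close()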


Relative to the Internet, your Web browser is a client program that requests services (the

sending of Web pages or files) from a Web server (which technically is called a Hypertext

Transport Protocol or HTTP server) in another computer somewhere on the Internet.

Similarly, your computer with TCP/IP installed allows you to make client requests for

files from File Transfer Protocol (FTP) servers in other computers on the Internet.

Other program relationship models include master/slave, with one program being in

charge of all other programs, and peer-to-peer, with either of two programs able to initiate a

transaction.

5.2.1 Web Client-Server Topology

The Web Client-Server installation topology enables PR-Tracker clients to connect to the

PR-Tracker server over the Internet or an intranet. It is the recommended installation topology

when PR-Tracker users are working remotely or are in a network domain that is not the same domain as the PR-Tracker Server. It may also be used when the server's firewall software blocks network

communication.

Clients connect to the PR-Tracker Web service by specifying the address of the

prtracker.asmx file in the PR-Tracker.


Figure 5.1 Web client-server installation topology

The diagram above shows a Web Client-Server installation topology where the PR-Tracker

Server hosts the PR-Tracker Web Service. To use this configuration option, a virtual directory

must be created in IIS to host the PR-Tracker Web Service. The PR-Tracker Server configuration

wizard can do this for you. By default, PR-Tracker configures the virtual directory so that it can

be accessed anonymously. If you want additional security on this virtual directory, you must add

it manually.

An alternate Web Client-Server installation topology is depicted below.

In this topology the PR-Tracker Web Service runs on a PR-Tracker Client instead of the

server. This topology is preferred when you don't want to store the PR-Tracker database on a

corporate web server for security or performance reasons.

To implement this installation topology, you will need to create a virtual directory to host

the PR-Tracker Web Service manually. You will also need to start PR-Tracker on the Web server

at least once and connect to the PR-Tracker Server in client-server mode. This step enables the

information the PR-Tracker Web Service needs to connect to the PR-Tracker Server Service to

be loaded into the Settings.xml file.

5.2.2 The Language of the Web


In order to use the WWW, you must know something about the language used to communicate

in the Web. There are three main components to this language:

Uniform Resource Locators (URLs)

o URLs provide the hypertext links between one document and another. These links

can access a variety of protocols (e.g., ftp, gopher, or http) on different machines

(or your own machine).

Hypertext Markup Language (HTML)

o WWW documents contain a mixture of directives (markup), and text or graphics.

The markup directives do such things as make a word appear in bold type. This is

similar to the way UNIX users write nroff or troff documents, and MPE users

write with Galley, TDP, or Prose. For PC users, this is completely different from

WYSIWYG editing. However, a number of tools are now available on the market

that hides the actual HTML.

Common Gateway Interfaces (CGI)

o Servers use the CGI interface to execute local programs. CGIs provide a gateway

between the HTTP server software and the host machine.

5.2.3 Hypertext Transfer Protocol

When you use a WWW client, it communicates with a WWW server using the Hypertext

Transfer Protocol. When you select a WWW link, the following things happen:

The client looks up the hostname and makes a connection with the WWW server.

The HTTP software on the server responds to the client's request.

The client and the server close the connection.


Compare this with traditional terminal/host computing. Users usually logon (connect) to the

server and remain connected until they logoff (disconnect). An HTTP connection, on the other

hand, is made only for as long as it takes for the server to respond to a request. Once the request

is completed, the client and the server are no longer in communication.

WWW clients use the same technique for other protocols. For example, if you request a directory

at an FTP site, the WWW client makes an FTP connection, logs on as an anonymous user, switches to the

directory, requests the directory contents, and then logs off the FTP server.
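The short-lived nature of an HTTP connection can be demonstrated by issuing a request by hand. The following Python sketch (example.com is just a placeholder host) opens a connection, sends a single HTTP request, reads the response, and closes the connection, after which client and server are no longer in communication:

    import socket

    # Connect to the WWW server (hostname lookup happens in create_connection).
    with socket.create_connection(("example.com", 80)) as s:
        # Send a single HTTP request...
        s.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
        # ...and read the response until the server closes the connection.
        response = b""
        while chunk := s.recv(4096):
            response += chunk

    # First line of the reply, e.g. "HTTP/1.0 200 OK". Leaving the "with"
    # block closed the socket: the request/response exchange is over.
    print(response.decode(errors="replace").split("\r\n", 1)[0])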

5.2.4 Hypertext Markup Language (HTML)

When you write documents for WWW, you use the Hypertext Markup Language (HTML). In a

markup language, you mix your text with the marks that indicate how formatting is to take place.

Most WWW browsers have an option to "View Source" that will show you the HTML for the

current document that you are viewing. Each WWW browser renders HTML in its own way.

Character-mode browsers use terminal highlights (e.g., inverse video, dim, or

underline) to show links, bold, italics, and so on.

Graphical browsers use different typefaces, colors, and bold and italic formats to

display different HTML marks. Writers have to remember that each browser in

effect has its own HTML style sheet. For example, Lynx and Mosaic do not insert

a blank line before unnumbered user lists, but Netscape does.

If you want to see how your browser handles standard and non-standard HTML, try the

WWW Test Pattern. The test pattern will show differences between your browser, standard

HTML, and other browsers.

Creating HTML

Creating HTML is awkward, but not that difficult. The most common method of creating

HTML is to write the raw mark-up language using a standard text editor. If you are creating


HTML yourself, we have found the chapter authoring for the Web in the O'Reilly book

"Managing Internet Information Services" to be an excellent resource.

Bob Green, founder of Robelle, finds HTML Writer to be useful for learning HTML. Instead of hiding

the HTML tags, HTML Writer provides menus with all of the HTML elements and inserts these

into a text window. To see how your documents look, you must use a separate Web browser.

Microsoft has produced a new add-on to Microsoft Word, the Internet Assistant, that produces HTML; it is available from Microsoft at no charge. You will need to know the basic concepts of Microsoft

Word to take advantage of the Internet Assistant. Since we are not experienced Microsoft Word

users, we found that the Internet Assistant didn't help us much.

5.3 3-TIER CLIENT SERVER WEB STYLE

A special type of client/server architecture consisting of three well-defined and separate

processes, each running on a different platform:

The user interface, which runs on the user's computer (the client).

The functional modules that actually process data. This middle tier runs on a server and is

often called the application server.

A database management system (DBMS) that stores the data required by the middle tier.

This tier runs on a second server called the database server.

The three-tier design has many advantages over traditional two-tier or single-tier designs, the

chief ones being:

The added modularity makes it easier to modify or replace one tier without

affecting the other tiers.

Separating the application functions from the database functions makes it easier to

implement load balancing.

5.3.1 Introduction to 3-Tier Architecture


In 3-tier architecture, there is an intermediary level, meaning the architecture is generally split up

between:

A client, i.e. the computer, which requests the resources, equipped with a user interface

(usually a web browser ) for presentation purposes

The application server (also called middleware), whose task it is to provide the requested

resources, but by calling on another server

The data server, which provides the application server with the data it requires.

Figure 5.2 Three-Tier Architecture

The 3-tier model was introduced to overcome the limitations of two-tier architecture by placing a middle tier between the UI and the DB. One way of incorporating the middle tier is through transaction processing (TP) monitors.

5.3.2 3-Tier TP Monitors

Online access through


Time sharing or Transaction Processing

Client connects to TP instead of DB

Monitor accepts transaction, queues it and takes responsibility until it is completed

Asynchrony is achieved

Key services provided by the monitor

ability to update multiple different DBMS in a single transaction

connectivity to a variety of data sources, including

flat files

non relational DBMS

mainframe

more scalable than a 2-tier approach

ability to attach priorities to transactions

robust security

For large (e.g., 1,000 user) applications, a TP monitor is one of the most effective

solutions.

The three-tier design has many advantages over traditional two-tier or single-tier designs,

the chief ones being:

Separating the application functions from the database functions makes it easier to implement load balancing.

5.3.3 3-Tier Applications



Most of the application's business logic is moved to a shared host server

The PC is used only for presentation services

The approach is similar to the X Window architecture

Both aim at pulling the main body of application logic off the desktop and running it on a

shared host.

5.4 CGI: Common Gateway Interface

An HTTP server is often used as a gateway to a legacy information system; for example, an

existing body of documents or an existing database application. The Common Gateway Interface

is an agreement between HTTP server implementers about how to integrate such gateway scripts

and programs.


It is typically used in conjunction with HTML forms to build database applications.

How is a form’s data passed to a program that hangs off an HTTP server? It gets passed

using an end-to-end client/server protocol that includes both HTTP and CGI. The best way to

explain the dynamics of the protocol is to walk you through a POST method invocation.

A CGI interaction:

The client and server programs play together to process a form’s request. Here’s the step-by-

step explanation of this interaction:

1. User clicks on the form’s “submit” button.

This causes the Web browser to collect the data within the form, and then assemble it into

one long string of name/value pairs each separated by an ampersand (&). The browser translates

spaces within the data into plus (+) symbols. No, it’s not very pretty.

2. The Web Browser invokes a POST HTTP method.

This is an ordinary HTTP request that specifies a POST method, the URL of the target

program in the “cgi-bin” directory, and the typical HTTP headers. The message body (HTTP calls it the “entity”) contains the form’s data. This is the string: name=value&name=value&...

3. The HTTP server receives the method invocation via a socket connection.

The server parses the message and discovers that it’s a POST for the “cgi-bin” program.

So it starts a CGI interaction.

4. The HTTP server sets up the environment variables.

The CGI protocol uses environment variables as a shared bulletin board for

communicating information between the HTTP server and the CGI program. The server typically

provides the following environmental information: server_name, request_method, path_info,

script_name, content_type, and content_length.

5. The HTTP server starts a CGI program.

The HTTP server executes an instance of the CGI program specified in the URL; it’s typically in the “cgi-bin” directory.

6. The CGI program reads the environment variables.


In this case, the program discovers by reading the environment variables that it is

responding to a POST.

7. The CGI program receives the message body via the standard input pipe (stdin).

Remember, the message body contains the famous string of name=value items separated

by ampersands (&). The content length environment variable tells the program how much data is

in the string. The CGI program parses the string contents to retrieve the form data. It uses the

content length environment variable to determine how many characters to read in from the

standard input pipe. Cheer up, we’re half way there.

8. The CGI program does some work.

Typically, a CGI program interacts with some back-end resource (like a DBMS or transaction program) to service the form’s request. It must then format the results in HTML or some other acceptable MIME type. A CGI program can also provide all the information that goes into the HTTP response headers; the HTTP server will then send the reply “as is” to the client. Why would you do this? Because it removes the extra overhead of having the HTTP server parse the output to create the response headers. Programs whose names begin with “nph-” indicate that they do not require HTTP server assistance; CGI calls them nonparsed header (nph) programs.

9. The CGI program returns the results via the standard output pipe (stdout).

The program pipes back the results to the HTTP server via its standard output. The HTTP

server receives the results on its standard input. This concludes the CGI interaction.

10. The HTTP server returns the results to the Web browser.

The HTTP server can either append some response headers to the information it receives

from the CGI program, or it sends it “as is” if it’s an nph program.

As you can see, a CGI program is executed in real time; it gets the information and then

builds a dynamic Web page to satisfy a client’s request. CGI makes the Web more dynamic. In

contrast, a plain HTML document is static, which means the text file does not change. CGI may

be clumsy, but it does allow us to interface Web clients to general-purpose back-end services (such as Amazon.com) as well as to Internet search utilities such as Yahoo! and Excite. You can

even stretch CGI to its limits to create general-purpose client/server programs like the Federal

Express package-tracking Web page. However, Federal Express uses CGI to connect to a TP

Monitor in the backend.
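The walkthrough above can be made concrete with a small CGI program. The Python sketch below is a generic illustration, not any particular server's code: it reads the environment variables set up by the HTTP server (step 6), reads content_length bytes of name=value&name=value data from the standard input pipe (step 7), does some trivial work (step 8), and returns an HTML reply via the standard output pipe for the server to relay (steps 9 and 10).

    #!/usr/bin/env python3
    # A minimal CGI program, illustrating steps 6-9 of the walkthrough.
    import os
    import sys
    from urllib.parse import parse_qs

    # Step 6: read the environment variables the HTTP server set up.
    method = os.environ.get("REQUEST_METHOD", "")
    length = int(os.environ.get("CONTENT_LENGTH", 0))

    # Step 7: for a POST, the form data arrives on standard input;
    # CONTENT_LENGTH says how many characters to read.
    body = sys.stdin.read(length) if method == "POST" else ""
    fields = parse_qs(body)   # undoes the name=value&... and '+' encoding

    # Step 8: do some work (here, just greet the submitted name).
    name = fields.get("name", ["stranger"])[0]

    # Step 9: return the results via standard output. The blank line
    # separates the response headers from the entity body.
    print("Content-Type: text/html")
    print()
    print(f"<html><body><p>Hello, {name}!</p></body></html>")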


5.5 THE SERVER SIDE OF THE WEB

Server-side scripting is a web server technology in which a user's (client's) request is

handled by a script running on the web server to generate dynamic web pages. It is usually used

to provide interactive web sites that interface to databases or other data stores. This is different

from client-side scripting where scripts, usually JavaScript, are run in the web browser.

Server-side scripting is used to customize the server response based on the user's

requirements, access rights, or queries into data stores. From a security point of view, the source

code of server-side scripts is never visible to the browser, as these scripts are executed on the

server and emit HTML corresponding to the user's input to the page.

When the server serves data in a commonly used manner, for example according to the

HTTP or FTP protocols, users may have their choice of a number of client programs (most

modern web browsers can request and receive data using both of those protocols). In the case of

more specialized applications, programmers may write their own server, client, and

communications protocol that can only be used with one another.

Programs that run on a user's local computer without ever sending or receiving data over a

network are not considered clients, and so the operations of such programs would not be

considered client-side operations.

5.5.1 History

Server-side scripting was invented in early 1995 by Fred DuFresne while developing the first

web site for Boston, MA television station WCVB. The technology is described in US patent

5835712. The patent was issued in 1998 and is now owned by Open Invention Network (OIN).

In 2010 OIN named Fred DuFresne a "Distinguished Inventor" for his work on server-side

scripting.


5.5.2 Explanation

In the earlier days of the web, server-side scripting was almost exclusively performed by

using a combination of C programs, Perl scripts and shell scripts using the Common Gateway

Interface (CGI). Those scripts were executed by the operating system, and the

results simply served back by the web server. These and other on-line scripting languages such

as ASP and PHP can often be executed directly by the web server itself or by extension modules

(e.g. mod_perl or mod_php) to the web server.

WebDNA includes its own embedded database system. Either form of scripting (i.e., CGI

or direct execution) can be used to build up complex multi-page sites, but direct execution

usually results in lower overhead due to the lack of calls to external interpreters.

Dynamic websites are also sometimes powered by custom web application servers, for

example the Python "Base HTTP Server" library, although some may not consider this to be

server-side scripting. When working with dynamic Web-based scripting technologies, like

classic ASP or PHP, developers must have a keen understanding of the logical, temporal, and

physical separation between the client and the server.
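As the passage notes, a dynamic site can be powered by a small custom web application server. Here is a minimal sketch using Python's built-in http.server module (the port and the page content are arbitrary choices for the example); each request is handled by script logic on the server, which emits HTML, so the client never sees the source code:

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from datetime import datetime

    class DynamicHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # The page is generated per request -- server-side scripting
            # in miniature: only the emitted HTML reaches the browser.
            page = f"<html><body>Server time: {datetime.now()}</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(page.encode())

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), DynamicHandler).serve_forever()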

5.6 CGI AND STATE

5.6.1 Introduction

A CGI program is any program designed to accept and return data that conforms to the

CGI specification. The program could be written in any programming language, including C,

Perl, Java, or Visual Basic.

CGI programs are the most common way for Web servers to interact dynamically with

users. Many HTML pages that contain forms, for example, use a CGI program to process the

form's data once it's submitted. Another increasingly common way to provide dynamic feedback

for Web users is to include scripts or programs that run on the user's machine rather than the

Web server. These programs can be Java applets, JavaScript, or ActiveX controls. These


technologies are known collectively as client-side solutions, while the use of CGI is a server-side

solution, because the processing occurs on the Web server.

One problem with CGI is that each time a CGI script is executed, a new process is started. For

busy Web sites, this can slow down the server noticeably. A more efficient solution, but one that is also more difficult to implement, is to use the server's API, such as ISAPI or NSAPI.

Another increasingly popular solution is to use Java servlets.

Figure 5.4 Diagram of CGI

5.6.2 CGI Applications

CGI turns the Web from a simple collection of static hypermedia documents into a whole

new interactive medium, in which users can ask questions and run applications. Let's take a look

at some of the possible applications that can be designed using CGI.

5.6.3 Forms

One of the most prominent uses of CGI is in processing forms. Forms are a subset of

HTML that allows the user to supply information. The forms interface makes Web browsing an interactive process for the user and the provider. Figure 5.5 shows a simple form.


As can be seen from the figure, a number of graphical widgets are available for form

creation, such as radio buttons, text fields, checkboxes, and selection lists. When the form is

completed by the user, the Submit Order! button is used to send the information to the server,

which executes the program associated with the particular form to "decode" the data.

Figure 5.5 Simple form illustrating different widgets

Generally, forms are used for two main purposes. At their simplest, forms can be used to collect

information from the user. But they can also be used in a more complex manner to provide back-

and-forth interaction. For example, the user can be presented with a form listing the various

documents available on the server, as well as an option to search for particular information


within these documents. A CGI program can process this information and return document(s)

that match the user's selection criteria.

5.6.4 Gateways

Web gateways are programs or scripts used to access information that is not directly

readable by the client.

CGI provides a solution to the problem in the form of a gateway. You can use a language

such as oraperl (see Chapter 9, Gateways, Databases, and Search/Index Utilities, for more

information) or a DBI extension to Perl to form SQL queries to read the information contained

within the database. Once you have the information, you can format and send it to the client. In

this case, the CGI program serves as a gateway to the Oracle database, as shown in Figure 1.3

Figure 5.6 A gateway to a database

Similarly, you can write gateway programs to any other Internet information service,

including Archie, WAIS, and NNTP (Usenet News); this book shows examples of interacting with other

Internet services. In addition, you can amplify the power of gateways by using the forms


interface to request a query or search string from the user to retrieve and display dynamic, or

virtual, information. We will discuss these special documents next.

5.6.5 Virtual Documents

Virtual, or dynamic, document creation is at the heart of CGI. Virtual documents are created

on the fly in response to a user's information request. You can create virtual HTML, plain text,

image, and even audio documents. A simple example of a virtual document could be something

as trivial as this:

Welcome to Shishir's WWW Server!

You are visiting from diamond.com. The load average on this machine is 1.25.

Happy navigating!

In this example, there are two pieces of dynamic information: the alphanumeric address (IP

name) of the remote user and the load average on the serving machine. This is a very simple

example, indeed!
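A sketch of how such a virtual document might be generated follows (Python is used for illustration; the greeting follows the example above, and os.getloadavg is assumed to be available, as on Unix-like systems):

    #!/usr/bin/env python3
    # Generate the virtual document from the example above, on the fly.
    import os

    remote = os.environ.get("REMOTE_HOST") or os.environ.get("REMOTE_ADDR", "unknown")
    load = os.getloadavg()[0]   # one-minute load average (Unix-like systems)

    print("Content-Type: text/html")
    print()
    print("<html><body>")
    print("<h1>Welcome to Shishir's WWW Server!</h1>")
    print(f"<p>You are visiting from {remote}. "
          f"The load average on this machine is {load:.2f}.</p>")
    print("<p>Happy navigating!</p>")
    print("</body></html>")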

On the other hand, very complex virtual documents can be created by writing programs

that use a combination of graphics libraries, gateways, and forms. As a more sophisticated

example, say you are the manager of an art gallery that specializes in selling replicas of ancient

Renaissance paintings and you are interested in presenting images of these masterpieces on the

Web. You start out by creating a form that asks for user information for the purpose of

promotional mailings, presents a search field for the user to enter the name of a painting, as well

as a selection list containing popular paintings.

Once the user submits the form to the server, a program can email the user information to

a certain address, or store it in a file. And depending on the user's selection, either a message

stating that the painting does not exist or an image of the painting can be displayed along with

some historical information located elsewhere on the Internet.


Along with the picture and history, another form with several image processing options to

modify the brightness, contrast, and/or size of the picture can be displayed. You can write

another CGI program to modify the image properties on the fly using certain graphics libraries,

such as gd, sending the resultant picture to the client.

This is an example of a more complex CGI program using many aspects of CGI programming.

Several such examples will be presented in this book.

5.7 SQL Database Server

SQL Server is a relational database management system from Microsoft that's designed for the enterprise environment. SQL Server runs on T-SQL (Transact-SQL), a set of programming extensions from Sybase and Microsoft that add several features to standard SQL, including transaction control, exception and error handling, row processing, and declared variables.

Code-named Yukon in development, SQL Server 2005 was released in November 2005. The 2005 product is said to provide enhanced flexibility, scalability, reliability, and security to database applications, and to make them easier to create and deploy, thus reducing the complexity and tedium involved in database management. SQL Server 2005 also includes more administrative support.

The original SQL Server code was developed by Sybase; in the late 1980s, Microsoft, Sybase and Ashton-Tate collaborated to produce the first version of the product, SQL Server for OS/2. Subsequently, both Sybase and Microsoft offered SQL Server products. Sybase has since renamed its product Adaptive Server Enterprise.

5.7.1 History

Prior to version 7.0 the code base for MS SQL Server was sold by Sybase to Microsoft, and was Microsoft's entry to the enterprise-level database market, competing against Oracle, IBM, and, later, Sybase itself.


Microsoft, Sybase and Ashton-Tate originally teamed up to create and market the first version, named SQL Server 1.0, for OS/2 (about 1989), which was essentially the same as Sybase SQL Server 3.0 on Unix, VMS, etc.

Microsoft SQL Server 4.2 was shipped around 1992 (available bundled with IBM OS/2 version 1.3). Later Microsoft SQL Server 4.21 for Windows NT was released at

the same time as Windows NT 3.1. Microsoft SQL Server v6.0 was the first

version designed for NT, and did not include any direction from Sybase.

About the time Windows NT was released, Sybase and Microsoft parted ways and

each pursued its own design and marketing schemes. Microsoft negotiated

exclusive rights to all versions of SQL Server written for Microsoft operating

systems.

Later, Sybase changed the name of its product to Adaptive Server Enterprise to

avoid confusion with Microsoft SQL Server. Until 1994, Microsoft's SQL Server

carried three Sybase copyright notices as an indication of its origin.

SQL Server 7.0 and SQL Server 2000 included modifications and extensions to

the Sybase code base, adding support for the IA-64 architecture. By SQL Server

2005 the legacy Sybase code had been completely rewritten.

In the years since the release of Microsoft's previous SQL Server product (SQL

Server 2000), advancements have been made in performance, the client IDE tools,

and several complementary systems that are packaged with SQL Server 2005.

5.7.2 SQL Server 2005

SQL Server 2005 (codename Yukon) was released in October 2005. It included native

support for managing XML data, in addition to relational data. For this purpose, it defined an

xml data type that could be used either as a data type in database columns or as literals in

queries. XML columns can be associated with XSD schemas;

XML data being stored is verified against the schema. XML is converted to an internal

binary data type before being stored in the database. Specialized indexing methods

were made available for XML data.


XML data is queried using XQuery; SQL Server 2005 added some extensions to the

T-SQL language to allow embedding XQuery queries in T-SQL.

In addition, it also defines a new extension to XQuery, called XML DML that allows

query-based modifications to XML data.

SQL Server 2005 also allows a database server to be exposed over web services using

Tabular Data Stream (TDS) packets encapsulated within SOAP (protocol) requests.

When the data is accessed over web services, results are returned as XML.

Common Language Runtime (CLR) integration was introduced with this version,

enabling one to write SQL code as Managed Code by the CLR.

For relational data, T-SQL has been augmented with error handling features (try/catch)

and support for recursive queries with CTEs (Common Table Expressions).

SQL Server 2005 has also been enhanced with new indexing algorithms, syntax and

better error recovery systems. Data pages are checksummed for better error resiliency, and

optimistic concurrency support has been added for better performance. Permissions and access

control have been made more granular and the query processor handles concurrent execution of

queries in a more efficient way. Partitions on tables and indexes are supported natively, so

scaling out a database onto a cluster is easier.

SQL CLR was introduced with SQL Server 2005 to let it integrate with the .NET

Framework.

SQL Server 2005 introduced "MARS" (Multiple Active Results Sets), a method of

allowing usage of database connections for multiple purposes.

SQL Server 2005 introduced DMVs (Dynamic Management Views), which are

specialized views and functions that return server state information that can be used to

monitor the health of a server instance, diagnose problems, and tune performance.

Service Pack 1 (SP1) of SQL Server 2005 introduced Database Mirroring, a high

availability option that provides redundancy and failover capabilities at the database

level.


5.7.3 SQL Server 2008

SQL Server 2008 (codename Katmai) was released on August 6, 2008 and aims to make

data management self-organizing and self-maintaining with the development of SQL Server Always On technologies, to provide near-zero downtime. SQL Server 2008 also includes support for structured and semi-structured data, including digital media formats for pictures, audio, video and other

multimedia data. In current versions, such multimedia data can be stored as BLOBs (binary large

objects), but they are generic bitstreams. Intrinsic awareness of multimedia data will allow

specialized functions to be performed on them. According to Paul Flessner, senior Vice President, Server Applications, Microsoft Corp., SQL Server 2008 can be a data storage backend for different varieties of data: XML, email, time/calendar, file, document, spatial, etc., as well as perform search, query,

analysis, sharing, and synchronization across all data types.

Other new data types include specialized date and time types and a Spatial data type for

location-dependent data. Better support for unstructured and semi-structured data is provided

using the new FILESTREAM data type, which can be used to reference any file stored on the file

system.

Structured data and metadata about the file is stored in SQL Server database, whereas the

unstructured component is stored in the file system. Such files can be accessed both via Win32 file-handling APIs as well as via SQL Server using T-SQL; doing the latter accesses the file data as a BLOB.

Backing up and restoring the database backs up or restores the referenced files as well. SQL

Server 2008 also natively supports hierarchical data, and includes constructs to directly deal with

them, without using recursive queries.

The Full-text search functionality has been integrated with the database engine.

According to a Microsoft technical article, this simplifies management and improves

performance.

Spatial data will be stored in two types:


A "Flat Earth" (GEOMETRY or planar) data type represents geospatial data which has

been projected from its native, spherical, coordinate system into a plane.

A "Round Earth" data type (GEOGRAPHY) uses an ellipsoidal model in which the Earth

is defined as a single continuous entity which does not suffer from the singularities such

as the international dateline, poles, or map projection zone "edges". Approximately 70

methods are available to represent spatial operations for the Open Geospatial Consortium

Simple Features for SQL, Version 1.1.

SQL Server includes better compression features, which also helps in improving

scalability. It enhanced the indexing algorithms and introduced the notion of filtered

indexes. It also includes Resource Governor that allows reserving resources for certain

users or workflows. It also includes capabilities for transparent encryption of data (TDE)

as well as compression of backups.

SQL Server 2008 supports the ADO.NET Entity Framework and the reporting tools,

replication, and data definition will be built around the Entity Data Model. SQL Server

Reporting Services will gain charting capabilities from the integration of the data

visualization products from Dundas Data Visualization, Inc., which was acquired by

Microsoft.

On the management side, SQL Server 2008 includes the Declarative Management

Framework which allows configuring policies and constraints, on the entire database or

certain tables, declaratively.

The version of SQL Server Management Studio included with SQL Server 2008 supports

IntelliSense for SQL queries against a SQL Server 2008 Database Engine.

SQL Server 2008 also makes the databases available via Windows PowerShell providers

and management functionality available as Cmdlets, so that the server and all the running

instances can be managed from Windows PowerShell.

5.7.4 SQL Server 2008 R2

SQL Server 2008 R2 (formerly codenamed SQL Server "Kilimanjaro") was announced at

TechEd 2009, and was released to manufacturing on April 21, 2010. SQL Server 2008 R2 adds


certain features to SQL Server 2008 including a master data management system branded as

Master Data Services, a central management of master data entities and hierarchies. It also adds Multi Server Management, a centralized console to manage multiple SQL Server 2008 instances and services, including relational databases, Reporting Services, Analysis Services, and Integration Services.

SQL Server 2008 R2 includes a number of new services, including

PowerPivot for Excel and SharePoint,

Master Data Services,

StreamInsight, Report Builder 3.0,

Reporting Services Add-in for SharePoint,

a Data-tier function in Visual Studio

that enables packaging of tiered databases as part of an application, and a SQL Server

Utility named UC (Utility Control Point), part of AMSM (Application and Multi-Server

Management) that is used to manage multiple SQL Servers.

5.7.5 SQL Server 2012

At the 2011 Professional Association for SQL Server (PASS) summit on October 11,

Microsoft announced that the next major version of SQL Server, codenamed Denali, would be

SQL Server 2012. It was released to manufacturing on March 6, 2012.

It was announced to be the last version to natively support OLE DB, with ODBC preferred instead for native connectivity. This announcement has caused some controversy.

SQL Server 2012's new features and enhancements include AlwaysOn SQL Server Failover Cluster Instances and Availability Groups, which provide a set of options to improve database availability,

Contained Databases which simplify the moving of databases between instances,


new and modified Dynamic Management Views and Functions,

programmability enhancements including new Spatial features,

Metadata discovery,

Sequence objects and the THROW statement,

performance enhancements such as Column Store Indexes as well as

improvements to Online and

Partition level operations and security enhancements including Provisioning

During Setup, new permissions, improved role management and default schema

assignment for groups.

5.7.6 Database Server Architecture

The Mimer SQL DBMS is based on client/server architecture. The Database Server

executes in one single, multi-threaded process with multiple Request and Background threads.

On some platforms Communication threads are used. The Mimer SQL architecture is truly

multi-threaded, with requests being dynamically allocated to the different Request threads. As

threads scale very well over multiple CPUs, Mimer SQL is particularly well suited for symmetric

multiprocessor (SMP) environments. By the use of threads within the Database Server, optimal

efficiency is achieved when context-switching in the Database Server.

It also ensures that the application can only view data that has previously been passed to the client

side, which is extremely important from a data security point of view.


Figure 5.7 Mimer SQL Database Server Architecture

The Communication threads are used to handle parts of the communication between the

applications and the database server. On some platforms other mechanisms are used to handle

the communication between the applications and the database server. Whatever the mechanism,

all communication with the database server is multi-threaded, allowing large numbers of

simultaneous user requests.

Both local and remote applications are handled directly by the Database Server. This means that

in Client/Server environments, where Mimer SQL executes in a distributed environment with

the client and server on different machines, all remote clients connect directly to the Database

Server, thereby avoiding any additional overhead of network service processes being started,

either on the client or on the server machine.

The Request threads perform the SQL operations requested by the applications. When the

Database Server is requested to perform a SQL operation it allocates one of its Request threads


to perform the task. When the SQL operation is complete the result is returned back to the

application, and the Request thread that has performed the operation returns to a waiting state

until it receives another server request. Since the SQL operations are evaluated entirely within

the Database Server, inter-process communication is reduced to a minimum.
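The Request-thread scheme behaves much like a thread pool: a fixed set of worker threads pick up operations as they arrive and return to a waiting state when done. The following Python analogy is a conceptual sketch only (it is not Mimer code, and the SQL operations are stand-ins):

    from concurrent.futures import ThreadPoolExecutor

    def perform_sql_operation(request: str) -> str:
        # Stand-in for evaluating a SQL operation inside the server.
        return f"result of {request!r}"

    # A small pool of "Request threads": each submitted operation is
    # dynamically allocated to whichever thread is free, and the thread
    # returns to a waiting state once the result has been handed back.
    with ThreadPoolExecutor(max_workers=4) as request_threads:
        futures = [request_threads.submit(perform_sql_operation, q)
                   for q in ("SELECT 1", "SELECT 2", "SELECT 3")]
        for f in futures:
            print(f.result())   # result returned to the application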

When a SQL query or a stored routine is executed by a Request thread, the compiled version of

the query or the routine is stored within the Database Server. In this way the same, compiled

version of the query or routine can be used again by other applications. This leads to improved

performance, since a SQL query or a stored routine only needs to be compiled once by the

Database Server.

The Background threads perform database services including all database updates, online

backup and database shadowing. These services are performed asynchronously in the

background to the application processes, which means that the application process does not have

to wait for the physical completion of a transaction or a shadow update, but can continue as soon

as the transaction has been prepared and secured to disk.

I/O-operations are performed in parallel directly by the request and background threads using

asynchronous I/O, so any need for separate I/O threads is avoided.

5.8 Middleware and Federated Database Technology

In a large modern enterprise, it is almost inevitable that different portions of the organization will

use different database management systems to store and search their critical data. Competition,

evolving technology, mergers, acquisitions, geographic distribution, and the inevitable

decentralization of growth all contribute to this diversity. Yet it is only by combining the

information from these systems that the enterprise can realize the full value of the data they

contain.

For example, in the finance industry, mergers are an almost commonplace occurrence. The

newly created entity inherits the data stores of the original institutions. Many of those stores will

be relational database management systems, but often from different manufacturers; for instance,


one company may have used primarily Sybase, and another Informix IDS. They may both have

one or more document management systems -- such as Documentum or IBM Content Manager --

for storing text documents such as copies of loans, etc.

The Garlic project demonstrated the feasibility of extending this idea to build a federated

database system that effectively exploits the query capabilities of diverse, possibly non-relational

data sources. In both of these systems, as in today's DB2, a middleware query processor develops

optimized execution plans and compensates for any functionality that the data sources may lack.

In this article, we describe the key characteristics of IBM's federated technology: transparency,

heterogeneity, a high degree of function, autonomy for the underlying federated sources,

extensibility, openness, and optimized performance. We then "roll back the covers" to show how

IBM's database federation capabilities work. We illustrate how the federated capabilities can be

used in a variety of scenarios, and conclude with some directions for the future.

5.8.2 Characteristics of the Federated Solution

Transparency

If a federated system is transparent, it masks from the user the differences, idiosyncrasies, and

implementations of the underlying data sources. Ideally, it makes the set of federated sources

look to the user like a single system. The user should not need to be aware of where the data is

stored (location transparency), what language or programming interface is supported by the data

source (invocation transparency), if SQL is used, what dialect of SQL the source supports

(dialect transparency), how the data is physically stored, or whether it is partitioned and/or

replicated (physical data independence, fragmentation and replication transparency), or what

networking protocols are used (network transparency). The user should see a single uniform

interface, complete with a single set of error codes (error code transparency). IBM provides all

these features, allowing applications to be written as if all the data were in a single database,

although, in fact, the data may be stored in a heterogeneous collection of data sources.


Heterogeneity

Heterogeneity is the degree of differentiation in the various data sources. Sources can differ in

many ways. They may run on different hardware, use different network protocols, and have

different software to manage their data stores. They may have different query languages,

different query capabilities, and even different data models. They may handle errors differently,

or provide different transaction semantics. They may be as much alike as two Oracle instances,

one running Oracle 8i, and the other Oracle 9i, with the same or different schemas. Or they may

be as diverse as a high-powered relational database, a simple, structured flat file, a web site that

takes queries in the form of URLs and spits back semi-structured XML according to some DTD,

a Web service, and an application that responds to a particular set of function calls. IBM's

federated database can accommodate all of these differences, encompassing systems such as

these in a seamless, transparent federation.

High Degree Of Function

IBM's federated capability provides users with the best of both worlds: all the function of its rich,

standard-compliant DB2 SQL capability against all the data in the federation, as well as all the

function of the underlying data sources. DB2's SQL includes support for many complex query

features, including inner and outer joins, nested sub queries and table expressions, recursion,

user-defined functions, aggregation, statistical analyses, automatic summary tables, and others

too numerous to mention. Many data sources may not provide all of these features. However,

users still get the full power of DB2 SQL on these sources' data, because of function

compensation. Function compensation means that if a data source cannot do a particular query

function, the federated database retrieves the necessary data and applies the function itself. For

example, a file system typically cannot do arbitrary sorts. However, users can still request that

data from that source (i.e., some subset of a file) be retrieved in some order, or ask that duplicates

be eliminated from that data. The federated database will simply retrieve the relevant data, and

do the sort itself.
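Function compensation can be pictured with a small sketch. The Python below is a conceptual illustration, not IBM code: a flat-file source can only hand back rows, so the federated engine applies the sort and duplicate elimination itself.

    def flat_file_source(lines):
        # The "data source": a flat file can hand back rows, but it
        # cannot sort them or eliminate duplicates on its own.
        for line in lines:
            yield line.rstrip("\n")

    def federated_query(source):
        # Function compensation: the federated engine retrieves the
        # relevant data, then performs the ORDER BY / DISTINCT itself.
        return sorted(set(source))

    # Simulated file contents (stand-in data for the sketch).
    rows = ["smith", "jones", "smith", "adams"]
    print(federated_query(flat_file_source(rows)))  # ['adams', 'jones', 'smith']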


While many sources do not provide all the function of DB2 SQL, it is also true that many

sources have specialized functionality that the IBM federated database lacks. For example,

document management systems often have scoring functions that let them estimate the relevancy

of retrieved documents to a user's search. In the financial industry, time-series data is especially

important, and systems exist that can compare, plot, analyse, and subset time-series data in

specialized ways. In the pharmaceutical industry, new drugs are based on existing compounds

with particular properties. Special-purpose systems can compare chemical structures, or simulate

the binding of two molecules. While such functions could be implemented directly, it is often

more efficient and cost-effective to exploit the functionality that already exists in data sources

and application systems.

Extensibility And Openness Of The Federation

All systems need to evolve over time. In a federated system, new sources may be needed to meet

the changing needs of the users' business. IBM makes it easy to add new sources. The federated

database engine accesses sources via a software component known as a wrapper. Accessing a new

type of data source is done by acquiring or creating a wrapper for that source. The wrapper

architecture enables the creation of new wrappers. Once a wrapper exists, simple data definition

(DDL) statements allow sources to be dynamically added to the federation without stopping

ongoing queries or transactions.

Any data source can be wrapped. IBM supports the ANSI SQL/MED standard (MED stands for

Management of External Data). This standard documents the protocols used by a federated

server to communicate with external data sources. Any wrapper written to the SQL/MED

interface can be used with IBM's federated database. Thus wrappers can be written by third

parties as well as IBM, and used in conjunction with IBM's federated database.

Autonomy For Data Sources

Typically a data source has existing applications and users. It is important, therefore, that the

operation of the source is not affected when it is brought into a federation. IBM's federated

database does not disturb the local operation of an existing data source. Existing applications will


run unchanged, data is neither moved nor modified, and interfaces remain the same. The way the

data source processes requests for data is not affected by the execution of global queries against

the federated system, though those global queries may touch many different data sources.

Likewise, there is no impact on the consistency of the local system when a data source enters or

leaves a federation. The sole exception is during federated two phase commit processing for

sources that participate. (While not available in V7 of DB2, federated two phase commit was

used in DataJoiner.) Data sources involved in the same unit of work will need to participate in

commit processing and can be requested to roll back the associated changes if necessary.

Unlike other products, our wrapper architecture does not require any software to be installed on

the machine that hosts the data source.

Optimized Performance

The optimizer is the component of a relational database management system that determines the

best way to execute each query. Relational queries are non-procedural and there are typically

several different implementations of each relational operator and many possible orderings of

operators to choose from in executing a query. While some optimizers use heuristic rules to

choose an execution strategy, IBM's federated database considers the various possible strategies,

modeling the likely cost of each, and choosing the one with the least cost. (Typically, cost is

measured in terms of system resources consumed).

In a federated system, the optimizer must decide whether the different operations involved in a

query should be done by the federated server or by the source where the data is stored. It must

also determine the order of the operations, and what implementations to use to do local portions

of the query. To make these decisions, the optimizer must have some way of knowing what each

data source can do, and how much it costs. For example, if the data source is a file, it would not

make sense to assume it was smart, and ask it to perform a sort or to apply some function. On the

other hand, if the source is a relational database system capable of applying predicates and doing

joins, it might be a good idea to take advantage of its power if it will reduce the amount of data

that needs to be brought back to the federated engine. This will typically depend on the details of


the individual query. The IBM optimizer works with the wrappers for the different sources

involved in a query to evaluate the possibilities. Often the difference between a good and a bad

decision on the execution strategy is several orders of magnitude in performance. IBM's

federated database is unique in the industry in its ability to work with wrappers to model the

costs of federated queries over diverse sources. As a result, users can expect the best

performance possible from their federated system.

To further enhance performance, each wrapper implementation takes advantage of the

operational knobs provided by each data source using the source's native API.

5.8.3 Federated Architecture

The federated database architecture is shown in Figure 5.8. Applications can use any supported interface

(including ODBC, JDBC, or a Web service client) to interact with the federated server.

The federated server communicates with the data sources by means of software modules called

wrappers

Figure 5.8 Architecture of a federated system

Configuring a federated system


A federated system is created by installing the federated engine and then configuring it to talk to

the data sources. There are several steps to add a new data source to a federated system. First, a

wrapper for the source must be installed, and IBM's federated database must then be told where

to find this wrapper. This is done by means of a CREATE WRAPPER statement. If multiple

sources of the same type are desired, only one wrapper is needed. For example, even if the

federated system will include five Oracle database instances, possibly on different machines,

only one Oracle wrapper is needed, and hence, only one CREATE WRAPPER statement will be

required.
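Beyond CREATE WRAPPER, configuration continues by registering each server and creating nicknames for its remote tables. A minimal sketch, assuming a DB2-style federated engine and an Oracle source reached through a NET8 wrapper; all object names here are illustrative:

    CREATE WRAPPER net8;

    CREATE SERVER ora1 TYPE oracle VERSION '9i' WRAPPER net8
        OPTIONS (NODE 'ora_node');             -- ora_node: assumed Oracle net alias

    CREATE USER MAPPING FOR USER SERVER ora1
        OPTIONS (REMOTE_AUTHID 'scott', REMOTE_PASSWORD 'tiger');

    CREATE NICKNAME sales_ora FOR ora1."SCOTT"."SALES";

After this, queries can refer to the remote SALES table through the nickname sales_ora as if it were a local table.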

If a data source provides a function that has no local implementation in the federated database, it is registered with a CREATE FUNCTION statement; the AS TEMPLATE clause tells the federated database that there is no local implementation of the function. Next, a CREATE FUNCTION MAPPING statement tells the federated database which server can evaluate the function. Several function mappings may be created for the same function. Statements of the following form accomplish the registration and mapping:
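The original example's concrete statements are not preserved in this copy; a representative sketch with hypothetical names (a dollar_to_euro function evaluated at the assumed Oracle server ora1):

    CREATE FUNCTION dollar_to_euro (DOUBLE)
        RETURNS DOUBLE
        AS TEMPLATE                        -- no local implementation exists
        DETERMINISTIC NO EXTERNAL ACTION;

    CREATE FUNCTION MAPPING euro_map
        FOR dollar_to_euro (DOUBLE)
        SERVER ora1;                       -- ora1 is able to evaluate the function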

The above DDL statements produce metadata describing the information about nicknames and

the signatures of mapped functions. This metadata is used by the federated query processing

engine and is stored in the global catalogues of the federated database.

5.9 QUERY PROCESSING

After the federated system is configured, an application can submit a query written in SQL to a

federated server. The federated server optimizes the query, developing an execution plan in

which the query has been decomposed into fragments that can be executed at individual data

sources. As mentioned above, many decompositions of the query are possible, and the optimizer

chooses among alternatives on the basis of minimum estimated total resource consumption. Once

a plan has been selected, the federated database drives the execution, invoking the wrappers to

execute the fragments assigned to them.
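For instance, a single statement may span a local table and a nickname; how much work is pushed to the remote source is the optimizer's decision. A sketch, reusing the hypothetical nickname sales_ora and assuming a local table customers:

    SELECT c.name, SUM(s.amount) AS total
    FROM customers c
    JOIN sales_ora s ON s.cust_id = c.id
    GROUP BY c.name;

Here the scan of the remote SALES table (and possibly some of the grouping work) may be executed by the Oracle source, with only qualifying rows shipped back to the federated server for the join.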


Figure 5.9 Query Processing

The optimizer works differently with relational and non-relational wrappers. The optimizer

models relational sources in detail, using information provided by the wrapper to generate plans

that represent what it expects the source to do.

However, because non-relational sources do not have a common set of operations or common

data model, a more flexible arrangement is required with these sources.


Hence the optimizer works with the non-relational wrappers as follows:

The IBM federated database submits candidate query fragments called "requests" to a

wrapper if the query fragments apply to a single source.

When a non-relational wrapper receives a request, it determines what portion, if any, of

the corresponding query fragment can be performed by the data source.

The wrapper returns a reply that describes the accepted portion of the fragment.

The reply also includes an estimate of the number of rows that will be produced, an estimate of the total execution time, and a wrapper plan - an encapsulated representation of everything the wrapper will need to know to execute the accepted portion of the fragment.

The federated database optimizer incorporates the reply into a global plan, introducing

additional operators as necessary to compensate for portions of fragments that were not

accepted by a wrapper.

The cost and cardinality information from the replies is used to estimate the total cost of

the plan, and the plan with minimum total cost is selected from among all the candidates.

When a plan is selected, it need not be executed immediately; it can be stored in the

database catalogues and subsequently used one or more times to execute the query.

Even if a plan is used immediately, it need not be executed in the same process in which

it was created, as illustrated in Figure 5.10.


Figure 5.10 Compilation and runtime for non-relational sources

5.10 Data Warehouse

A data warehouse is a relational database that is designed for query and analysis rather than for

transaction processing. It usually contains historical data derived from transaction data, but it can

include data from other sources. It separates analysis workload from transaction workload and

enables an organization to consolidate data from several sources.

In addition to a relational database, a data warehouse environment includes an extraction,

transportation, transformation, and loading (ETL) solution, an online analytical processing

(OLAP) engine, client analysis tools, and other applications that manage the process of gathering

data and delivering it to business users.

5.10.1 Characteristics of data warehouses

A common way of introducing data warehousing is to refer to the characteristics of a data

warehouse as set forth by William Inmon:

Subject Oriented

Integrated

Nonvolatile

Time Variant

Subject Oriented

Data warehouses are designed to help you analyze data. For example, to learn more about your

company's sales data, you can build a warehouse that concentrates on sales. Using this

warehouse, you can answer questions like "Who was our best customer for this item last year?"

This ability to define a data warehouse by subject matter, sales in this case, makes the data

warehouse subject oriented.
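Such a question maps directly onto a query against the sales subject area. A sketch over a hypothetical schema:

    SELECT customer_id, SUM(amount) AS total_sales
    FROM sales
    WHERE item_id = 1234               -- the item in question (illustrative)
      AND sale_year = 2009             -- "last year" (illustrative)
    GROUP BY customer_id
    ORDER BY total_sales DESC;         -- the top row is the best customer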


Integrated

Integration is closely related to subject orientation. Data warehouses must put data from

disparate sources into a consistent format. They must resolve such problems as naming conflicts

and inconsistencies among units of measure. When they achieve this, they are said to be

integrated.

Nonvolatile

Nonvolatile means that, once entered into the warehouse, data should not change. This is logical

because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant

In order to discover trends in business, analysts need large amounts of data. This is very much in

contrast to online transaction processing (OLTP) systems, where performance requirements

demand that historical data be moved to an archive. A data warehouse's focus on change over

time is what is meant by the term time variant.


Figure 5.11 Contrasting OLTP and Data Warehousing Environments

One major difference between the two types of systems is that data warehouses are not usually in

third normal form (3NF), a type of data normalization common in OLTP environments.

Data warehouses and OLTP systems have very different requirements. Typical differences include the workload (ad hoc analytical queries versus predefined transactions), data modifications (periodic bulk loads versus continuous updates by end users), schema design (denormalized or dimensional versus highly normalized), and the amount of historical data retained.

5.10.2 Distributed Data Warehouse

Implementing a distributed data warehouse has been shown to provide higher availability and

lower overall cost.

An enterprise can create several data marts that store only high level summaries of data derived

from the warehouse. With IBM's federated technology, data marts and warehouse can be on

separate systems, yet users of the data mart can still drill down with ease from their local level of

summarization into the warehouse. Federated technology shields the users, who have no need to

know that the data warehouse is distributed, by providing a virtual data warehouse.

5.10.3 Data Warehouse Architectures

Data warehouses and their architectures vary depending upon the specifics of an organization's

situation. Three common architectures are:

Data Warehouse Architecture (Basic)

Data Warehouse Architecture (with a Staging Area)

Data Warehouse Architecture (with a Staging Area and Data Marts)

Data Warehouse Architecture (Basic)

Figure 5.12 shows a simple architecture for a data warehouse. End users directly access data derived from

several source systems through the data warehouse.


Figure 5.12 Architecture of a Data Warehouse

In Figure 5.12, the metadata and raw data of a traditional OLTP system is present, as is an

additional type of data, summary data. Summaries are very valuable in data warehouses because

they pre-compute long operations in advance. For example, a typical data warehouse query is to

retrieve something like August sales. A summary in Oracle is called a materialized view.
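For example, a pre-computed monthly summary can be defined as a materialized view (Oracle syntax; table and column names are illustrative):

    CREATE MATERIALIZED VIEW monthly_sales
        ENABLE QUERY REWRITE AS          -- lets the optimizer answer matching queries from the summary
    SELECT TO_CHAR(sale_date, 'YYYY-MM') AS month,
           SUM(amount) AS total_amount
    FROM sales
    GROUP BY TO_CHAR(sale_date, 'YYYY-MM');

A query asking for August sales can then be rewritten by the optimizer to read the small summary instead of scanning the detail table.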

Data Warehouse Architecture (with a Staging Area)

In the basic architecture of Figure 5.12, you need to clean and process your operational data before putting it into the

warehouse. You can do this programmatically, although most data warehouses use a staging

area instead. A staging area simplifies building summaries and general warehouse management.

Figure 5.13 illustrates this typical architecture.


Fig 5.13 Architecture of a Data Warehouse with a Staging Area

Data Warehouse Architecture (with a Staging Area and Data Marts)

Although the architecture in Figure 5.13 is quite common, you may want to customize your

warehouse's architecture for different groups within your organization. You can do this by

adding data marts, which are systems designed for a particular line of business.

5.10.4 Data Warehouse (PDW Parallel)

SQL Server Parallel Data Warehouse (PDW) is a massively parallel processing (MPP) SQL Server appliance optimized for large-scale data warehousing, on the order of hundreds of terabytes.

SQL Server Architecture

The architecture of MS SQL Server comprises several layers and services.

Protocol layer

Protocol layer implements the external interface to SQL Server. All operations that can be

invoked on SQL Server are communicated to it via a Microsoft-defined format, called Tabular

Data Stream (TDS). TDS is an application layer protocol, used to transfer data between a

database server and a client. Initially designed and developed by Sybase Inc. for their Sybase

SQL Server relational database engine in 1984, and later by Microsoft in Microsoft SQL Server,

TDS packets can be encased in other physical transport dependent protocols, including TCP/IP,

Named pipes, and Shared memory. Consequently, access to SQL Server is available over these

protocols. In addition, the SQL Server API is also exposed over web services.

Data storage

The main unit of data storage is a database, which is a collection of tables with typed columns.

SQL Server supports different data types, including primary types such as Integer, Float,


Decimal, Char (including character strings), Varchar (variable length character strings), binary

(for unstructured blobs of data), Text (for textual data) among others. The rounding of floats to

integers uses either Symmetric Arithmetic Rounding or Symmetric Round Down (Fix)

depending on arguments: SELECT Round(2.5, 0) gives 3.
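A short T-SQL illustration of the two behaviours; the optional third argument of ROUND selects truncation:

    SELECT ROUND(2.5, 0);       -- 3.0  (Symmetric Arithmetic Rounding)
    SELECT ROUND(-2.5, 0);      -- -3.0 (symmetric: away from zero for either sign)
    SELECT ROUND(2.5, 0, 1);    -- 2.0  (non-zero third argument truncates: Fix)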

Microsoft SQL Server also allows user-defined composite types (UDTs) to be defined and used.

It also makes server statistics available as virtual tables and views (called Dynamic Management

Views or DMVs). In addition to tables, a database can also contain other objects including views,

stored procedures, indexes and constraints, along with a transaction log.

Buffer management

SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in memory, and the set of all pages currently buffered is called the buffer cache. The amount of

memory available to SQL Server decides how many pages will be cached in memory.

The buffer cache is managed by the Buffer Manager. Either reading from or writing to any page

copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy,

rather than the on-disc version. The page is updated on the disc by the Buffer Manager only if

the in-memory cache has not been referenced for some time. While writing pages back to disc,

asynchronous I/O is used whereby the I/O operation is done in a background thread so that other

operations do not have to wait for the I/O operation to complete. Each page is written to disc along with its checksum. When reading the page back, its checksum is computed again

and matched with the stored version to ensure the page has not been damaged or tampered with

in the meantime.

Logging and Transaction

SQL Server ensures that any change to the data is ACID-compliant, i.e. it uses transactions to

ensure that the database will always revert to a known consistent state on failure. Each

transaction may consist of multiple SQL statements all of which will only make a permanent


change to the database if the last statement in the transaction (a COMMIT statement) completes

successfully. If the COMMIT completes successfully, the transaction is safely on disk.

Any changes made to any page will update the in-memory cache of the page, simultaneously all

the operations performed will be written to a log, along with the transaction ID which the

operation was a part of. Each log entry is identified by an increasing Log Sequence Number

(LSN) which is used to ensure that all changes are written to the data files. Also during a log

restore it is used to check that no logs are duplicated or skipped. SQL Server requires that the log

is written onto the disc before the data page is written back. It must also ensure that all

operations in a transaction are written to the log before any COMMIT operation is reported as

completed.
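A minimal T-SQL sketch (the accounts table is hypothetical); write-ahead logging guarantees that the log records for both updates are on disc before COMMIT returns:

    BEGIN TRANSACTION;
        UPDATE accounts SET balance = balance - 100 WHERE id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;    -- both changes become permanent only here; on a failure before this point, both are rolled back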

Concurrency and locking

SQL Server allows multiple clients to use the same database concurrently. As such, it needs to

control concurrent access to shared data, to ensure data integrity - when multiple clients update

the same data, or clients attempt to read data that is in the process of being changed by another

client. SQL Server provides two modes of concurrency control: pessimistic concurrency and

optimistic concurrency.

When pessimistic concurrency control is being used, SQL Server controls concurrent access by

using locks. Locks can be either shared or exclusive. Exclusive lock grants the user exclusive

access to the data - no other user can access the data as long as the lock is held. Shared locks are

used when some data is being read - multiple users can read from data locked with a shared lock,

but not acquire an exclusive lock. The latter would have to wait for all shared locks to be

released. Locks can be applied on different levels of granularity - on entire tables, pages, or even

on a per-row basis on tables. For indexes, it can either be on the entire index or on index leaves.

The level of granularity to be used is defined on a per-database basis by the database

administrator. While a fine grained locking system allows more users to use the table or index

simultaneously, it requires more resources, so it does not automatically yield a higher-performing solution. SQL Server also includes two more lightweight mutual exclusion solutions


- latches and spinlocks - which are less robust than locks but are less resource intensive. SQL

Server uses them for DMVs and other resources that are usually not busy. SQL Server also

monitors all worker threads that acquire locks to ensure that they do not end up in deadlocks - in

case they do, SQL Server takes remedial measures, which in many cases is to kill one of the

threads entangled in a deadlock and roll back the transaction it started. To implement locking,

SQL Server contains the Lock Manager.

The Lock Manager maintains an in-memory table that manages the database objects and locks, if

any, on them along with other metadata about the lock. Access to any shared object is mediated

by the lock manager, which either grants access to the resource or blocks it.
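As an illustration, the two concurrency modes map onto standard T-SQL settings (the database name SalesDB is hypothetical):

    -- Pessimistic: readers take shared locks and block behind writers
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

    -- Optimistic: readers see committed row versions instead of waiting on locks
    ALTER DATABASE SalesDB SET READ_COMMITTED_SNAPSHOT ON;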

Data retrieval

The main mode of retrieving data from an SQL Server database is querying for it. The query is

expressed using a variant of SQL called T-SQL, a dialect Microsoft SQL Server shares with

Sybase SQL Server due to its legacy. The query declaratively specifies what is to be retrieved.

It is processed by the query processor, which figures out the sequence of steps that will be

necessary to retrieve the requested data. The sequence of actions necessary to execute a query is

called a query plan. There might be multiple ways to process the same query.
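For example, the plan chosen for a query can be inspected without executing it (T-SQL; customers is a hypothetical table, and GO is the client-side batch separator):

    SET SHOWPLAN_TEXT ON;
    GO
    SELECT name FROM customers WHERE region = 'West';   -- compiled only; the plan text is returned
    GO
    SET SHOWPLAN_TEXT OFF;
    GO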

5.11 EIS/DSS

5.11.1 Environmental Impact Statement

An environmental impact statement (EIS), under United States environmental law, is a

document required by the National Environmental Policy Act (NEPA) for certain actions "significantly affecting the quality of the human environment". An EIS is a tool for

decision making. It describes the positive and negative environmental effects of a

proposed action, and it usually also lists one or more alternative actions that may be

chosen instead of the action described in the EIS. Several US state governments require

that a document similar to an EIS be submitted to the state for certain actions. For


example, in California, an Environmental Impact Report (EIR) must be submitted to the

state for certain actions, as described in the California Environmental Quality Act

(CEQA).

Purpose

The purpose of the NEPA is to promote informed decision-making by federal agencies by

making "detailed information concerning significant environmental impacts" available to

both agency leaders and the public. The NEPA was the first piece of legislation that

created a comprehensive method to assess potential and existing environmental risks at

once. One of the primary authors of the act was Lynton K. Caldwell. It also encourages

communication and cooperation between all the actors involved in environmental

decisions, including government officials, private businesses, and citizens.

Contrary to a widespread misconception, NEPA does not prohibit the federal government

or its licensees/permittees from harming the environment, but merely requires that the

prospective impacts be understood and disclosed in advance. The intent of NEPA is to

help key decision makers and stakeholders balance the need to implement an action with

its impacts on the surrounding human and natural environment

5.11.2 Decision Support System

A decision support system (DSS) is a computer-based information system that supports business or organizational activities. DSSs serve the management, operations, and planning levels of an organization and help people make decisions about problems that may be rapidly changing and not easily specified in advance. DSSs include knowledge-based systems. A properly designed DSS is an interactive software-based system intended to help

decision makers compile useful information from a combination of raw data, documents, and

personal knowledge, or business models to identify and solve problems and make decisions.

Typical information that a decision support application might gather and present includes:


comparative sales figures between one period and the next,

projected revenue figures based on product sales assumptions.

5.12 Data Mining

5.12.1 Overview

Generally, data mining (sometimes called data or knowledge discovery) is the process of

analyzing data from different perspectives and summarizing it into useful information -

information that can be used to increase revenue, cut costs, or both. Data mining software is one

of a number of analytical tools for analyzing data. It allows users to analyze data from many

different dimensions or angles, categorize it, and summarize the relationships identified.

Technically, data mining is the process of finding correlations or patterns among dozens of fields

in large relational databases.

5.12.2 Data, Information, and Knowledge

Data

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations

are accumulating vast and growing amounts of data in different formats and different databases.

This includes:

operational or transactional data such as, sales, cost, inventory, payroll, and accounting

nonoperational data, such as industry sales, forecast data, and macro economic data

meta data - data about the data itself, such as logical database design or data dictionary

definitions

Information

The patterns, associations, or relationships among all this data can provide information. For

example, analysis of retail point of sale transaction data can yield information on which products

are selling and when.


Knowledge

Information can be converted into knowledge about historical patterns and future trends. For

example, summary information on retail supermarket sales can be analyzed in light of

promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or

retailer could determine which items are most susceptible to promotional efforts.

5.12.3 What Can Data Mining Do?

Data mining is primarily used today by companies with a strong consumer focus - retail,

financial, communication, and marketing organizations. It enables these companies to determine

relationships among "internal" factors such as price, product positioning, or staff skills, and

"external" factors such as economic indicators, competition, and customer demographics. And, it

enables them to determine the impact on sales, customer satisfaction, and corporate profits.

Finally, it enables them to "drill down" into summary information to view detail transactional

data.

With data mining, a retailer could use point-of-sale records of customer purchases to send

targeted promotions based on an individual's purchase history. By mining demographic data

from comment or warranty cards, the retailer could develop products and promotions to appeal to

specific customer segments.

5.12.4 How Does Data Mining Work?

While large-scale information technology has been evolving separate transaction and analytical

systems, data mining provides the link between the two. Data mining software analyzes

relationships and patterns in stored transaction data based on open-ended user queries. Several

types of analytical software are available: statistical, machine learning, and neural networks.

Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For example, a

restaurant chain could mine customer purchase data to determine when customers visit

and what they typically order. This information could be used to increase traffic by

having daily specials.


Clusters: Data items are grouped according to logical relationships or consumer

preferences. For example, data can be mined to identify market segments or consumer

affinities.

Associations: Data can be mined to identify associations. The beer-diaper example is a classic example of association mining (a simple co-occurrence query is sketched after this list).

Sequential patterns: Data is mined to anticipate behavior patterns and trends. For

example, an outdoor equipment retailer could predict the likelihood of a backpack being

purchased based on a consumer's purchase of sleeping bags and hiking shoes.
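A minimal SQL sketch of association counting, assuming a hypothetical basket table with one row per (txn_id, item) pair; item pairs that frequently occur in the same transaction are candidate associations:

    SELECT a.item AS item1, b.item AS item2, COUNT(*) AS together
    FROM basket a
    JOIN basket b ON b.txn_id = a.txn_id
                 AND a.item < b.item           -- count each unordered pair once
    GROUP BY a.item, b.item
    ORDER BY together DESC;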

Data mining consists of five major elements:

Extract, transform, and load transaction data onto the data warehouse system.

Store and manage the data in a multidimensional database system.

Provide data access to business analysts and information technology professionals.

Analyze the data by application software.

Present the data in a useful format, such as a graph or table.

Different levels of analysis are available:

Artificial neural networks: Non-linear predictive models that learn through training and

resemble biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as genetic

combination, mutation, and natural selection in a design based on the concepts of natural

evolution.


Decision trees: Tree-shaped structures that represent sets of decisions. These decisions

generate rules for the classification of a dataset. Specific decision tree methods include

Classification and Regression Trees (CART) and Chi Square Automatic Interaction

Detection (CHAID). CART and CHAID are decision tree techniques used for

classification of a dataset. They provide a set of rules that you can apply to a new

(unclassified) dataset to predict which records will have a given outcome. CART

segments a dataset by creating 2-way splits while CHAID segments using chi square tests

to create multi-way splits. CART typically requires less data preparation than CHAID.

Nearest neighbour method: A technique that classifies each record in a dataset based on

a combination of the classes of the k record(s) most similar to it in a historical dataset

(where k ≥ 1). Sometimes called the k-nearest neighbour technique.

Rule induction: The extraction of useful if-then rules from data based on statistical

significance.

Data visualization: The visual interpretation of complex relationships in

multidimensional data. Graphics tools are used to illustrate data relationships.

5.13 GROUPWARE SERVER

5.13.1 Definition

Groupware addresses the management of semi-structured information such as text,

image, mail, bulletin boards and the flow of work. These Client/Server systems have people in

direct contact with other people.

Groupware is a category of software designed to help groups work together by

facilitating the exchange of information among group members who may or may not be located

in the same office. Often, groupware users are collaborating on the same project, although

groupware can be used to share a variety of information throughout an entire organization and

can also be extended to clients, suppliers, and other users outside the organization.


Groupware is an ideal mechanism for sharing less-structured information (for example,

text or diagrams, as opposed to fielded or structured data) that might not otherwise be accessible

to others. It is also used to define workflow, so that as one user completes a step in a project or

process, the person responsible for the next step is notified automatically.

5.13.2 Features Of Groupware Server

Groupware packages offered by different software vendors will include different features

and functions, but most typically include the following components:

Calendaring and Scheduling. Each user maintains an online calendar to track

appointments, days out of the office, and other times when he or she is unavailable. Other

users can view their colleagues' calendars to look for "free" time for scheduling a new

meeting.

Discussion Databases. These are topic-specific databases where a user can post an idea,

question, or suggestion on a given subject, and other users can post their responses. A

discussion board may be set up for a short period of time to gather comments, for

example, on an upcoming event, or left up indefinitely, say to solicit new product ideas

on an ongoing basis. Usually, the name of each person who posted an item is recorded,

but anonymous postings are an option.

Reference Libraries. These are collections of reference materials, such as employee

handbooks, policy and procedure manuals, and similar documents. Typically, only certain

users are able to post materials to a reference database, while other users have "read only"

access—that is, they can view the materials but are not authorized to make any changes

to them.

Email. This is probably the most heavily used groupware feature and is used to send

messages to other groupware users.

A message may be addressed to one or more individuals or sent to a group, such as

"Sales," that includes the names of all people within a given department. Generally, users

are also able to send messages to individuals located outside the organization.


Figure 5.14 Features Of Groupware Server

Although email is an essential component of groupware, email and groupware employ

different methods for disseminating information. Every email message that is sent must have one

or more recipients listed in the "To:" field. This is called the "push" model because it pushes the

message out to the recipients whether or not a given recipient is interested in receiving it.

Groupware uses the "pull" model, in that each user accesses the various groupware applications and pulls from them the information that is of relevance to him or her.

Groupware functionality may also include the ability to control who sees a given piece of

information. Access can be limited to specifically named individuals or to members of a group,

such as all managers, members of the accounting department, or those employees working on a

sensitive or confidential project. For example, job descriptions may be accessible to all users, but


access to related salary information may be limited to managers and members of the human

resources department.

5.13.3 How Groupware Works

Groupware software can be divided into two categories: server and client. Depending on

the size of an organization and the number of users, the software is installed on one or more

computers (called "servers") in a network. Server software can also be installed on computers

located in other locations, from a nearby city to another state or country. The server houses the

actual applications and the associated information entered into them by the various users. If more

than one server is used, the servers will "speak" to one another in a process called "replication."

In this way, information held in the same database, but in different locations or on a different

server, is exchanged between the servers. Once this is accomplished, the servers are said to be

"synchronized."

Each person using the groupware has the client software installed on his or her desktop or

laptop computer. The client software enables the user to access and interact with the applications

stored on the servers. Some users may be "remote"; that is, they are not in the office on a full-time basis but rather use a modem or other type of connection to access and use the groupware.


KEY NOTES

Client server and internet

Web client server

3 tier client server web style

CGI

The server side of web

CGI and State

SQL database servers

Middleware and federated databases

Data warehouses

EIS/DSS to data mining

GroupWare Server

What is GroupWare?

Components of GroupWare


5.14 QUESTION BANK

PART-A

1. Define Internet. (April-2010)

2. What is Middleware?

3. What is CGI?

4. What is EIS? (April-2009)

5. What is DSS?

6. Define Data mining.

7. Define Middleware.

8. Define Data warehouses.

9. What is GroupWare?

10. List out Components of GroupWare.

11. What is the Era in web client server?

12. What are the web application protocols?

13. Draw the URL structure. (April-2009)

14. What are the types of HTTP header fields?

15. Define CGI.

16. Define hyperlink with its syntax.

17. Define CGI and its state.

18. What are the two security protocols in web?

19. Define SSL.

20. What are the ISO standards and define it?


21. What are the three types of SQL Server Architecture? (April-2010)

22. Differentiate between static and dynamic SQL.

23. Define SQL middleware.

24. What are the middleware solutions?

25. Draw the MDI gateway structure?

PART – B

1. Briefly explain about Client server and internet? (April-2009)

2. Discuss about Web client server.

3. Briefly explain about 3 tier client server web style?

4. Briefly explain about CGI and State?

5. Discuss SQL database servers. (April-2010)

6. Discuss merits and demerits of Middleware and federated databases.

7. Briefly explain about Data warehouses?

8. Explain EIS/DSS to data mining? (April-2010)

9. Briefly explain about GroupWare Server?

10. Explain Components of GroupWare?

11. Describe about 3-tier client/server web style. (April-2009)

12. Explain CGI scenario based on the web client/server in the interactive era.

13. Brief description about SQL Database Server Architectures with ISO Standards.

14. Give brief explanation for CGI and STATE.

15. How to structure the flow of text in an HTML document?