Chapter 01

41
Chapter 01 Chapter 01 Introduction to Distributed Database

description

Chapter 01. Introduction to Distributed Database. Overview. File System Menyediakan suatu prosedur bagi suatu program untuk menyimpan, melakukan update, dan mengambil data pada suatu media penyimpanan atau storage. Overview. Database Management System - PowerPoint PPT Presentation

Transcript of Chapter 01

Chapter 01Chapter 01Introduction to Distributed Database

OverviewOverviewFile System

◦Menyediakan suatu prosedur bagi suatu program untuk menyimpan, melakukan update, dan mengambil data pada suatu media penyimpanan atau storage

OverviewOverviewDatabase Management System

◦Suatu paket software yang melakukan kontrol, dan pengelolaan data di dalam database (kumpulan data)

DB Clients, Servers, and DB Clients, Servers, and EnvironmentsEnvironmentsDB-Server, a collection of programs

that execute all DBMS functionDB-Client, any application program

that needs to connect to a DB-Server

DB Environment (DBE), one or more DBs along with any software providing at least minimum set of required data operation and management.

DBE Architectural conceptDBE Architectural conceptService, logical collections of

related functionality. Example: Query Service

Sites, represents a logical location in an architectural diagram or deployment diagram

Component and Subsystem (COS)◦ Component, Deployable bundle of software that provide

reasonability cohesive set of functionality ◦ Subsystem, collection of one or more components that

work together toward a common goal

DBE ArchitecturesDBE Architectures

Required Services◦ Data Read Service (Drd-S)◦ Security Service (Sec-S)◦ Semantic Integrity Service

(Semi-S)

DBE ArchitecturesDBE Architectures

Basic Services◦ Data Read Service (Drd-S)◦ Security Service (Sec-S)◦ Semantic Integrity Service

(Semi-S)◦ Data Write Service (Dwr-S)

DBE ArchitecturesDBE Architectures

Expected Service◦ Data Read Service (Drd-S)◦ Security Service (Sec-S)◦ Semantic Integrity Service

(Semi-S)◦ Data Write Service (Dwr-S)◦ Query Request Service (Qreq-

S)◦ Query Optimization Service◦ Execution Service◦ Execution Optimization

Service

DBE ArchitecturesDBE Architectures

Expected Subsystem◦ Data Read Service (Drd-S)◦ Security Service (Sec-S)◦ Semantic Integrity Service

(Semi-S)◦ Data Write Service (Dwr-S)◦ Query Request Service (Qreq-S)◦ Query Optimization Service◦ Execution Service◦ Execution Optimization Service◦ User Interface

DBE ArchitecturesDBE Architectures

Typical DBMS Service◦ Drd-S, Sec-S, Semi-S, Dwr-S, Qreq-

S◦ Query Optimization Service◦ Execution Service◦ Execution Optimization Service◦ User Interface◦ Transaction Management (Trans-S)◦ Locking Service (Lock-S)◦ Timestamping Service (Times-S)◦ Deadlock Handling Service◦ Fallback and Recovery Service

MotivationMotivation

DatabaseTechnology

ComputerNetworks

integration distribution

integration

integration ≠ centralization

DistributedDatabaseSystems

DBMS Schema DBMS Schema ArchitectureArchitecture

DDBMS Schema DDBMS Schema ArchitectureArchitecture

Top Down DDBMS Software Top Down DDBMS Software ArchitectureArchitecture

Bottom Up DDBMS Software Bottom Up DDBMS Software ArchitectureArchitecture

Generic DDBMS Generic DDBMS architecturearchitecture

Distributed ComputingDistributed Computing

A concept in search of a definition and a name.

A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.

Distributed ComputingDistributed Computing

Synonymous terms◦ distributed function◦ distributed data processing◦ multiprocessors/multicomputers◦ satellite processing◦ backend processing◦ dedicated/special purpose

computers◦ timeshared systems◦ functionally modular systems

What is distributed …What is distributed …

Processing logic

Functions

Data

Control

What is a Distributed Database What is a Distributed Database System?System?

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (D–DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

Distributed database system (DDBS) = DDB + D–DBMS

What is not a DDBS?What is not a DDBS?

A timesharing computer systemA loosely or tightly coupled

multiprocessor systemA database system which resides at one

of the nodes of a network of computers - this is a centralized database on a network node

Centralized DBMS on a Centralized DBMS on a NetworkNetwork

Site 5

Site 1

Site 2

Site 3Site 4

CommunicationNetwork

Distributed DBMS Distributed DBMS EnvironmentEnvironment

Site 5

Site 1

Site 2

Site 3Site 4

CommunicationNetwork

Implicit AssumptionsImplicit AssumptionsData stored at a number of sites

each site logically consists of a single processor.

Processors at different sites are interconnected by a computer network no multiprocessors◦ parallel database systems

Distributed database is a database, not a collection of files data logically related as exhibited in the users’ access patterns◦ relational data model

D-DBMS is a full-fledged DBMS◦ not remote file system, not a TP system

Shared-Memory Shared-Memory ArchitectureArchitecture

Examples : symmetric multiprocessors (Sequent, Encore) and some mainframes (IBM3090, Bull's DPS8)

P1 PnM

D

Shared-Disk ArchitectureShared-Disk Architecture

Examples :DEC's VAXcluster, IBM's IMS/VS Data Sharing

DP1

M1

Pn

Mn

Shared-Nothing Shared-Nothing ArchitectureArchitecture

Examples : Teradata's DBC, Tandem, Intel's Paragon, NCR's 3600 and 3700

P1

M1

D1

Pn

Mn

Dn

ApplicationsApplications

Manufacturing - especially multi-plant manufacturing

Military command and controlEFTCorporate MISAirlinesHotel chainsAny organization which has a

decentralized organization structure

Distributed DBMS PromisesDistributed DBMS Promises

Transparent management of distributed, fragmented, and replicated data

Improved reliability/availability through distributed transactions

Improved performance

Easier and more economical system expansion

TransparencyTransparency Transparency is the separation of the higher level

semantics of a system from the lower level implementation issues.

Fundamental issue is to providedata independence

in the distributed environment

◦ Network (distribution) transparency

◦ Replication transparency

◦ Fragmentation transparency horizontal fragmentation: selection vertical fragmentation: projection hybrid

ExampleExample

TITLE SAL

PAY

Elect. Eng. 40000Syst. Anal. 34000Mech. Eng. 27000Programmer 24000

PROJ

PNO PNAME BUDGET

ENO ENAME TITLE

E1 J. Doe Elect. Eng.E2 M. Smith Syst. Anal.E3 A. Lee Mech. Eng.E4 J. Miller ProgrammerE5 B. Casey Syst. Anal.E6 L. Chu Elect. Eng.E7 R. Davis Mech. Eng.E8 J. Jones Syst. Anal.

EMP

ENO PNO RESP

E1 P1 Manager 12

DUR

E2 P1 Analyst 24E2 P2 Analyst 6E3 P3 Consultant 10E3 P4 Engineer 48E4 P2 Programmer 18E5 P2 Manager 24E6 P4 Manager 48E7 P3 Engineer 36

E8 P3 Manager 40

ASG

P1 Instrumentation 150000

P3 CAD/CAM 250000P2Database Develop.135000

P4 Maintenance 310000

E7 P5 Engineer 23

Transparent AccessTransparent Access

SELECT ENAME,SAL

FROM EMP,ASG,PAY

WHERE DUR > 12

AND EMP.ENO = ASG.ENO

AND PAY.TITLE = EMP.TITLE Paris projectsParis employeesParis assignmentsBoston employees

Montreal projectsParis projectsNew York projects with budget > 200000Montreal employeesMontreal assignments

Boston

CommunicationNetwork

Montreal

Paris

NewYork

Boston projectsBoston employeesBoston assignments

Boston projectsNew York employeesNew York projectsNew York assignments

Tokyo

Distributed Database - User ViewDistributed Database - User View

Distributed Database

Distributed DBMS - RealityDistributed DBMS - Reality

CommunicationSubsystem

UserQuery

DBMSSoftware

DBMSSoftware

UserApplication

DBMSSoftware

UserApplicationUser

QueryDBMS

Software

UserQuery

DBMSSoftware

Potentially Improved Potentially Improved PerformancePerformance

Proximity of data to its points of use◦ Requires some support for fragmentation and replication

Parallelism in execution◦ Inter-query parallelism

◦ Intra-query parallelism

Parallelism RequirementsParallelism Requirements

Have as much of the data required by each application at the site where the application executes◦ Full replication

How about updates?◦ Updates to replicated data requires implementation of

distributed concurrency control and commit protocols

System ExpansionSystem Expansion

Issue is database scaling

Emergence of microprocessor and workstation technologies◦ Demise of Grosh's law

◦ Client-server model of computing

Data communication cost vs telecommunication cost

Distributed DBMS IssuesDistributed DBMS IssuesDistributed Database Design

◦ how to distribute the database

◦ replicated & non-replicated database distribution

◦ a related problem in directory management

Query Processing◦ convert user transactions to data manipulation

instructions

◦ optimization problem

◦ min{cost = data transmission + local processing}

◦ general formulation is NP-hard

Distributed DBMS IssuesDistributed DBMS Issues

Concurrency Control◦ synchronization of concurrent accesses

◦ consistency and isolation of transactions' effects

◦ deadlock management

Reliability◦ how to make the system resilient to failures

◦ atomicity and durability

DirectoryManagement

Relationship Between Relationship Between IssuesIssues

Reliability

DeadlockManagement

QueryProcessing

ConcurrencyControl

DistributionDesign

Related IssuesRelated Issues

Operating System Support◦ operating system with proper support for database

operations◦ dichotomy between general purpose processing

requirements and database processing requirements

Open Systems and Interoperability◦ Distributed Multidatabase Systems◦ More probable scenario◦ Parallel issues