Querying XML Database Using Relational Database System

10/22/22
Rucha Patel, MS CS (Spring 2008)
Advanced Database Systems CSc 8712
Instructor: Dr. Yingshu Li

Outline of Presentation
1. Background Information regarding XML
2. Storing XML documents in relational DB systems
3. Querying & Manipulating XML data
   1. XML Data Models for Query Processing
   2. XML Labeling Schemes
   3. Structural Joins
4. General Technique for Querying XML Documents using a Relational DB System
5. XQL (XML Query Language)
6. Conclusion

Background Information - XML

• Evolved from a document markup language.
• Used to exchange structured and semi-structured data.
• Self-describing, which suits data exchange between heterogeneous data sources.
• XML data management systems:
  - Specialized systems – manage only XML documents.
  - General systems – manage XML along with other data formats.

Background Information - XML (Contd…)

• XML is a W3C Recommendation.
• XML Schema – a type system for XML.
• XPath – a language for navigating within XML documents.
• XSLT – an XML transformation language.
• XQuery – a general-purpose XML query language:
  - based on XML Schema types;
  - includes XPath as a subset.

Storing XML Documents in an RDB System

1) The simplest approach is to use a long character string data type, such as CLOB in SQL.
   • Stores the entire document as a character string.
   • Provides textual fidelity.
   • Fails to take advantage of the structural information available in the XML markup.
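A minimal sketch of the CLOB approach, using Python's built-in sqlite3 and xml.etree.ElementTree as stand-ins for an RDBMS and an XML parser. The table `docs`, the sample document, and the helper `author_of` are hypothetical. SQLite accepts `CLOB` as a declared type and stores the value as text; the point is that the string round-trips exactly, while any structural question forces a fetch and re-parse of the whole document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document used only for illustration.
DOC = "<book><title>XML and Databases</title><author>Smith</author></book>"

conn = sqlite3.connect(":memory:")
# SQLite gives a column declared CLOB text affinity: the document is one string.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body CLOB)")
conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (1, DOC))

# Textual fidelity: the exact original string comes back.
(stored,) = conn.execute("SELECT body FROM docs WHERE id = 1").fetchone()
print(stored == DOC)  # True

def author_of(doc_id):
    """SQL only sees a string, so structure is recovered by re-parsing it."""
    (body,) = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
    return ET.fromstring(body).findtext("author")

print(author_of(1))  # Smith
```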

Storing XML Documents in an RDB System (Contd…)

2) Shredding
   • Distributes XML information across columns of one or more tables, preserving both data values and structural relationships.
   • Schema-based shredding: the XML schema is mapped to tables level by level; at each level of the hierarchy, elements are stored in different tables.
   • Not efficient with:
     - sparse elements (elements with varying contents);
     - mixed content (text + child elements).
   • Fails to preserve:
     - document ordering;
     - processing instructions of XML documents.
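A sketch of schema-based shredding, assuming a hypothetical library document in which each repeating element (`book`, `author`) gets its own table. The table and column names are illustrative, not a standard mapping. No column records an author's position among its siblings, which is how document order gets lost.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<library>
  <book><title>Databases</title><author>Ann</author><author>Bob</author></book>
  <book><title>XML</title><author>Cy</author></book>
</library>"""

conn = sqlite3.connect(":memory:")
# One table per level of repeating elements (hypothetical schema).
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE author (book_id INTEGER REFERENCES book(id), name TEXT)")

def shred(xml_text):
    """Distribute element values across the book and author tables."""
    root = ET.fromstring(xml_text)
    for book_id, book in enumerate(root.findall("book"), start=1):
        conn.execute("INSERT INTO book VALUES (?, ?)", (book_id, book.findtext("title")))
        for author in book.findall("author"):
            # Nothing stores this author's position, so its order is lost.
            conn.execute("INSERT INTO author VALUES (?, ?)", (book_id, author.text))

shred(DOC)
rows = conn.execute(
    "SELECT b.title, a.name FROM book b JOIN author a ON a.book_id = b.id "
    "ORDER BY b.id, a.name"
).fetchall()
print(rows)  # [('Databases', 'Ann'), ('Databases', 'Bob'), ('XML', 'Cy')]
```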

Storing XML Documents in an RDB System (Contd…)

3) XML Publishing
   • Systems that shred documents usually also provide the inverse operation, called “XML publishing”, which reconstructs XML documents from relational tables.
   • Systems offering shredding + XML publishing are said to provide relational fidelity, since the authoritative form of the data is relational, not XML.

4) Native XML storage with XML fidelity.
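A sketch of XML publishing over hypothetical `book` and `author` tables: the relational rows are authoritative, and XML is generated from them on demand. The `pos` column is an assumption added so that the published author order is deterministic.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (book_id INTEGER, pos INTEGER, name TEXT);
    INSERT INTO book VALUES (1, 'Databases'), (2, 'XML');
    INSERT INTO author VALUES (1, 1, 'Ann'), (1, 2, 'Bob'), (2, 1, 'Cy');
""")

def publish():
    """Build an XML document from the relational data (relational is authoritative)."""
    library = ET.Element("library")
    for book_id, title in conn.execute("SELECT id, title FROM book ORDER BY id"):
        book = ET.SubElement(library, "book")
        ET.SubElement(book, "title").text = title
        for (name,) in conn.execute(
            "SELECT name FROM author WHERE book_id = ? ORDER BY pos", (book_id,)
        ):
            ET.SubElement(book, "author").text = name
    return ET.tostring(library, encoding="unicode")

print(publish())
```

Because XML is derived from the tables, anything the tables do not store (comments, processing instructions, original whitespace) cannot reappear in the published document.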

Querying & Manipulating XML Data

• An XML storage facility needs an interface to access and manipulate the stored data.
• XPath – good navigation within documents, but it:
  - cannot transform structures;
  - cannot construct new elements.
• XSLT – transformation + construction, but its recursive, template-driven nature makes it unsuitable for optimization.
• XQuery – a complete set of query facilities.

Querying & Manipulating XML Data (Contd…)

XML Data Model
• XML documents are modeled as ordered, labeled, finite, unranked trees.
• Nodes have a relative order: the order among siblings.

Region encoding labeling scheme: < doc, start, end, level >
  • doc – the document to which the node belongs
  • start & end – position of the element in the document
  • level – level of the node in the tree

Node x is an ancestor of node y (in the same document) if and only if
  • x.start < y.start and x.end > y.end
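The region encoding and the ancestor test can be sketched with one depth-first traversal that increments a counter on entering and on leaving each element. The `Region` class, the `label`, `is_ancestor` and `is_parent` helpers, and the choice of level 1 for the root are assumptions for illustration; the parent test simply adds a level condition to the ancestor test.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    doc: int
    start: int
    end: int
    level: int

def label(root, doc_id):
    """Assign <doc, start, end, level> to every element in one DFS."""
    labels = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1                 # entering the element
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1                 # leaving the element
        labels[node] = Region(doc_id, start, counter, level)

    visit(root, 1)
    return labels

def is_ancestor(x, y):
    # Same document, and x's interval strictly contains y's.
    return x.doc == y.doc and x.start < y.start and x.end > y.end

def is_parent(x, y):
    return is_ancestor(x, y) and y.level == x.level + 1

root = ET.fromstring("<a><b><c/></b><d/></a>")
lab = {node.tag: r for node, r in label(root, doc_id=1).items()}
print(lab["c"])                                             # Region(doc=1, start=3, end=4, level=3)
print(is_ancestor(lab["a"], lab["c"]), is_parent(lab["a"], lab["c"]))  # True False
print(is_ancestor(lab["b"], lab["d"]))                      # False
```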

Querying & Manipulating XML Data (Contd…)

XML Labeling Schemes
To evaluate queries in XPath, XSLT & XQuery, a system can either:
1. Maintain results in document order throughout the evaluation
   • Restricts the choice of query plans.
   • Impossible if the query requires data to be re-sorted along a different axis at some point.
2. Apply a sort operator at appropriate times
   • Assign each node a label denoting its relative order, e.g. the region encoding scheme.
   • Ancestor-descendant problem
   • Variable-size labeling schemes:
     - Do not need to relabel a node on update.
     - Make it difficult to allocate a fixed portion of each record for the label.
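A small sketch of the sort-operator alternative. A hypothetical plan answers `title | author` by reading a per-tag index, which returns nodes grouped by tag rather than in document order; sorting on an order label restores document order. Here the label is a pre-order number, which orders nodes the same way as the `start` value of the region encoding; across several documents one would sort on `(doc, start)`.

```python
import xml.etree.ElementTree as ET

def order_labels(root):
    """Pre-order numbering: orders nodes like the 'start' value of region encoding."""
    return {node: i for i, node in enumerate(root.iter(), start=1)}

DOC = ("<lib><book><title>T1</title><author>A1</author></book>"
       "<book><title>T2</title><author>A2</author></book></lib>")
root = ET.fromstring(DOC)
start = order_labels(root)

# Hypothetical plan for "title | author": read a per-tag index, which yields
# all titles first and then all authors, i.e. not document order.
index = {}
for node in root.iter():
    index.setdefault(node.tag, []).append(node)
unordered = index["title"] + index["author"]
print([n.text for n in unordered])  # ['T1', 'T2', 'A1', 'A2']

# Sort operator, applied only where the plan needs document order again.
ordered = sorted(unordered, key=lambda n: start[n])
print([n.text for n in ordered])    # ['T1', 'A1', 'T2', 'A2']
```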

Querying & Manipulating XML Data (Contd…)

Storing XML in an RDBMS
Labeling scheme + edge shredding = a single relation for storing an XML document.

Edge relation:
1. Global encoding scheme
   • Edge(id, parent-id, end, path-id, value)
2. Local encoding scheme
   • Edge(id, parent-id, sIndex, path-id, value)
   • sIndex – position of a node among its siblings
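A sketch of the edge relation with the local encoding scheme, using SQLite. Storing `path-id` as a key into a separate `path` table of root-to-node tag paths is an assumption; the slide does not say how path ids are assigned. Because `sindex` is local to a parent, sibling order is recovered with `ORDER BY sindex`.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE path (path_id INTEGER PRIMARY KEY, path TEXT UNIQUE);
    CREATE TABLE edge (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER,                      -- NULL for the root element
        sindex    INTEGER,                      -- position among siblings
        path_id   INTEGER REFERENCES path(path_id),
        value     TEXT                          -- text content, if any
    );
""")

def path_id(path):
    """Return the id of a root-to-node path, registering it on first use."""
    conn.execute("INSERT OR IGNORE INTO path (path) VALUES (?)", (path,))
    return conn.execute("SELECT path_id FROM path WHERE path = ?", (path,)).fetchone()[0]

def shred(xml_text):
    """Store every element as one row of the edge relation."""
    next_id = 0

    def visit(node, parent, sindex, parent_path):
        nonlocal next_id
        next_id += 1
        my_id = next_id
        path = f"{parent_path}/{node.tag}"
        value = (node.text or "").strip() or None
        conn.execute("INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
                     (my_id, parent, sindex, path_id(path), value))
        for i, child in enumerate(node, start=1):
            visit(child, my_id, i, path)

    visit(ET.fromstring(xml_text), None, 1, "")

shred("<book><title>XML</title><author>Ann</author><author>Bob</author></book>")

# Children of the root (id 1), in document order via sIndex.
rows = conn.execute("""
    SELECT p.path, e.sindex, e.value
    FROM edge e JOIN path p ON p.path_id = e.path_id
    WHERE e.parent_id = 1
    ORDER BY e.sindex
""").fetchall()
print(rows)
# [('/book/title', 1, 'XML'), ('/book/author', 2, 'Ann'), ('/book/author', 3, 'Bob')]
```

With the global encoding scheme, the `sindex` column would instead hold the node's `end` label from a document-wide numbering.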

General Technique for Querying XML Documents in an RDBMS

To store and query an XML document:
1. Relational schema generation – table creation
2. Shredding – storing the XML document
3. Converting queries over the stored XML into SQL queries over the created tables

Normally, each relational schema generation technique requires its own query processor to convert the queries. With the general technique, however, the same query processor can be used for all of them.
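The three steps can be sketched end to end over an edge-style schema. The schema, the `generate_schema`, `shred`, `translate` and `run` helpers, and the two-form path language (`/a/b/c` and `//c`) are hypothetical simplifications; a real translator covers far more of XPath/XQuery. Here a `//c` step becomes a suffix match on the stored root-to-node path.

```python
import sqlite3
import xml.etree.ElementTree as ET

def generate_schema(conn):
    """Step 1: relational schema generation (a fixed edge-style schema here)."""
    conn.executescript("""
        CREATE TABLE path (path_id INTEGER PRIMARY KEY, path TEXT UNIQUE);
        CREATE TABLE edge (id INTEGER PRIMARY KEY, parent_id INTEGER,
                           sindex INTEGER, path_id INTEGER, value TEXT);
    """)

def shred(conn, xml_text):
    """Step 2: shredding; ids are handed out in document (pre-)order."""
    counter = 0

    def visit(node, parent, sindex, parent_path):
        nonlocal counter
        counter += 1
        my_id, path = counter, f"{parent_path}/{node.tag}"
        conn.execute("INSERT OR IGNORE INTO path (path) VALUES (?)", (path,))
        (pid,) = conn.execute(
            "SELECT path_id FROM path WHERE path = ?", (path,)).fetchone()
        value = (node.text or "").strip() or None
        conn.execute("INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
                     (my_id, parent, sindex, pid, value))
        for i, child in enumerate(node, start=1):
            visit(child, my_id, i, path)

    visit(ET.fromstring(xml_text), None, 1, "")

def translate(query):
    """Step 3: translate a tiny path language into SQL over edge/path.

    Only '/a/b/c' (an exact root-to-node path) and '//c' (any element
    named c, i.e. any stored path ending in '/c') are supported.
    """
    if query.startswith("//"):
        suffix = "/" + query[2:]
        cond, args = "substr(p.path, -length(?)) = ?", (suffix, suffix)
    else:
        cond, args = "p.path = ?", (query,)
    sql = ("SELECT e.value FROM edge e JOIN path p ON p.path_id = e.path_id "
           f"WHERE {cond} ORDER BY e.id")
    return sql, args

def run(conn, query):
    sql, args = translate(query)
    return [value for (value,) in conn.execute(sql, args)]

conn = sqlite3.connect(":memory:")
generate_schema(conn)
shred(conn, "<lib><book><title>A</title></book><mag><title>B</title></mag></lib>")
print(run(conn, "/lib/book/title"))  # ['A']
print(run(conn, "//title"))          # ['A', 'B']
```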

Contd...

To use the same query processor for converting queries, whichever relational schema generation technique is used:
• Along with shredding, a reconstruction XML view is created over the relational tables.
• It virtually reconstructs the stored XML document from the shredded rows.
• It behaves just like a normal view over the stored XML document.
• Queries over the stored XML = queries over the reconstruction XML view.
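A sketch of a reconstruction XML view over shredded rows. For brevity it uses a simplified edge table that stores the tag directly (an assumption, not the `path-id` layout) and materializes the view with ElementTree. A virtual view, as described above, would not be built at all; the user's query would be composed with the view and translated into SQL, and in principle it returns the same answer.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Simplified edge table (tag stored directly) with rows a shredder could
# produce for <lib><book><title>A</title></book><book><title>B</title></book></lib>.
conn.executescript("""
    CREATE TABLE edge (id INTEGER PRIMARY KEY, parent_id INTEGER,
                       sindex INTEGER, tag TEXT, value TEXT);
    INSERT INTO edge VALUES
        (1, NULL, 1, 'lib',   NULL),
        (2, 1,    1, 'book',  NULL),
        (3, 2,    1, 'title', 'A'),
        (4, 1,    2, 'book',  NULL),
        (5, 4,    1, 'title', 'B');
""")

def reconstruction_view(conn):
    """Rebuild the stored XML document from its shredded rows."""
    rows = conn.execute("SELECT id, parent_id, sindex, tag, value FROM edge").fetchall()
    nodes = {}
    for node_id, _, _, tag, value in rows:
        nodes[node_id] = ET.Element(tag)
        nodes[node_id].text = value
    root = None
    # Attach children parent by parent, in sibling (sindex) order.
    for node_id, parent_id, *_ in sorted(rows, key=lambda r: (r[1] or 0, r[2])):
        if parent_id is None:
            root = nodes[node_id]
        else:
            nodes[parent_id].append(nodes[node_id])
    return ET.ElementTree(root)

# A query over the stored XML is posed against the view.
view = reconstruction_view(conn)
print([t.text for t in view.findall("book/title")])  # ['A', 'B']
print(ET.tostring(view.getroot(), encoding="unicode"))
```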

Contd...

For relational schema generation, a program:
• generates the desired relational schema,
• produces an XML shredder object, and
• creates the reconstruction XML view,
for either a shared relational schema or an edge relational schema.

Contd...

Shared Relational Schema – steps to generate the relational schema:
1. Create a DTD graph whose nodes are XML elements, attributes, and operators.
2. Create a relation for the root element of the graph.
3. Represent all children of an element in the same relation as the element, EXCEPT:
   • *-nodes – a *-node holds a set of values, which cannot be captured in a single relational tuple;
   • so a separate relation is created for each such node.
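A sketch of the rule above, assuming a hypothetical DTD graph given as a dictionary from each element to its children, with a flag marking *-nodes. Non-* children are inlined as columns of the parent's relation (nested elements flattened with a name prefix), while each *-node gets its own relation with a `parentID` column pointing back to its parent. It implements only the rule stated on this slide and does not handle attributes, other operators, or recursive DTDs.

```python
# Hypothetical DTD graph: element -> list of (child element, occurs with '*').
# Elements that do not appear as keys are leaves holding #PCDATA.
DTD = {
    "book": [("title", False), ("publisher", False), ("author", True)],
    "publisher": [("name", False), ("city", False)],
    "author": [("firstname", False), ("lastname", False)],
}

def columns(element, prefix):
    """Inline non-* descendants as columns; collect *-children separately."""
    cols, separate = [], []
    for child, star in DTD.get(element, []):
        if star:
            separate.append(child)           # set-valued: own relation
        elif child in DTD:                   # inner element: inline its children
            sub_cols, sub_sep = columns(child, f"{prefix}{child}_")
            cols += sub_cols
            separate += sub_sep
        else:                                # leaf element: one column
            cols.append(f"{prefix}{child}")
    return cols, separate

def generate(element, parent=None, out=None):
    """Emit CREATE TABLE statements, starting from the root element."""
    out = [] if out is None else out
    cols, separate = columns(element, f"{element}_")
    head = [f"{element}ID INTEGER PRIMARY KEY"]
    if parent:
        head.append(f"parentID INTEGER REFERENCES {parent}({parent}ID)")
    body = ", ".join(head + [f"{c} TEXT" for c in cols])
    out.append(f"CREATE TABLE {element} ({body});")
    for child in separate:
        generate(child, element, out)
    return out

for stmt in generate("book"):
    print(stmt)
```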

XQL (XML Query Language)

• Structured queries – relational / OO databases
• Unstructured queries – documents
• Semi-structured queries – XML documents

Features:
• Allows the user to combine information from multiple sources.
• Uses links as part of a query.
• Searches based on text containment.

E.g.) Doc 1 – recommended books; Doc 2 – books + prices; Doc 3 – reviews of books.
A single query can then list the recommended books together with their prices and reviews.
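The books/prices/reviews example is essentially a join across three documents. A Python sketch with ElementTree (not XQL) and three hypothetical documents: prices are joined on the book title, and reviews are matched on a `title` attribute, which is an assumed layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for Doc 1, Doc 2 and Doc 3.
RECOMMENDED = ET.fromstring(
    "<recommended><title>Dune</title><title>Emma</title></recommended>")
PRICES = ET.fromstring(
    "<catalog><book><title>Dune</title><price>9.99</price></book>"
    "<book><title>Emma</title><price>4.50</price></book>"
    "<book><title>Ulysses</title><price>12.00</price></book></catalog>")
REVIEWS = ET.fromstring(
    "<reviews><review title='Dune'>Epic.</review>"
    "<review title='Emma'>Witty.</review><review title='Dune'>Long.</review></reviews>")

def recommended_with_prices_and_reviews():
    """Join the three documents on the book title."""
    price_of = {b.findtext("title"): b.findtext("price") for b in PRICES.findall("book")}
    result = ET.Element("result")
    for t in RECOMMENDED.findall("title"):
        book = ET.SubElement(result, "book", title=t.text)
        ET.SubElement(book, "price").text = price_of.get(t.text)
        for r in REVIEWS.findall("review"):
            if r.get("title") == t.text:
                ET.SubElement(book, "review").text = r.text
    return result

print(ET.tostring(recommended_with_prices_and_reviews(), encoding="unicode"))
```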

XQL (XML Query Language) Contd…

Difference between SQL & XQL queries

SQL | XQL
The database is a set of tables. | The database is a set of one or more XML documents.
Uses the structure of tables as the basic model. | Uses the structure of XML documents as the basic model.
The FROM clause determines the tables examined by the query. | A query is given a list of input nodes from one or more documents.
The result of a query is a table containing a set of rows; this table may serve as the basis for further queries. | The result of a query is a list of XML document nodes, which may serve as the basis for further queries.

XQL (XML Query Language) Contd…

Basic Concepts of XQL
• A simple string is an element name.
  - E.g. table
• '/' is the child operator; it indicates hierarchy.
  - E.g. front/author
• Further examples:
  front/author='Theodore Seuss Geisel'
  front/author/address/@type='email'
  front//address
  //address
  front/author/address[@type='email']
  front/author='Theodore Seuss Geisel'[@gender='male' and shoesize='9EEEE']
  section[1,3 to 5, 8, -1]
  section[@level='3'][1 to 2]
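For comparison, several of these path expressions can be run with Python's ElementTree, which supports a limited subset of XPath rather than XQL. The sample document and the XQL-to-ElementTree mapping are assumptions for illustration; index ranges such as `1,3 to 5` and the `=` comparison forms have no direct ElementTree equivalent and are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = ET.fromstring(
    "<doc><front><author>"
    "<name>Theodore Seuss Geisel</name>"
    "<address type='email'>ted@example.org</address>"
    "<address type='postal'>La Jolla</address>"
    "</author></front>"
    "<appendix><address type='email'>info@example.org</address></appendix>"
    "<section level='3'>S1</section><section level='2'>S2</section>"
    "<section level='3'>S3</section><section level='3'>S4</section>"
    "</doc>")

# XQL-style expression -> ElementTree path, evaluated relative to <doc>.
QUERIES = {
    "front/author": "front/author",
    "front//address": "front//address",
    "//address": ".//address",  # Element.findall rejects paths starting with '/'
    "front/author/address[@type='email']": "front/author/address[@type='email']",
    "section[@level='3']": "section[@level='3']",
}

def run(et_path):
    """Return the text (or the tag, for elements without text) of each match."""
    return [e.text or e.tag for e in DOC.findall(et_path)]

for xql, et_path in QUERIES.items():
    print(f"{xql:40} -> {run(et_path)}")
```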

XQL (XML Query Language) Contd…

Example Queries

XQL (XML Query Language) Contd…

Example Queries

XQL (XML Query Language) Contd…

Grouping of results
Query – a query that lists the products on invoices might group the products by invoice, placing each group of products within an invoice tag.
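A sketch of grouping with ElementTree (not XQL), assuming a hypothetical invoice document: a flat path query returns the products without their invoices, while the grouped version places each invoice's products inside an `invoice` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice document.
INVOICES = ET.fromstring(
    "<invoices>"
    "<invoice number='1'><item><product>Pen</product></item>"
    "<item><product>Ink</product></item></invoice>"
    "<invoice number='2'><item><product>Paper</product></item></invoice>"
    "</invoices>")

# Ungrouped: a flat list of products; which invoice each came from is lost.
flat = [p.text for p in INVOICES.findall("invoice/item/product")]
print(flat)  # ['Pen', 'Ink', 'Paper']

def group_products_by_invoice(root):
    """Wrap each invoice's products in an <invoice> element of the result."""
    result = ET.Element("result")
    for inv in root.findall("invoice"):
        group = ET.SubElement(result, "invoice", number=inv.get("number"))
        for product in inv.findall("item/product"):
            ET.SubElement(group, "product").text = product.text
    return result

print(ET.tostring(group_products_by_invoice(INVOICES), encoding="unicode"))
```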

XQL (XML Query Language) Contd…

Join – combine information from multiple sources to create one unified view.

Queries can be written like,

Conclusion

• XML documents can be stored efficiently in a relational database system using a number of approaches.
• The general technique for storing and querying XML documents using an RDBMS eliminates the need for separate query processors for XML query translation.
• Using the general technique, a reconstruction XML view can be generated for both shared and edge-based relational schemas.
• Stored XML documents can be queried effectively through XQuery, XPath, XSLT, or XQL.

References

• “XML and Relational Database Management Systems: The Inside Story” by Michael Rys, Don Chamberlin & Daniela Florescu.
• “A General Technique for Querying XML Documents using a Relational Database System” by Jayavel Shanmugasundaram, Rajasekar Krishnamurthy & Igor Tatarinov.
• “Querying and Maintaining Ordered XML Data Using Relational Databases” by William Shui, Franky Lam, Damien Fisher & Raymond Wong.
• “Querying Structured Text in an XML Database” by Shurug Al-Khalifa, Cong Yu & H. V. Jagadish.
• “Structured Materialized Views for XML Queries” by Andrei Arion, Véronique Benzaken & Ioana Manolescu.

Thank You.

Any questions?