Hui Yang (杨慧), Language Technologies Institute, School of Computer Science, Carnegie Mellon University

ONTOLOGY LEARNING

A LITTLE BIT OF CONTEXT… THE LANGUAGE TECHNOLOGIES INSTITUTE

Carnegie Mellon University’s School of Computer Science: 1 undergraduate program, 7 graduate departments (CSD, HCI, LTI, RI, SEI, MLD, ETC)

The Language Technologies Institute is a graduate department in the School of Computer Science: about 25 faculty, about 125 graduate students (~85 PhD, ~40 MS), about 30 visitors, post-docs, staff programmers, …

A LITTLE BIT OF CONTEXT… THE LANGUAGE TECHNOLOGIES INSTITUTE

LTI courses and research focus on: machine translation, especially high-accuracy MT; natural language processing & computational linguistics; information retrieval & text mining; speech recognition & synthesis; computer-assisted language learning & intelligent tutoring; computational biology (“the language of the human genome”); … and combinations of the above, e.g., speech-to-speech MT, open-domain question answering, …

A LITTLE BIT ABOUT ME

My research interests: text mining, information retrieval, natural language processing, statistical machine learning

My earlier work: question answering, multimedia information retrieval, near-duplicate detection, opinion detection

TODAY’S TALK

ONTOLOGY LEARNING

ROADMAP

Introduction

Subtasks in Ontology Learning

Human-Guided Ontology Learning

User Study

Metric-Based Ontology Learning

Experimental Results

Conclusions

INFORMATION RETRIEVAL TECHNOLOGIES

Web search engines have changed our lives

Google’s great achievement

But, have Search Engines fulfilled Information Needs?

Some, only some

What does search bring to us?

Overwhelming Information in Search Results

Tedious manual judgment still needed

FIND A GOOD KINDERGARTEN IN THE PITTSBURGH AREA

BUY A USED CAR IN THE PITTSBURGH AREA

IT WILL BE GREAT TO HAVE

A process to: crawl related documents, sort through relevant documents, identify important concepts/topics, organize materials

THIS IS EXACTLY THE TASK OF INFORMATION TRIAGE, OR PERSONAL ONTOLOGY LEARNING

INTRODUCTION

An ontology is a data model that represents a set of concepts within a domain and the set of pairwise relationships between those concepts.

EXAMPLE: A SIMPLE ONTOLOGY

[Diagram: a simple ontology with “ball” and “table” as child concepts of the parent concept “Game Equipment”]

EXAMPLE: WORDNET

EXAMPLE: ODP

INTRODUCTION

Ontology learning is the task of constructing a well-defined ontology given

a text corpus or

a set of concept terms

INTRODUCTION

An ontology offers a nice way to summarize the important topics in a domain/collection

An ontology facilitates knowledge sharing and reuse

An ontology offers relational associations for reasoning and inference

ROADMAP

Introduction

Subtasks in Ontology Learning

Human-Guided Ontology Learning

User Study

Metric-Based Ontology Learning

Experimental Results

Conclusions

SUBTASKS IN ONTOLOGY LEARNING

Concept Extraction

Synonym Detection

Relationship Formulation by Clustering

Cluster Labeling

CONCEPT EXTRACTION

Two Steps:

Noun N-gram and Named Entity Mining

Web-based Concept Filtering

NOUN N-GRAM MINING

I/PRP strongly/RB urge/VBP you/PRP to/TO cut/VB mercury/NN emissions/NNS from/IN power/NN plants/NNS by/IN 90/CD percent/NN by/IN 2008/CD ./.

Extracted Bi-grams

Mercury emissions

Power plants
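As a concrete illustration, here is a minimal Python sketch (not the talk's actual pipeline) that extracts such noun n-grams from a POS-tagged sentence by collecting maximal runs of NN* tags:

tagged = [("I", "PRP"), ("strongly", "RB"), ("urge", "VBP"), ("you", "PRP"),
          ("to", "TO"), ("cut", "VB"), ("mercury", "NN"), ("emissions", "NNS"),
          ("from", "IN"), ("power", "NN"), ("plants", "NNS"), ("by", "IN"),
          ("90", "CD"), ("percent", "NN"), ("by", "IN"), ("2008", "CD"), (".", ".")]

def noun_ngrams(tagged):
    """Return maximal runs of consecutive nouns as candidate concepts."""
    runs, current = [], []
    for token, tag in tagged:
        if tag.startswith("NN"):
            current.append(token)
        else:
            if len(current) >= 2:          # keep multi-word noun phrases
                runs.append(" ".join(current))
            current = []
    if len(current) >= 2:
        runs.append(" ".join(current))
    return runs

print(noun_ngrams(tagged))   # ['mercury emissions', 'power plants']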

Concept Filtering

Web-based POS error detection. Assumption: among the first 10 Google snippets, a valid concept appears more than a threshold number of times (4 in our case)

Remove POS errors, e.g., protect/NN polar/NN bear/NN

Remove spelling errors, e.g., pullution, polor bear
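A minimal sketch of this filtering step; get_top_snippets is a hypothetical stand-in for the search call, and the threshold of 4 over the first 10 snippets follows the slide:

def is_valid_concept(concept, get_top_snippets, threshold=4):
    """Keep a candidate only if it appears in more than `threshold` of the
    top 10 snippets returned for the concept itself."""
    snippets = get_top_snippets(concept, k=10)
    hits = sum(concept.lower() in snippet.lower() for snippet in snippets)
    return hits > threshold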

CONCEPT EXTRACTION

SUBTASKS IN ONTOLOGY LEARNING

Concept Extraction

Synonym Detection

Relationship Formulation by Clustering

Cluster Labeling

CLUSTERING

Hierarchical Clustering

Different Strategies for Concepts at Different Abstraction Levels

EXAMPLE: A SIMPLE ONTOLOGY

[Diagram: the simple ontology again, with “Game Equipment” at the abstract level and “ball” and “table” at the concrete level]

BOTTOM-UP HIERARCHICAL CLUSTERING

Concept candidates are organized into groups based on the first sense of their head noun in WordNet (see the sketch after this list)

One of their common head nouns is selected as the parent concept for the group, e.g., pollution subsumes water pollution and air pollution

This creates high-accuracy concept forests at the lower levels of the ontology

Start from concrete concepts
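A minimal sketch of the head-noun grouping described above, assuming NLTK's WordNet interface (not the talk's actual code):

from collections import defaultdict
from nltk.corpus import wordnet as wn

def group_by_head_noun(concepts):
    """Group concept phrases under the first WordNet noun sense of their head noun."""
    groups = defaultdict(list)
    for concept in concepts:
        head = concept.split()[-1]                      # head noun of the phrase
        senses = wn.synsets(head, pos=wn.NOUN)
        key = senses[0].name() if senses else head      # 1st sense as the group key
        groups[key].append(concept)
    return groups

# e.g. "water pollution" and "air pollution" end up in the same group,
# keyed by the first WordNet sense of "pollution"
print(group_by_head_noun(["water pollution", "air pollution", "polar bear"]))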

ONTOLOGY FRAGMENTS

Different fragments are grouped

CONTINUE TO BE BOTTOM-UP

Problem: still a forest; many concepts at the top level are not grouped

Solution: clustering

In any clustering algorithm we need a metric, and it is hard to know the right metric to measure distance between those top-level nodes

HUMAN-GUIDED ONTOLOGY LEARNING

Learn what? A distance metric function

Learn from what? Concepts at lower levels, since they are highly accurate; user feedback

After learning, then what? Apply the distance metric function to concepts at the higher level to get distance scores for them, then use any clustering algorithm to group them based on those scores

TRAINING DATA FROM LOWER LEVELS

A set of concepts x^(i) on the i-th level of the ontology hierarchy

A distance matrix y^(i): the entry corresponding to concepts x^(i)_j and x^(i)_k is y^(i)_jk ∈ {0, 1}, with

y^(i)_jk = 0 if x^(i)_j and x^(i)_k are in the same group;

y^(i)_jk = 1 otherwise.

TRAINING DATA FROM LOWER LEVELS

Example for four concepts, where the first two are in one group and the last two in another:

y^(i) =
0 0 1 1
0 0 1 1
1 1 0 0
1 1 0 0
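A small sketch of how such a 0/1 training matrix can be built from the group labels of lower-level concepts (illustrative only):

import numpy as np

def training_distance_matrix(group_ids):
    """y[j, k] = 0 if concepts j and k share a group, 1 otherwise."""
    g = np.asarray(group_ids)
    return (g[:, None] != g[None, :]).astype(int)

print(training_distance_matrix([0, 0, 1, 1]))
# [[0 0 1 1]
#  [0 0 1 1]
#  [1 1 0 0]
#  [1 1 0 0]]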

LEARNING THE DISTANCE METRIC

Distance metric represented as a Mahalanobis distance

Φ(x^(i)_j, x^(i)_k) represents a set of pairwise underlying feature functions

A is a positive semi-definite matrix, the parameter we need to learn

Parameter estimation by minimizing squared errors
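The metric itself was shown as an equation; a reconstruction consistent with the definitions above (pairwise features Φ, positive semi-definite A) is

d(x^(i)_j, x^(i)_k) = Φ(x^(i)_j, x^(i)_k)^T A Φ(x^(i)_j, x^(i)_k),

and A is estimated by minimizing Σ_{j,k} ( d(x^(i)_j, x^(i)_k) − y^(i)_jk )², subject to A ⪰ 0.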

SOLVE THE OPTIMIZATION PROBLEM

Optimization can be done by

Newton’s Method

Interior-Point Method

Any standard semi-definite programming (SDP) solver: SeDuMi, YALMIP
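A minimal sketch of this optimization using cvxpy, purely for illustration (the slides mention SeDuMi and YALMIP); Phi and y here correspond to the pairwise features and training-matrix entries above:

import cvxpy as cp
import numpy as np

def learn_metric(Phi, y):
    """Phi: (n_pairs, d) pairwise feature vectors; y: (n_pairs,) target distances.
    Learn a positive semi-definite A minimizing the squared error of phi^T A phi."""
    n_pairs, d = Phi.shape
    A = cp.Variable((d, d), PSD=True)
    residuals = cp.hstack([cp.quad_form(Phi[p], A) - y[p] for p in range(n_pairs)])
    problem = cp.Problem(cp.Minimize(cp.sum_squares(residuals)))
    problem.solve()
    return A.value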

GENERATE DISTANCE SCORES

We have learned A!

For any pair of concepts at the higher level (x^(i+1)_l, x^(i+1)_m), the corresponding entry in the distance matrix y^(i+1) is obtained by applying the learned metric:

y^(i+1)_lm = Φ(x^(i+1)_l, x^(i+1)_m)^T A Φ(x^(i+1)_l, x^(i+1)_m)

K-MEDOIDS CLUSTERING

Flat clustering at a level

Use one of the concepts as the cluster center

Estimate the number of clusters by Gap statistics [Tibshirani et al. 2000]
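A minimal k-medoids sketch over a precomputed distance matrix (illustration only; the gap-statistic estimate of k from [Tibshirani et al. 2000] is omitted):

import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Cluster items given a precomputed distance matrix D (n x n) into k groups."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)            # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]  # most central member
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids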

HUMAN-COMPUTER INTERACTION

SUBTASKS IN ONTOLOGY LEARNING

Concept Extraction

Synonym Detection

Relationship Formulation by Clustering

Cluster Labeling

CLUSTER LABELING

Problem: concepts are grouped together, but nameless

Solution: a web-based approach (see the sketch below)

Send a query formed by concatenating the child concepts to Google, parse the top 10 snippets, and select the most frequent word as the parent of this group
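A minimal sketch of this labeling step; get_top_snippets is a hypothetical stand-in for the Google query, and the stopword list is a simplification:

from collections import Counter
import re

STOPWORDS = {"the", "and", "of", "in", "a", "to", "for", "on", "is", "are", "with"}

def label_cluster(child_concepts, get_top_snippets):
    """Query the concatenated children and return the most frequent
    non-stopword in the top 10 snippets as the parent label."""
    query = " ".join(child_concepts)
    words = []
    for snippet in get_top_snippets(query, k=10):
        words += [w for w in re.findall(r"[a-z]+", snippet.lower())
                  if w not in STOPWORDS]
    return Counter(words).most_common(1)[0][0]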

ROADMAP

Introduction

Subtasks in Ontology Learning

Human-Guided Ontology Learning

User Study

Metric-Based Ontology Learning

Experimental Results

Conclusions

USER STUDY

12 graduate students from political science at the University of Pittsburgh

Divided into two groups: manual group (4), interactive group (8)

Task: construct an ontology hierarchy for 4 datasets: mercury, polar bear, wolf, Toxin Release Inventory (TRI)

90-minute limit, or until the user is satisfied

DATASETS

SOFTWARE USED FOR USER STUDY

QUALITY OF MANUAL VS. INTERACTIVE RUNS

Manual users show moderate agreement (0.4~0.6)

Interactive runs produce similar quality results

Difference between manual and interactive runs is NOT statistically significant

COSTS OF MANUAL VS. INTERACTIVE RUNS

Interactive users make 40% fewer edits (statistically significant)

Interactive runs save 30-60 minutes per ontology

Within interactive runs, a human spends 64% less time than in manual runs

CONTRIBUTIONS OF HUMAN-GUIDED ONTOLOGY LEARNING

Effectively combine the strengths of automatic systems and human knowledge

Combine many techniques into a unified framework: pattern-based (concept mining), knowledge-based (use of WordNet), web-based (concept filtering and cluster naming), and machine learning

A detailed independent user study

WHAT TO IMPROVE?

Is bottom-up the best way to do it? Maybe not; incremental clustering saves the most effort

We have used different techniques for concepts at different levels; how can we formally generalize this? Model concept abstractness explicitly

We have tested on domain-specific corpora; what about corpora for more general purposes? Can we reconstruct WordNet or ODP?

ROADMAP

Introduction

Subtasks in Ontology Learning

Human-Guided Ontology Learning

User Study

Metric-Based Ontology Learning

Experimental Results

Conclusions

CHALLENGES

Hard to find a good name for a new group in a bottom-up clustering framework
Solution: incremental clustering

Formally model concept abstractness; intelligently use different techniques for concepts at different abstraction levels
Solution: learn statistical models for each abstraction level

Flexibly incorporate heterogeneous features; the state of the art either uses one type of semantic evidence to infer all relationships, or uses one type of feature for a particular subtask
Solution: separate metric learning and ontology construction

A UNIFIED SOLUTION

Metric-based Ontology Learning

LET’S BEGIN WITH SOME IMPORTANT DEFINITIONS

An ontology is a data model

Concept set, relationship set, domain

MORE DEFINITIONS

[Diagram: a full ontology, in which every concept (“Game Equipment”, “ball”, “table”) is attached]

MORE DEFINITIONS

[Diagram: a partial ontology, in which “ball” is attached under “Game Equipment” but “table” is not yet attached]

MORE DEFINITIONS

Ontology Metric

[Diagram: the example ontology annotated with edge weights (1.5, 2, 1, 1) and the pairwise distances d they induce between concepts, e.g., values of 2, 1, and 4.5 for different pairs]

MORE DEFINITIONS

[Diagram: the same weighted ontology and its pairwise distances, illustrating the information contained in an ontology T]

MORE DEFINITIONS

[Diagram: the pairwise distances among concepts on a single level, illustrating the information contained in a level L]

ASSUMPTIONS OF ONTOLOGY

Minimum Evolution Assumption: the optimal ontology is the one that introduces the least information change!

[Animation: the example ontology is built up step by step as “ball” and “table” are added and “Game Equipment” emerges as their parent, illustrating that the optimal ontology introduces the least information change at each step]

ASSUMPTIONS OF ONTOLOGY

Abstractness Assumption: each abstraction level has its own information function

[Diagram: the example ontology, with “Game Equipment” at the abstract level and “ball” and “table” at the concrete level]

FORMAL FORMULATION OF ONTOLOGY LEARNING

The task of ontology learning is defined as the construction of a full ontology T given a set of concepts C and an initial partial ontology T0 (note that T0 could be empty)

Keep adding concepts from C into T0 until a full ontology is formed

GOAL OF ONTOLOGY LEARNING

Find the optimal full ontology such that the information change since T0 is smallest (this follows from the Minimum Evolution Assumption)

GET TO THE GOAL

Goal:

Since the optimal set of concepts is always C

Concepts are added incrementally

GET TO THE GOAL

Plug in definition of information change

Transform into a minimization problem: the Minimum Evolution objective function

EXPLICITLY MODEL ABSTRACTNESS

Model abstractness for each level by a least-squares fit

Plug in the definition of the amount of information for an abstraction level

This gives the Abstractness objective function

MULTI-CRITERION OPTIMIZATION FUNCTION

The Minimum Evolution objective function and the Abstractness objective function are combined via a scalarization variable
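The combined objective was shown as an equation; in a generic scalarized form, writing J_ME for the Minimum Evolution objective, J_ABS for the Abstractness objective, and λ for the scalarization variable (these symbol names are assumptions, not taken from the slides), it would read:

minimize over T:   J_ME(T) + λ · J_ABS(T)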

THE OPTIMIZATION ALGORITHM
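The algorithm itself appeared as a figure. A hedged pseudocode-style sketch of the incremental construction described on the previous slides (greedy least-change insertion is an assumption, and attach_candidates / info_change are placeholder helpers):

def build_ontology(T0, concepts, attach_candidates, info_change):
    """attach_candidates(T, c) yields ontologies with concept c attached at each
    possible position; info_change(T, T2) scores the information change from T to T2."""
    T = T0
    for c in concepts:                                   # add concepts one at a time
        T = min(attach_candidates(T, c), key=lambda T2: info_change(T, T2))
    return T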

ESTIMATING ONTOLOGY METRIC

Assume the ontology metric is a linear interpolation of some underlying feature functions

Ridge Regression to estimate and predict the ontology metric
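A minimal sketch of that estimation step using scikit-learn's ridge regression (the feature values and distances below are toy placeholders, not the talk's data):

import numpy as np
from sklearn.linear_model import Ridge

# each row: pairwise feature function values, e.g. [co-occurrence, KL divergence]
X_train = np.array([[0.8, 0.1], [0.2, 0.9], [0.7, 0.2], [0.1, 0.8]])
y_train = np.array([1.0, 4.5, 1.0, 4.5])      # known distances from the partial ontology

model = Ridge(alpha=1.0).fit(X_train, y_train)
print(model.predict(np.array([[0.75, 0.15]])))   # predicted ontology metric for a new pair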

FEATURES

Google KL-Divergence

Wikipedia KL-Divergence

Google Minipar Syntactic distance

Lexico-Syntactic Patterns

Term Co-occurrence

Word Length Difference
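To give one concrete feature example: a KL divergence between the context word distributions of two concepts (how the Google and Wikipedia distributions are derived is not specified on the slide; the distributions below are toy placeholders):

import numpy as np
from scipy.stats import entropy

p = np.array([0.5, 0.3, 0.2])   # context-word distribution for concept 1
q = np.array([0.4, 0.4, 0.2])   # context-word distribution for concept 2
print(entropy(p, q))            # KL(p || q)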

EVALUATION

Reconstruct subdirectories from WordNet and ODP

50 WordNet subdirectories are from 12 topics: gathering, professional, people, building, place, milk, meal, water, beverage, alcohol, dish and herb

50 ODP subdirectories are from 16 topics: computers, robotics, intranet, mobile computing, database, operating system, linux, tex, software, computer science, data communication, algorithms, data formats, security, multimedia and artificial intelligence

DATASETS

ONTOLOGY RECONSTRUCTION

An absolute gain of 10% compared to the state-of-the-art system developed at Stanford University (ACL 2006 Best Paper)

INTERACTION OF ABSTRACTION LEVELS AND FEATURES

Abstract concepts are sensitive to the explicit modeling: good modeling of abstract concepts greatly boosts performance

Contributions from different features vary for abstract concepts; for concrete concepts they are largely indifferent

Simple features (term co-occurrence, word length) work the best

Combination of heterogeneous features works better than individual features

CONTRIBUTIONS OF METRIC-BASED ONTOLOGY LEARNING

Avoid and hence solve the problem of unknown group names

Tackle the problem of no control over concept abstractness

Experiments show that concepts at different abstraction levels behave differently and are sensitive to different features

Provide a solution to incorporate heterogeneous features

An absolute gain of 10% in precision on both WordNet and ODP over a state-of-the-art system

WE HAVE TALKED ABOUT

The Task of Information Triage and Personal Ontology Learning

Human-guided Ontology Learning

Metric-based Ontology Learning

AT THE BEGINNING, WE SAID:

IT WILL BE GREAT TO HAVE

A process to: crawl related documents, sort through relevant documents, identify important concepts/topics, organize materials

FIND A GOOD KINDERGARTEN IN THE PITTSBURGH AREA

Are we there yet?

THE KINDERGARTEN EXAMPLE

We are done with the organization! However, does it support further inference for decent decision making? Maybe not. Future work!

More future work: model multiple relationships simultaneously; more efficient distance metric learning

THANK YOU AND QUESTIONS