"What happened to …?" Entity-based Timeline Extraction

Tommaso Caselli, Antske Fokkens, Roser Morante and Piek Vossen

{t.caselli,r.morantevallejo,antske.fokkens.piek.vossen}@vu.nl


Outline

• Entity-based Timeline: task definition

• The Timeline module: model and evaluation

• Conclusions and Future Work

Entity-based Timeline

• Pilot task (1st edition) at SemEval 2015 Workshop (Task #4)

• No training data was provided, only a trial dataset of 30 articles

• New annotation specifications based on the NewsReader Guidelines, so existing datasets (TempEval/TimeML) cannot be directly reused for training

Entity-based Timeline

• Task structure and goal:

• corpora of news articles from Wikinews on homogeneous topics and entities

• corpora extending through time (documents from 2004 up to 2013)

• retrieve all events whose participants (i.e. subjects or objects) are specific entities (person, organization, (financial) product)

• chronologically order the events

Entity-based Timeline

• Not all events may enter a Timeline:

• only events realised by verbs, nouns and pronouns

• events narrating or reporting an event

• events describing the happening of something

• factual or certain events

• All coreferential mentions of events must be extracted
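The selection and ordering criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EventMention` record, its field names and the `entity_timeline` function are hypothetical and are not part of the task data format, and the event-class criteria (narrating/reporting, happening) are left out because the slides do not say how they are encoded.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record for one event mention; field names are illustrative only.
@dataclass(frozen=True)
class EventMention:
    doc_id: str
    text: str
    pos: str                 # "verb", "noun", "pronoun", ...
    participants: frozenset  # entities filling subject/object roles
    factual: bool            # True when the event is presented as certain
    anchor: date             # date the event is anchored to

ALLOWED_POS = {"verb", "noun", "pronoun"}

def entity_timeline(mentions, target):
    """Keep mentions eligible for the target entity and sort them by anchor date."""
    eligible = [
        m for m in mentions
        if target in m.participants and m.pos in ALLOWED_POS and m.factual
    ]
    # Ties on the date are broken by document and text to keep the order stable.
    return sorted(eligible, key=lambda m: (m.anchor, m.doc_id, m.text))
```

Coreferential mentions are not merged here; every eligible mention is kept, in line with the requirement that all coreferential mentions be extracted.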

[Figure: annotated example showing events, target entity, event ordering and event anchoring]

Timeline module

• Timelines are obtained in a two-step approach:

• in-document timeline

• cross-document timeline

• The in-document Timeline module follows an entity-based model

• The cross-document Timeline module follows an event-based model
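A minimal sketch of the two-step idea, assuming each in-document timeline is already available as a list of `(anchor, lemma, mention_id)` tuples. The `merge_timelines` function and the rule that mentions sharing a date and a lemma refer to the same event are illustrative assumptions standing in for cross-document event coreference; they are not the module's actual method.

```python
from collections import defaultdict

def merge_timelines(doc_timelines):
    """Merge per-document timelines into one cross-document timeline.

    doc_timelines: iterable of lists of (anchor, lemma, mention_id) tuples,
    where anchor is an ISO date string such as "2010-03-05".
    """
    groups = defaultdict(list)
    for timeline in doc_timelines:
        for anchor, lemma, mention_id in timeline:
            # Naive stand-in for event coreference: same date and same lemma.
            groups[(anchor, lemma)].append(mention_id)
    # ISO date strings sort chronologically, so sorting the keys orders the events.
    return [(anchor, lemma, sorted(ids))
            for (anchor, lemma), ids in sorted(groups.items())]
```

For example, two documents that both report a `launch` anchored to the same day end up as a single timeline entry carrying both mention identifiers.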

Timeline module

• Data for in-document Timeline processing are obtained through:

• NewsReader (NWR) pipeline (D4.2.1 - Agerri et al., 2013)

• TIPSem (Llorens et al., 2010) (state of the art for event detection, temporal relations and temporal expressions)

Timeline module

• Subtasks involved in Timeline

• NER and NED

• Coreference: nominal and eventive

• Event detection and classification

• Event participants (SRL)

• Event factivity

• Temporal Relation extraction and classification (TLINKs)

• Cross-document tasks: entity detection, event coreference and full timeline ordering

Timeline module

• Two versions of the module:

| Subtask | SPINOZA_VU_1 | SPINOZA_VU_2 |
|---|---|---|
| NER & NED | NewsReader | NewsReader |
| Coreference | NewsReader | NewsReader |
| Event detection and classification | TIPSem | NewsReader |
| Event participants (SRL) | NewsReader | NewsReader |
| Event factivity | new module | new module |
| TLINKs | TIPSem | NewsReader |
| Cross-Document tasks | new module | new module |

Timeline evaluation

• Task A: Timeline extraction from raw texts

• 3 corpora, 90 articles, 37 target entities (13 for corpus1, 11 for corpus2 and 13 for corpus3)

• Task A - Main: evaluation with respect to the correct anchoring and the correct ordering; F1-score as defined in (UzZaman et al., 2013) - temporal awareness

• Task A - Subtrack: evaluation with respect to the correct ordering; F1-score as defined in (UzZaman et al., 2013) - temporal awareness
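The temporal awareness score compares temporal graphs rather than individual links: precision checks the system's reduced graph against the closure of the reference, and recall checks the reference's reduced graph against the closure of the system. The sketch below is a simplification that assumes only BEFORE relations over an acyclic set of events; the function names are illustrative, and the task's official scorer handles more than this.

```python
def closure(pairs):
    """Transitive closure of a set of (earlier, later) pairs."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

def reduction(pairs):
    """Pairs of the closure that cannot be inferred from two other pairs."""
    closed = closure(pairs)
    nodes = {n for pair in closed for n in pair}
    return {
        (a, c) for a, c in closed
        if not any((a, b) in closed and (b, c) in closed for b in nodes)
    }

def temporal_awareness(system, reference):
    """Return (precision, recall, F1) for BEFORE-only graphs."""
    sys_red, ref_red = reduction(system), reduction(reference)
    sys_clo, ref_clo = closure(system), closure(reference)
    precision = len(sys_red & ref_clo) / len(sys_red) if sys_red else 0.0
    recall = len(ref_red & sys_clo) / len(ref_red) if ref_red else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With a reference of `a` before `b` and `b` before `c`, a system that outputs only `a` before `c` gets full precision (the link is entailed by the reference) but zero recall, since neither reference link can be inferred from the system output.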

Timeline evaluation

• Task A - Main


• Task A - Subtrack

Timeline evaluation

• Error analysis has shown:

• concatenation of errors from the pipeline

• issues in entity detection: only 31 entities out of 37 were extracted

• issues in event detection: few events detected per entity (SPINOZA_VU_1 average F1 23.58; SPINOZA_VU_2 average F1 20.46) - lower scores in the Subtrack support this analysis.

Conclusions and Future Work

• Timeline extraction is a difficult task: the best system for in-document TLINK identification and classification reaches F1 30.90 (TempEval-3)

• Need to improve the event detection module and cross-document event coreference

• Need to identify relevant/salient events per entity

• Shift from Timeline to Storyline

https://sites.google.com/site/computingnewsstorylines2015/home

Deadline for submissions: 14 May 2015!

Workshop: 31 July 2015 (Beijing)

“Working” workshop: shared data to play with!

Thank You!

Questions?