Posted on 05-Mar-2023
“What happened to …?” Entity-based Timeline Extraction
Tommaso Caselli, Antske Fokkens, Roser Morante and Piek Vossen
{t.caselli,r.morantevallejo,antske.fokkens,piek.vossen}@vu.nl
Outline
• Entity-based Timeline: task definition
• The Timeline module: model and evaluation
• Conclusions and Future Work
Entity-based Timeline
• Pilot task (1st edition) at the SemEval 2015 Workshop (Task #4)
• No training data was provided, only a trial dataset of 30 articles
• New annotation specifications based on the NewsReader Guidelines → existing datasets (TempEval/TimeML) cannot be directly reused for training
Entity-based Timeline
• Task structure and goal:
• corpora of news articles from Wikinews on homogeneous topics and entities
• corpora extending through time (documents from 2004 up to 2013)
• retrieve all events whose participants (i.e. subjects or objects) are specific entities (person, organization, (financial) product)
• chronologically order the events
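The goal can be pictured as a filter-then-sort over event mentions. The following is a minimal sketch, not the system described in these slides: the `EventMention` record, its fields and the rule of placing unanchored events last are hypothetical, and it assumes participants have already been linked to entity IDs (NED) and events anchored to calendar dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class EventMention:
    """Hypothetical record for one detected event mention."""
    doc_id: str
    mention: str                   # surface form of the event
    participants: List[str]        # linked entity IDs of subjects/objects (SRL + NED)
    anchor: Optional[date] = None  # normalized time anchor; None if unknown


def entity_timeline(events: List[EventMention], entity: str) -> List[EventMention]:
    """Keep the events in which `entity` participates and order them by anchor.

    Unanchored events go last (hypothetical policy); sorted() is stable,
    so events with equal keys keep their input order.
    """
    relevant = [e for e in events if entity in e.participants]
    return sorted(relevant, key=lambda e: (e.anchor is None, e.anchor or date.min))


if __name__ == "__main__":
    events = [
        EventMention("d1", "sold", ["ENT_A"], date(2010, 5, 1)),
        EventMention("d2", "launched", ["ENT_A"], date(2007, 1, 9)),
        EventMention("d3", "merged", ["ENT_B"], date(2008, 1, 1)),
        EventMention("d4", "announced", ["ENT_A"]),
    ]
    for e in entity_timeline(events, "ENT_A"):
        print(e.anchor, e.doc_id, e.mention)
```

The entity IDs `ENT_A` and `ENT_B` are placeholders; in practice they would be whatever identifiers the NED step produces.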
Entity-based Timeline
• Not all events may enter a Timeline:
• only events realised by verbs, nouns and pronouns
• events narrating or reporting an event
• events describing the happening of something
• factual or certain events
• All coreferential mentions of events must be extracted
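Two of these criteria (head part of speech and factuality/certainty) and the coreference requirement can be sketched as a filter followed by grouping mentions by chain. This is a hypothetical illustration: the `Mention` fields and POS labels are assumptions, real pipeline output encodes them differently, and the "narrating/reporting" and "happening" criteria are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical POS labels for the allowed event heads (verbs, nouns, pronouns).
ELIGIBLE_POS = {"VERB", "NOUN", "PRON"}


@dataclass
class Mention:
    mention_id: str
    pos: str          # part of speech of the event head
    factual: bool     # output of a factuality/certainty classifier
    coref_chain: str  # ID of the event-coreference chain the mention belongs to


def eligible(m: Mention) -> bool:
    """Apply the POS and factuality criteria to a single mention."""
    return m.pos in ELIGIBLE_POS and m.factual


def group_coreferent(mentions: List[Mention]) -> Dict[str, List[str]]:
    """Collect every eligible mention under its coreference chain,
    so that all coreferential mentions of an event are extracted together."""
    chains: Dict[str, List[str]] = {}
    for m in mentions:
        if eligible(m):
            chains.setdefault(m.coref_chain, []).append(m.mention_id)
    return chains
```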
Timeline module
• Timelines are obtained through a two-step approach:
• in-document timeline
• cross-document timeline
• The in-document Timeline module follows an entity-based model
• The cross-document Timeline module follows an event-based model
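One way to picture the event-based cross-document step is to merge the in-document timelines through event coreference. A minimal sketch, assuming a cross-document event-coreference step has already mapped each (document, local event) pair to a global event ID; `merge_timelines`, the earliest-anchor rule and giving events with equal anchors the same position are hypothetical simplifications, not the module's actual logic.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# One in-document timeline entry: (local event ID, time anchor, mention IDs)
Entry = Tuple[str, date, List[str]]


def merge_timelines(doc_timelines: Dict[str, List[Entry]],
                    global_id: Dict[Tuple[str, str], str]
                    ) -> List[Tuple[int, date, List[str]]]:
    """Merge per-document timelines into one ordered cross-document timeline."""
    anchors: Dict[str, date] = {}
    mentions: Dict[str, List[str]] = defaultdict(list)
    for doc_id, entries in doc_timelines.items():
        for local_id, anchor, ment in entries:
            # events without a cross-document link remain separate entries
            gid = global_id.get((doc_id, local_id), f"{doc_id}:{local_id}")
            # hypothetical policy: keep the earliest anchor seen for a coreferent event
            anchors[gid] = min(anchors.get(gid, anchor), anchor)
            mentions[gid].extend(ment)
    ordered = sorted(anchors, key=lambda g: (anchors[g], g))
    timeline, rank, prev = [], 0, None
    for gid in ordered:
        if anchors[gid] != prev:  # equal anchors share a position (hypothetical)
            rank += 1
            prev = anchors[gid]
        timeline.append((rank, anchors[gid], sorted(mentions[gid])))
    return timeline
```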
Timeline module
• Data for in-document Timeline processing are obtained through:
• NewsReader (NWR) pipeline (D4.2.1, Agerri et al., 2013)
• TIPSem (Llorens et al., 2010), a state-of-the-art system for event detection, temporal relations and temporal expressions
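TIPSem targets the TimeML annotation scheme (EVENT, MAKEINSTANCE and TLINK elements), so, assuming its output is serialized as TimeML XML, its event-event relations can be read with the standard library parser. `read_timeml` is a hypothetical helper; NewsReader's own output (NAF) would need a different reader, and links to TIMEX3 expressions are skipped here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def read_timeml(xml_text: str) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Return (event-instance ID -> event text, [(source, relType, target), ...]).

    Only event-event TLINKs are kept; links using timeID/relatedToTime are ignored.
    """
    root = ET.fromstring(xml_text)
    # EVENT elements carry the text and an eid; MAKEINSTANCE maps eid -> eiid
    text_by_eid = {ev.get("eid"): (ev.text or "") for ev in root.iter("EVENT")}
    text_by_eiid = {mi.get("eiid"): text_by_eid.get(mi.get("eventID"), "")
                    for mi in root.iter("MAKEINSTANCE")}
    links = []
    for tl in root.iter("TLINK"):
        src, tgt = tl.get("eventInstanceID"), tl.get("relatedToEventInstance")
        if src and tgt:
            links.append((src, tl.get("relType"), tgt))
    return text_by_eiid, links


if __name__ == "__main__":
    doc = ('<TimeML><TEXT>Boeing <EVENT eid="e1" class="OCCURRENCE">bought</EVENT> '
           'the firm after it <EVENT eid="e2" class="OCCURRENCE">failed</EVENT>.</TEXT>'
           '<MAKEINSTANCE eventID="e1" eiid="ei1" pos="VERB"/>'
           '<MAKEINSTANCE eventID="e2" eiid="ei2" pos="VERB"/>'
           '<TLINK lid="l1" relType="AFTER" eventInstanceID="ei1" '
           'relatedToEventInstance="ei2"/></TimeML>')
    print(read_timeml(doc))
```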
Timeline module
• Subtasks involved in Timeline extraction:
• NER and NED
• Coreference: nominal and eventive
• Event detection and classification
• Event participants (SRL)
• Event factivity
• Temporal Relation extraction and classification (TLINKs)
• Cross-document tasks: entity detection, event coreference and full timeline ordering
Timeline module
• Two versions of the module:

Subtask | SPINOZA_VU_1 | SPINOZA_VU_2
NER & NED | NewsReader | NewsReader
Coreference | NewsReader | NewsReader
Event detection and classification | TIPSem | NewsReader
Event participants (SRL) | NewsReader | NewsReader
Event factivity | new module | new module
TLINKs | TIPSem | NewsReader
Cross-document tasks | new module | new module
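The two runs can also be written down as a per-run configuration, which makes the comparison mechanical. A minimal sketch: the dictionary layout and `differing_subtasks` are hypothetical; only the component names come from the table.

```python
from typing import Dict, List

# Component used for each subtask in the two runs (transcribed from the table).
RUNS: Dict[str, Dict[str, str]] = {
    "SPINOZA_VU_1": {
        "NER & NED": "NewsReader",
        "Coreference": "NewsReader",
        "Event detection and classification": "TIPSem",
        "Event participants (SRL)": "NewsReader",
        "Event factivity": "new module",
        "TLINKs": "TIPSem",
        "Cross-document tasks": "new module",
    },
    "SPINOZA_VU_2": {
        "NER & NED": "NewsReader",
        "Coreference": "NewsReader",
        "Event detection and classification": "NewsReader",
        "Event participants (SRL)": "NewsReader",
        "Event factivity": "new module",
        "TLINKs": "NewsReader",
        "Cross-document tasks": "new module",
    },
}


def differing_subtasks(run_a: str, run_b: str) -> List[str]:
    """Return the subtasks for which two runs rely on different components."""
    return [task for task, comp in RUNS[run_a].items() if RUNS[run_b][task] != comp]


if __name__ == "__main__":
    print(differing_subtasks("SPINOZA_VU_1", "SPINOZA_VU_2"))
```

Keeping the setup as data makes explicit that the runs differ only in event detection/classification and TLINK extraction.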
Timeline evaluation
• Task A: Timeline extraction from raw texts
• 3 corpora, 90 articles, 37 target entities (13 for corpus1, 11 for corpus2 and 13 for corpus3)
• Task A - Main: evaluation with respect to correct anchoring and correct ordering; F1-score as defined in UzZaman et al. (2013) (temporal awareness)
• Task A - Subtrack: evaluation with respect to correct ordering only; F1-score as defined in UzZaman et al. (2013) (temporal awareness)
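Temporal awareness compares a system graph S with a reference graph K using transitive closures (+) and reduced graphs (-): P = |S- ∩ K+| / |S-|, R = |K- ∩ S+| / |K-|, F1 = 2PR / (P + R). The sketch below is a simplification under stated assumptions: it handles only BEFORE edges in an acyclic graph, whereas the official metric covers the richer TimeML relation set; the function names are hypothetical.

```python
from itertools import product
from typing import Set, Tuple

Edge = Tuple[str, str]  # (a, b) means "a BEFORE b"


def closure(edges: Set[Edge]) -> Set[Edge]:
    """Transitive closure of a BEFORE graph (Warshall's algorithm)."""
    result = set(edges)
    nodes = {n for e in edges for n in e}
    for k in nodes:  # allow k as an intermediate node
        for i, j in product(nodes, nodes):
            if (i, k) in result and (k, j) in result:
                result.add((i, j))
    return result


def reduction(edges: Set[Edge]) -> Set[Edge]:
    """Transitive reduction of an acyclic graph: drop edges implied by a path."""
    c = closure(edges)
    nodes = {n for e in c for n in e}
    return {(i, j) for (i, j) in c
            if not any((i, k) in c and (k, j) in c for k in nodes)}


def temporal_awareness(system: Set[Edge], key: Set[Edge]) -> Tuple[float, float, float]:
    """Precision, recall and F1 in the temporal-awareness style, BEFORE-only."""
    s_red, s_clo = reduction(system), closure(system)
    k_red, k_clo = reduction(key), closure(key)
    p = len(s_red & k_clo) / len(s_red) if s_red else 0.0
    r = len(k_red & s_clo) / len(k_red) if k_red else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, with key {a<b, b<c} and system {a<c, b<c}, every system relation is entailed by the key (P = 1.0), but a<b cannot be recovered from the system (R = 0.5), giving F1 ≈ 0.67.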
Timeline evaluation
• Error analysis has shown:
• errors cascading through the pipeline
• issues in entity detection → only 31 out of 37 entities were extracted
• issues in event detection: few events detected per entity (SPINOZA_VU_1 average F1 23.58; SPINOZA_VU_2 average F1 20.46); the lower scores in the Subtrack support this analysis
Conclusions and Future Work
• Timeline extraction is a difficult task → the best system for in-document TLINK identification and classification at TempEval-3 scored F1 30.90
• Need to improve the event detection module and cross-document event coreference
• Need to identify relevant/salient events per entity
• Shift from Timeline to Storyline
https://sites.google.com/site/computingnewsstorylines2015/home
Deadline for submissions: 14 May 2015!
Workshop: 31 July 2015 (Beijing)
“Working” workshop: shared data to play with!