The Hindi/Urdu Treebank - University of Colorado Boulder

The Hindi/Urdu Treebank: New Frontiers in Hindi and Urdu Natural Language Processing

Dipti Misra Sharma, LTRC, IIIT, Hyderabad, India, dipti@iiit.ac.in
Owen Rambow, CCLS, Columbia, New York City, USA, [email protected]
Ashwini Vaidya, Linguistics, University of Colorado, Boulder, USA, [email protected]

Dec 8, 2012
COLING 2012

Overview

• Introduction to the nature of syntactic representations. (Rambow, 15 minutes)
• Introduction to the morphology, syntax, and lexical semantics of Hindi and Urdu. (Sharma, 40 minutes)
• The morphological representation for Hindi and Urdu, including encoding issues, tokenization, part-of-speech tags, and morphological representation. (Sharma and Rambow, 20 minutes)
• The dependency representation (DS) for Hindi and Urdu syntax: principles, representation, and examples. (Sharma, 25 minutes)
• The lexical semantic representation (PB) for Hindi and Urdu: principles, representation, and examples. (Vaidya, 25 minutes)
• The phrase structure representation (PS) for Hindi and Urdu syntax: principles, representation, and examples. (Rambow, 25 minutes)
• Sample initial experiments in Hindi and Urdu NLP using the HUTB. (Sharma and Rambow, 15 minutes)


The Hindi Treebank

• 3 representations
  - DS: Dependency Structure
  - PB: PropBank (lexical predicate-argument structure)
  - PS: Phrase Structure
• Why have three levels of representation? What does "level of representation" mean, in fact?

What is a Syntactic Representation?

1. Syntactic phenomena ("what"), e.g.
  - Subject of a verb
  - Relative clause
  - Small clause
  Linguists tend to agree on what phenomena exist
2. Mathematical representation type ("basic how"), e.g.
  - Phrase structure tree
  - Dependency tree
  - Or something more complicated: graph, LFG, TAG, …
3. Formal syntactic description ("detailed how")
  a. Mapping from phenomena to representations (in a particular type)
  b. The chosen representation for a specific phenomenon is also called the analysis
  c. The phenomena extracted from a representation are the interpretation
  d. A formal description is a syntactic theory if it makes predictions

Representation Types: Dependency and Phrase Structure

• Dependency Tree (DS)
  - One label alphabet: words (= words in a sentence)
  - All nodes labeled with words or empty strings
• Phrase Structure Tree (PS)
  - Two disjoint label alphabets: terminals (= words in sentence) and nonterminals
  - All and only interior nodes are labeled with nonterminals
  - Leaves are labeled with terminals or empty strings
• Nothing else is part of the definition
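The two definitions above can be sketched as data structures. This is a minimal illustration only: the class names, the NONTERMINALS set, and the well-formedness check are assumptions made for the sketch, not part of the treebank tooling.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed nonterminal alphabet for this sketch; terminals are everything else.
NONTERMINALS = {"S", "SC", "NP", "VP", "AdjP"}


@dataclass
class DepNode:
    """Dependency tree node: labeled with a word ("" for an empty node)."""
    word: str
    children: List["DepNode"] = field(default_factory=list)


@dataclass
class PSNode:
    """Phrase structure node: nonterminal if interior, terminal or "" if leaf."""
    label: str
    children: List["PSNode"] = field(default_factory=list)


def is_well_formed_ps(node: PSNode) -> bool:
    """All and only interior nodes must carry nonterminal labels."""
    if node.children:
        return node.label in NONTERMINALS and all(
            is_well_formed_ps(child) for child in node.children
        )
    return node.label not in NONTERMINALS


def ps_leaves(node: PSNode) -> List[str]:
    """Left-to-right terminal labels of a phrase structure tree."""
    if not node.children:
        return [node.label]
    return [word for child in node.children for word in ps_leaves(child)]


def dep_words(node: DepNode) -> List[str]:
    """All node labels of a dependency tree, head before dependents."""
    return [node.word] + [w for child in node.children for w in dep_words(child)]


# "Atif considers her stupid" in both representation types.
ds = DepNode("considers", [DepNode("Atif"), DepNode("stupid", [DepNode("her")])])
ps = PSNode("S", [
    PSNode("NP", [PSNode("Atif")]),
    PSNode("VP", [
        PSNode("considers"),
        PSNode("S", [
            PSNode("her"),
            PSNode("VP", [PSNode("AdjP", [PSNode("stupid")])]),
        ]),
    ]),
])
```

Every node of the dependency tree is a word, while the phrase structure tree needs the extra nonterminal alphabet for its interior nodes; both trees cover the same four words.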

Example: Small Clauses

• Hindi
  - आतिफ़ ने सीमा को बेवकूफ़ समझा
  - Atif ne Seema ko bewakuuf samjhaa
  - Atif Erg Seema Acc stupid consider.Pfv
  - 'Atif considered Seema stupid'
• English
  - Atif considered Seema stupid
  - Atif considered her stupid

What is the Phenomenon?

• Syntactically and semantically, consider takes a clausal complement
  - Atif considered [clause that she is stupid]
  - Atif considered [clause her stupid]
• But two problems:
  - No verb
  - her is semantically the subject of stupid, but has accusative case, which is unusual (subjects are usually nominative)
• So:
  - Atif considered [small clause her stupid]

What is the Representation Type?

• For this example, we will show dependency trees and phrase structure trees

Analysis 1a for Small Clauses: No Accusative Case Marking

• Structure represents her as subject, but not accusative case marking of her

considers
  Subj: Atif
  Obj: stupid
    Subj: her

Analysis 1b for Small Clauses: Exceptional Case Marking

• Structure represents her as subject and accusative case marking through node label

considers
  Subj: Atif
  Obj-ECM: stupid
    Subj: her

Analysis 1a for Small Clauses: No Accusative Case Marking (phrase structure)

[S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]

Analysis 1b for Small Clauses: Exceptional Case Marking (phrase structure)

[S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]

Close to the analysis adopted in Chomsky (1981)

Note on DS and PS

• These analyses are intuitively very similar
• Formal notion: "consistency" (Fei Xia; see Bhatt, Rambow & Fei 2011)
  - Intuition: a very simple and general algorithm can transform consistent DS to PS and vice versa
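To make the intuition concrete, here is a naive head-projection sketch that turns a dependency tree into a bracketed phrase structure. It is not the consistency algorithm referred to above; the Dep class, the phrase field, and the assumption of a projective tree with known word positions are all choices made only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dep:
    word: str
    index: int      # surface position of the word in the sentence
    phrase: str     # hypothetical label of the phrase this word projects
    children: List["Dep"] = field(default_factory=list)


def project(node: Dep) -> str:
    """Bracket the head word together with the projections of its dependents."""
    parts = [(node.index, node.word)]
    parts += [(child.index, project(child)) for child in node.children]
    # Sorting by head position restores surface order for projective trees.
    ordered = " ".join(text for _, text in sorted(parts))
    return f"[{node.phrase} {ordered}]"


# Analysis 1a: "her" depends on "stupid", which depends on "considers".
tree = Dep("considers", 1, "S", [
    Dep("Atif", 0, "NP"),
    Dep("stupid", 3, "AdjP", [Dep("her", 2, "NP")]),
])
bracketing = project(tree)
```

Each word projects exactly one phrase here, so the output is flatter than the hand-drawn trees (there is no separate VP level); a real conversion would add intermediate projections.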

Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject

considers
  Subj: Atif
  Obj: her
  Obj2: stupid

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using node label

considers
  Subj: Atif
  Obj: her
  Obj-Pred: stupid

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis (Neo-Paninian labels)

considers
  k1: Atif
  k2: her
  k2s: stupid

समझा
  k1: आतिफ़ ने
  k2: सीमा को
  k2s: बेवकूफ़

Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank.

Analysis 2a for Small Clauses: General Monoclausal Analysis (phrase structure)

[S [NP Atif] [VP considers her [AdjP stupid]]]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis (phrase structure)

[Figure: phrase structure tree for "Atif considers her stupid" with the nodes S, NP (Atif), VP, AdjP (stupid) and SC]

Analysis 3 for Small Clauses: Raising to Object

• Structure represents accusative case marking of her, and her as semantic subject, but requires an empty category

considers
  Subj: Atif
  Obj: her_1
  Obj-Pred: stupid
    Subj: e_1

Analysis 3 for Small Clauses: Raising to Object (phrase structure)

[Figure: phrase structure tree for "Atif considers her stupid" with the nodes S, NP (Atif), VP, an embedded S, VP, AdjP (stupid), her_1 and the co-indexed empty category e_1]

Analysis used for PS in the Hindi-Urdu Treebank

Comparison of Representations

• Less information
  - Tree 1a: considers (Subj: Atif; Obj: stupid (Subj: her))
  - Tree 2a: considers (Subj: Atif; Obj: her; Obj2: stupid)
• Same information
  - Tree 1b: considers (Subj: Atif; Obj-ECM: stupid (Subj: her))
  - Tree 2b: considers (Subj: Atif; Obj: her; Obj-Pred: stupid)
  - Tree 3: considers (Subj: Atif; Obj: her_1; Obj-Pred: stupid (Subj: e_1))

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language
  - Can be very similar across languages
• There can be several possible analyses
  - Some have less information
  - But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design

Aren't DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants

What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  - Does not change trees
  - Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  - Contains less information than DS+PB
  - DS and PS contain different information

Comparison of DS, PB, PS (Sample)

        DS                      PB                      PS
How     Dependency                                      Phrase Structure
What    Distinguish             Distinguish             Distinguish
        unergative/             temporal/locative       unaccusative/transitive
        unaccusative            adjuncts                with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India

Outline

• Introduction
  - Some facts about Hindi and Urdu
• Linguistic properties
  - Morphology
  - Some basic syntax
  - Lexical semantics

Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic

Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  - Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  - Singular: pankhaa (fan), lataa (creeper), ghar (house)
  - Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique

Case

• Direct: nouns are in the nominative and are not followed by a postposition
  - Occur denoting subject and/or object

  laRkaa aayaa 'the boy came'
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii 'the girl came'
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii 'the boy ate bread'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
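The direct/oblique contrast can be sketched in a few lines. The rule below (masculine nouns ending in -aa take -e in the oblique singular, other nouns stay unchanged) is a simplifying assumption that covers the examples on this slide; it ignores plurals and exceptional nouns, and the function names are hypothetical.

```python
# Postpositions listed on the slide.
POSTPOSITIONS = {"ne", "ko", "se", "meM", "par", "kaa"}


def oblique_singular(noun: str, gender: str) -> str:
    """Assumed rule: masculine nouns in -aa take -e; other nouns are unchanged."""
    if gender == "m" and noun.endswith("aa"):
        return noun[:-2] + "e"
    return noun


def with_postposition(noun: str, gender: str, postposition: str = "") -> str:
    """Return the direct form alone, or the oblique form plus its postposition."""
    if not postposition:
        return noun  # direct case: no postposition follows
    if postposition not in POSTPOSITIONS:
        raise ValueError(f"unknown postposition: {postposition}")
    return f"{oblique_singular(noun, gender)} {postposition}"
```

For instance, with_postposition("laRkaa", "m", "ne") gives the oblique "laRke ne", while roTii keeps its form before ko.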

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd.)

• Before 'ne', maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don't change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
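The rules for first and second person pronouns can be written as a small lookup. This is only a sketch of the four bullets above: the function name is hypothetical, and third person pronouns (which follow the oblique tables) are deliberately rejected rather than guessed at.

```python
SUPPORTED = {"maiM", "tuu", "ham", "tum", "aap"}

# Oblique stems used before postpositions other than ne.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular forms that replace pronoun + kaa (gen).
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}


def attach(pronoun: str, postposition: str) -> str:
    """Combine a first or second person pronoun with a postposition."""
    if pronoun not in SUPPORTED:
        raise ValueError(f"pronoun not covered by this sketch: {pronoun}")
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition == "ne":
        return pronoun + "ne"  # no change of form before ne
    return OBLIQUE.get(pronoun, pronoun) + postposition
```

So attach("maiM", "ne") yields maiMne, attach("maiM", "ko") yields mujhko, and attach("tum", "kaa") yields the irregular tumhaaraa.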

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below

Case →      Direct            Oblique
Number →    Sg      Pl        Sg      Pl
Gender ↓
Masc        acchaa  acche     acche   acche
Fem         acchii  acchii    acchii  acchii
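The table reduces to three cases, which a short function can reproduce. The treatment of adjectives that do not end in -aa as invariable is an assumption of this sketch (the slide only says that some adjectives do not decline), and the function name is hypothetical.

```python
def inflect_adjective(adjective: str, gender: str, number: str, case: str) -> str:
    """Inflect an -aa adjective following the acchaa paradigm.

    gender: "m" or "f"; number: "sg" or "pl"; case: "direct" or "oblique".
    """
    if not adjective.endswith("aa"):
        return adjective  # assumed invariable
    stem = adjective[:-2]
    if gender == "f":
        return stem + "ii"  # one feminine form for all numbers and cases
    if number == "sg" and case == "direct":
        return adjective    # masculine singular direct keeps -aa
    return stem + "e"       # all other masculine cells


paradigm = {
    (g, n, c): inflect_adjective("acchaa", g, n, c)
    for g in ("m", "f")
    for n in ("sg", "pl")
    for c in ("direct", "oblique")
}
```

The resulting paradigm has eight cells but only three distinct forms, matching the table.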

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root         ro       'cry'
Infinitive   ronaa    'to cry'
Habitual     rotaa    'cry.hab'
Perfective   royaa    'cried'
Causative    rulaa    'cause someone to cry'
             rulvaa   'make someone cause someone to cry'

Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs

  (a) bacce khaanaa khaa rahe haiM
      children.d meal eat prog be.pl.pres
      'The children are having a meal'

  (b) bacce khaanaa khaate rah sakte haiM
      children.d meal eat.nf prog ablit be.pl.pres
      'The children can continue to have their meal'

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

  baccoM ne raat meM mez se khaanaa le liyaa
  children.obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions

Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words as follows:

ke anusaar     'according to'
ke alaavaa     'in addition to'
ke kaaran      'because of'
ke dvaaraa     'through'
ke saamne      'in front of'
ke liye        'for'
kii or/taraf   'towards'
kii tarah      'like'
kii jagah      'in place of'
se baahar      'out of'
se pahle       'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

  (a) qabl 'before'
      qabl az_ayn (qabl azeen)    before from_this    'before this'
      qabl az_waqt                before from_time    'before time'
  (b) dar 'in/inside/amidst'
      dar_ayn asnah (dariin asnah)    in_this moment
  (c) az 'from/since/for'
      az raahe hamdardi     from way empathy        'for the sake of courtesy'
      az sare nau           from beginning new
      khaarij az bahes      beyond from discussion
  (d) ta 'to/until/till'
      ta waqt               until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone

  EZ N+N       daur-e-hukumat    'period-of-rule'
               hukumat-e-hind    'government-of-India'
  EZ N+Adj     nasl-e-insani     'race-of-humanity'
               lamha-e-aakhar    'the last moment'
  EZ Adj+N     qabil-e-rahem     'qualified for sympathy'
  EZ Adj+Adj   qabil-e-qubul     'qualified-for-acceptance'

Reduplication: A Morphological Process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  - Nouns: it adds the sense of 'every'
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X then its reduplicated form is X-X

raam-raam        'Ram-Ram' (proper noun)
baccaa-baccaa    'child-child' (common noun)
garam-garam      'hot-hot' (adjective)
dhiire-dhiire    'slowly-slowly' (adverb)
jaa-jaa          'go-go' (verb)
naa-naa          'not-not' (negative particle)
kyaa-kyaa        'what-what' (question word)
jaate-jaate      'going-going' (participle)

Partial Reduplication (Echo words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
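Both regular patterns are easy to state as string operations on the romanized forms. The sketch below assumes that everything before the first vowel letter counts as "the first consonant" (which is right for aspirate digraphs such as kh, but would over-strip true consonant clusters); the function names are hypothetical, and the irregular forms listed above are not covered.

```python
VOWELS = set("aeiouAEIOU")


def full_reduplication(word: str) -> str:
    """X becomes X-X."""
    return f"{word}-{word}"


def echo_word(word: str) -> str:
    """X becomes X-vX', where X' is X without its initial consonant."""
    start = 0
    while start < len(word) and word[start] not in VOWELS:
        start += 1  # skip the initial consonant letters, if any
    return f"{word}-v{word[start:]}"
```

Vowel-initial words simply gain the v-, which is how aaloo and aisaa behave in the examples.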

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1
  मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  'My sister who stays in Delhi is coming tomorrow'

Relative Clause (Contd.)

rel-cl-2
  मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  'I have read the book which you gave me'

rel-cl-3
  मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  'I have read the book which you gave me'

rel-cl-4
  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  - The experiencer argument takes dative case

    raam ko bukhaar hai       Ram dat fever be.pres
    raam ko caand dikhaa      Ram dat moon see-unacc.pst
    raam ko dukh hai          Ram dat sorrow be.pres

• Participatory verbs
  - The second argument of the participatory verbs takes the se postposition

    raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization: Automatic Issues

• Compounds
• Punctuations

For example:

  usa   ladake  ne   kelaa   khaayaa    thaa
  that  boy     erg  banana  eat-perf   past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision:
    - Create three tokens
    - Mark the hyphen as JOIN
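A minimal sketch of the tokenization decision: every punctuation mark becomes its own token, and the hyphen inside a compound is marked JOIN. The row layout (address, token, mark) and the way JOIN is recorded are assumptions made for this illustration; the slides do not show how the mark is stored.

```python
import re
from typing import List, Tuple

Row = Tuple[int, str, str]  # (ADDR, TOKEN, mark); mark is "JOIN" or ""


def tokenize(sentence: str) -> List[Row]:
    """Split on whitespace and split off every punctuation character."""
    pieces = re.findall(r"\w+|[^\w\s]", sentence)
    rows = []
    for addr, token in enumerate(pieces, start=1):
        mark = "JOIN" if token == "-" else ""
        rows.append((addr, token, mark))
    return rows


compound_rows = tokenize("BAI-bahana")
sentence_rows = tokenize("usa laDake ne kelA khAyA thA .")
```

The compound comes out as three tokens, so each member can receive its own morphological analysis while the JOIN mark keeps the link between them.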

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs af='&STOP,punc'>

POS Tagging

• ILMT POS tagsets adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs af='&STOP,punc'>
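Since af is a composite attribute with a fixed field order, it can be unpacked positionally. The sketch below assumes a comma-separated value with exactly the eight fields named on the slide; the field names in AF_FIELDS and the sample value (in particular its last two fields) are illustrative assumptions, not values taken from the treebank.

```python
from typing import Dict

AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)


def parse_af(value: str) -> Dict[str, str]:
    """Split an af value into named fields; empty fields stay empty strings."""
    parts = value.split(",")
    if len(parts) != len(AF_FIELDS):
        raise ValueError(f"expected {len(AF_FIELDS)} fields, got {len(parts)}")
    return dict(zip(AF_FIELDS, parts))


# Hypothetical analysis of laDake; the trailing "0,0" is made up for the example.
analysis = parse_af("laDakA,n,m,sg,3,o,0,0")
```

Keeping the positions fixed means that a missing value has to be written as an empty field rather than dropped, otherwise later fields shift.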

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs af='&STOP,punc'>
       ))
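The chunked layout can be read back with a small line-oriented parser. This is a sketch for a simplified version of the format shown above: feature structures are left out of the sample, the punctuation chunk is omitted, and the dictionary keys are hypothetical names chosen for the example.

```python
from typing import Dict, List

SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""


def parse_chunks(text: str) -> List[Dict]:
    """Group token lines under the chunk opened by '((' and closed by '))'."""
    chunks, current = [], None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "))":
            chunks.append(current)
            current = None
        elif len(parts) >= 3 and parts[1] == "((":
            current = {"addr": parts[0], "tag": parts[2], "tokens": []}
        elif current is not None and len(parts) >= 3:
            current["tokens"].append(
                {"addr": parts[0], "token": parts[1], "pos": parts[2]}
            )
    return chunks


chunks = parse_chunks(SAMPLE)
```

The two-level addresses (1.1, 1.2, …) are what the slide means by chunking changing the value of ADDR_: tokens are renumbered relative to their chunk.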


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT, Hyderabad <dipti@iiit.ac.in>

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k

• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this

k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

lock becomes the karta

Levels of Analysis

L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai

=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

(t1) Relations across chunk heads:

khaa
  bhaaii
  phala

(t2) Fully expanded tree:

khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
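The step from tagged tokens to chunks, chunk heads, and the expanded tree can be sketched as follows. The chunking rule (a noun closes an NP; a main verb plus any following auxiliaries form a VG) and the choice of the noun or main verb as head are assumptions that fit this one sentence; they are not the treebank's chunking guidelines.

```python
from typing import Dict, List, Tuple

TAGGED = [
    ("meraa", "PRP"), ("baDzaa", "JJ"), ("bhaaii", "NN"),
    ("bahuta", "QF"), ("phala", "NN"),
    ("khaataa", "VM"), ("hai", "VAUX"),
]


def chunk(tagged: List[Tuple[str, str]]) -> List[Dict]:
    """Group tagged words into NP and VG chunks and record each chunk head."""
    chunks: List[Dict] = []
    current: List[str] = []
    for word, pos in tagged:
        if pos == "VAUX" and chunks and chunks[-1]["tag"] == "VG":
            chunks[-1]["words"].append(word)  # auxiliary joins the verb group
            continue
        current.append(word)
        if pos == "NN":
            chunks.append({"tag": "NP", "head": word, "words": current})
            current = []
        elif pos == "VM":
            chunks.append({"tag": "VG", "head": word, "words": current})
            current = []
    return chunks


def expand(chunks: List[Dict]) -> Dict[str, List[str]]:
    """Attach every non-head word of a chunk to the head of that chunk."""
    return {c["head"]: [w for w in c["words"] if w != c["head"]] for c in chunks}


chunks = chunk(TAGGED)
heads = [c["head"] for c in chunks]
expanded = expand(chunks)
```

Annotators only need to relate the three heads to each other (t1); the chunk-internal attachments in t2 can then be filled in mechanically, which is the effort saving the model aims for.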

KarakaRela3ons Directpar3cipantsinanac3onevent Syntac3coLseman3c Thekartaandkarmaofaverbaredeterminedbytheverbsseman3cs Verbdenotesanac3onevent Anyac3onisabundleofsubLac3onsSabinaopenedthelockwiththekeyThekeyopenedthelockThelockopened

Semantics of the verb

• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

Verbal root
  activity
  result


karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: 'The door opened'
  - The door is karta
  - The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions - Opening of lock

Opening of lock
  - Inserting and turning a key (action 1)
  - Key pressing and turning the lever (action 2)
  - Latch moving and lock opening (action 3)


Sub-actions - Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 - karta (doer)
k2 - karma (affected)
k3 - karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: 'Sabina opened the lock'
• However, the speaker may decide not to express the role of the agent. Hence: 'The key opened the lock'. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• 'The lock opened': the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
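The shift of karta under vivaksha can be sketched as a small labelling function. This is an illustrative simplification, not part of the annotation scheme: the participant kinds (`agent`, `affected`, `instrument`), the default labels, and the function name are assumptions, and the caller states which participant the speaker presents as the doer.

```python
# Default karaka label for each (assumed) kind of participant.
DEFAULT_LABEL = {"agent": "k1", "affected": "k2", "instrument": "k3"}

def assign_karakas(participants, karta):
    """Label the expressed participants of 'open'.

    participants: dict such as {"agent": "Sabina", "affected": "lock"}
    karta: the kind of participant the speaker presents as the doer
    """
    if karta not in participants:
        raise ValueError("karta must be one of the expressed participants")
    if "agent" in participants and karta != "agent":
        raise ValueError("an expressed agent is always the karta in this sketch")
    labels = {}
    for kind, word in participants.items():
        # The participant in focus becomes k1; the others keep their default label.
        labels[word] = "k1" if kind == karta else DEFAULT_LABEL[kind]
    return labels

full = assign_karakas(
    {"agent": "Sabina", "affected": "lock", "instrument": "key"}, "agent")
key_as_doer = assign_karakas({"instrument": "key", "affected": "lock"}, "instrument")
lock_as_doer = assign_karakas({"affected": "lock", "instrument": "key"}, "affected")
print(full, key_as_doer, lock_as_doer)
```

The three calls correspond to 'Sabina opened the lock with this key', 'The key opened the lock', and 'The lock opened with this key'.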


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' is assigned karta in (b) and karana in (a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)


Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To develop an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• The annotation scheme follows the Paninian framework
• We show how the scheme handles phenomena in Hindi such as complex verbs, causatives, relative clauses, conjunctions, etc.


An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd.)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit', and
  - khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks


Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing coordination and complex predicates
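The three-way split can be written down as a lookup. This is only a sketch: the label sets contain just the labels that appear in this tutorial, not the full inventory, and the function name is hypothetical.

```python
# Labels taken from the slides of this tutorial; the real inventory is larger.
KARAKA = {"k1", "k2", "k3", "k4", "k7t", "k7p"}
OTHER = {"r6", "rt", "rh", "nmod__relc", "rad"}
NON_DEPENDENCY = {"ccof", "pof", "fragof"}

def relation_type(label):
    """Classify a dependency label into one of the three relation types."""
    if label in KARAKA:
        return "karaka"
    if label in OTHER:
        return "other"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    return "unknown"

for label in ("k1", "r6", "ccof"):
    print(label, relation_type(label))
```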


Basic karaka relations

• Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address


Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'

Causative Constructions (Contd.)

• Issue
  - Possibility-I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4


Causative Constructions (Contd.)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations:
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator-causer' (mk1): the mediator-causer in a causative construction

Causative Constructions (Contd.)

• Possibility-II
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'


Causative Constructions (Contd.)

• Ex.: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex.: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II

(2) Relative Clauses (nmod__relc)

• Ex.: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · jo ladZakaa 'the boy' depends on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref


(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd.)

• Ex-2: raam ne (agent) caaMd dekhaa   [base verb]
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3: raam ko (experiencer) caaMd dikhaa   [derived intransitive verb]
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'


anubhava karta - k4a (Contd.)

[Figures: dependency trees for Ex-2 and Ex-3]


(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd.): Ex-1

Relation samanadhikaran - rs (Contd.): Ex-2

(5) Conditionals

• Ex.: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause


Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II


Conditionals (Contd.)

• Possibility-I is ruled out because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically


Participles (vmod) (Contd.)

• Ex.: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written a letter every day, he tears it up'


Participles (vmod) (Contd.)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha
    k2: patra


(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex.: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example, vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi


Ellipsis (Contd.)

• Ex.: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of


(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex.: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex.: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'


Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex.: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is pof ('part of' relation)
• That is, a noun or an adjective in the conjunct verb sequence has a pof relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the pof relation captures the fact that the noun-verb sequence is a conjunct verb


Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
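A labelled dependency tree of this kind is easy to hold in a small data structure. The sketch below is an illustration under assumed names (`Node`, `predicate_unit`), not the treebank's storage format; it shows how the pof link lets a consumer recover the conjunct verb as one semantic unit even though noun and verb are chunked separately.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)  # list of (label, Node) pairs

    def add(self, label, child):
        """Attach child under this node with the given relation label."""
        self.children.append((label, child))
        return child

def predicate_unit(verb):
    """Join pof children with the verb to recover the conjunct verb."""
    parts = [child.word for label, child in verb.children if label == "pof"]
    return " ".join(parts + [verb.word])

# The tree for 'maine usase ek prashna kiyaa', labels as on the slide.
kiyaa = Node("kiyaa")
kiyaa.add("k1", Node("maine"))
kiyaa.add("k2", Node("usase"))
prashna = kiyaa.add("pof", Node("prashna"))
prashna.add("nmod", Node("ek"))

print(predicate_unit(kiyaa))  # prashna kiyaa
```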


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder


Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh


Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh


Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
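The filtering step can be sketched as follows. The role-labelled candidates are written out by hand here; a real system would obtain them from a semantic role labeller. The dictionary layout and the function name are assumptions for illustration.

```python
QUESTION_THEME = "the first effective polio vaccine"

# Hand-labelled roles of 'create' in the two candidate sentences.
candidates = [
    {"agent": "Becton Dickinson", "theme": "the first disposable syringe"},
    {"agent": "Jonas Salk", "theme": "the first effective polio vaccine"},
]

def answer_who_created(question_theme, labelled_sentences):
    """Return the agents of 'create' whose theme matches the question's theme."""
    wanted = question_theme.lower()
    return [s["agent"] for s in labelled_sentences if s["theme"].lower() == wanted]

answers = answer_who_created(QUESTION_THEME, candidates)
print(answers)  # ['Jonas Salk']
```

Both sentences contain every word of the question, so word matching alone cannot separate them; only the theme comparison removes the first candidate.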


We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation


Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information


PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• These are defined in a verb lexicon consisting of frame files
• Each predicate has a set of roles (a roleset) associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for


An example

• [John: ARG0] rings [the bell: ARG1]   (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2]   (ring.02)

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
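A frame file with several rolesets can be pictured as a nested mapping. This is a sketch only: actual PropBank frame files are XML documents, and the dictionary layout, the roleset IDs written as `ring.01` and `ring.02`, and the function name are assumptions for illustration.

```python
# Two rolesets of the polysemous predicate 'ring', as listed on the slide.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}

def describe(roleset_id, labelled_args):
    """Map each labelled argument to its roleset-specific role description."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: roles[arg] for arg, text in labelled_args.items()}

bell = describe("ring.01", {"Arg0": "John", "Arg1": "the bell"})
lake = describe("ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"})
print(bell)
print(lake)
```

The same numbered label (Arg1) means 'Thing rung' under one roleset and 'Surrounding entity' under the other, which is why the roleset ID must be annotated alongside the arguments.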

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling


Frame files for Hindi

• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in coordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually


PropBank Tagset

Numbered Arguments           Numbered Arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent, experiencer     ARG0-MNS  Induced causer
ARG1  Theme, patient         ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means


Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitab paRhegaa
          Atif book.f read.m.sg.fut
          'Atif will read the book'

trans-2   आतिफ़ ने किताब पढ़ी
          Atif ne kitaab paRhii
          Atif erg book.f read.f.sg.pst
          'Atif read the book'

trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif ko kitaab paRhnii paRii
          Atif dat book.f read.f.inf compel.f.pst
          'Atif had to read the book'


Unaccusative & Unergative

• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
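The labelling rule for the single argument of an intransitive verb can be sketched as a lookup. The verb classes below are the slide's own examples; the tiny lexicon and the function name are hypothetical, and in practice the class of each verb is decided with the diagnostic tests rather than looked up.

```python
# Verb classes from the slide's examples (illustrative lexicon only).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    return "Arg1" if VERB_CLASS[verb] == "unaccusative" else "Arg0"

for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```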

Intransitive: Unaccusative

unacc-1   दरवाज़ा खुलेगा
          darvaazaa khulegaa
          door.m.sg.d open.m.sg.fut
          'The door will open'

unacc-2   दरवाज़े ने खुला
          darvaaze ne khulaa
          door.m.sg.obl erg open.pst
          'The door opened'

unacc-3   दरवाज़े को खुलना पड़ेगा
          darvaaze ko khulnaa paRegaa
          door.m.sg.obl dat open.inf compel.fut
          'The door will have to open'


Intransitive: Unergative

unerg-1   आतिफ़ सोएगा
          Atif soyegaa
          Atif sleep.m.sg.fut
          'Atif will sleep'

unerg-2   आतिफ़ ने सोया
          Atif ne soyaa
          Atif erg sleep.m.sg.pst
          'Atif slept'

unerg-3   आतिफ़ को सोना पड़ेगा
          Atif ko sonaa paRegaa
          Atif dat sleep.inf compel.fut
          'Atif will have to sleep'

Existential

exist-1   उस कमरे में चूहे हैं
          us kamre meM cuuhe haiM
          that room in rats be.pres.pl
          'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs


Dative Subject

unacc-4      कल रात बादलों में चाँद दिखा
             kal raat baadaloM meiM caaMd dikhaa
             yesterday night clouds in moon see(unacc).pst
             'Yesterday night the moon was seen behind the clouds'

dat-subj-1   कल रात बादलों में मुझको चाँद दिखा
             kal raat baadaloM meiM mujhko caaMd dikhaa
             yesterday night clouds in me.dat moon see(unacc).pst
             'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation


Ditransitive

ditrans-1   राम मोहन को किताब देगा
            raam mohan ko kitaab degaa
            Ram Mohan dat book.f give.m.sg.fut
            'Ram will give a book to Mohan'

ditrans-2   राम ने मोहन को किताब दी
            raam ne mohan ko kitaab dii
            Ram erg Mohan dat book.f give.f.sg.pst
            'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative:
  - Add -aa: so 'sleep' -> sulaa 'make someone sleep'
  - Add -vaa: sulaa 'make someone sleep' -> sulvaa 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers


Causatives

unerg-1       आतिफ़ सोएगा
              Atif soyegaa
              Atif sleep.m.sg.fut
              'Atif will sleep'

causative-1   आया ने आतिफ़ को सुलाया
              aayaa ne Atif ko sulaayaa
              maid erg Atif acc sleep.caus.pst
              'The maid caused the child to sleep'

causative-2   माँ ने आया से आतिफ़ को सुलवाया
              maaN ne aayaa se Atif ko sulvaayaa
              mother erg maid by Atif acc sleep.caus.pst
              'The mother made the maid cause the child to sleep'

Causatives

[Figures: annotations for causative-1 and causative-2]


Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa


Complex Predicate

compl-pred-1   राम रवि की प्रतीक्षा कर रहा था
               raam ravi kii pratiikshaa kar rahaa thaa
               Ram Ravi gen wait do prog.m.sg be.m.sg.pst
               'Ram was waiting for Ravi'

Complex predicate

compl-pred-2   राम रवि से लड़ बैठा
               raam ravi se ladZa baithaa
               Ram Ravi inst fight sit.perf
               'Ram regrettably fought with Ravi'


Complement Clause

compl-cl-1   राम जानता है कि सीता देर से आएगी
             raam jaantaa hai ki siita der se aayegii
             Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
             'Ram knows that Sita will arrive late'


Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition


Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for "Atif kitab paRhegaa" (आतिफ़ किताब पढ़ेगा); node labels: VP-Pred, VP, NP, V]


Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for "Atif-ne kitab paRhii" (आतिफ़ ने किताब पढ़ी); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: PS tree for "Atif soyegaa" (आतिफ़ सोएगा); node labels: VP-Pred, VP, NP, V]


Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for "darvaazaa khulegaa" (दरवाज़ा खुलेगा); node labels: VP-Pred, VP, NP1, NP, CASE1, V]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

[Figure: PS tree for "us kamre mein cuuhe hain" (उस कमरे में चूहे हैं); node labels: VP-Pred, VP, NP-P, NP1, NP, CASE1, V]


Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for "Ram-ne Mohan-ko kitaab dii" (राम ने मोहन को किताब दी); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Figure: PS tree for "kal raat baadaloM mein mujhko caaMd dikhaa" (कल रात बादलों में मुझको चाँद दिखा); node labels: VP-Pred, VP, NP-P, NP, NP1, NP2, CASE1, SCR2, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (as with unaccusatives)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere


Complement Clauses with ki

[Figure: PS tree for "Ram jaantaa hai ki Sita der se aayegi" (राम जानता है कि सीता देर से आएगी); node labels: VP-Pred, VP, NP-P, NP, NP1, CP1, C, V, V-Aux, CASE1, EXTR1]

Relative Clause

[Figure: PS tree for the sentence below; node labels: VP-Pred, VP, NP-P, NP, NP1, CP, C, COMP, V, V-Aux, PRO, SCR1, SCR3]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'


Complex Predicate

[Figure: PS tree for the sentence below; node labels: VP-Pred, VP, NP-P, NP, N, V, V', V-Aux, CASE1]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative


The Hindi Treebank

• 3 Representations
  - DS: Dependency Structure
  - PB: PropBank (lexical predicate-argument structure)
  - PS: Phrase Structure
• Why have three levels of representation? What does "level of representation" mean, in fact?


What is a Syntactic Representation?

1. Syntactic phenomena ("what"), e.g.
   - Subject of a verb
   - Relative clause
   - Small clause
   Linguists tend to agree on what phenomena exist
2. Mathematical representation type ("basic how"), e.g.
   - Phrase structure tree
   - Dependency tree
   - Or something more complicated: graph, LFG, TAG, ...
3. Formal syntactic description ("detailed how")
   a. Mapping from phenomena to representations (in a particular type)
   b. The chosen representation for a specific phenomenon is also called an analysis
   c. The phenomena extracted from a representation are the interpretation
   d. A formal description is a syntactic theory if it makes predictions

Representation Types: Dependency and Phrase Structure

• Dependency Tree (DS)
  - One label alphabet: words (= words in a sentence)
  - All nodes are labeled with words or empty strings
• Phrase Structure Tree (PS)
  - Two disjoint label alphabets: terminals (= words in the sentence) and nonterminals
  - All and only interior nodes are labeled with nonterminals
  - Leaves are labeled with terminals or empty strings
• Nothing else is part of the definition
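The two definitions can be turned into simple checks. This is a sketch under stated assumptions: the `Tree` class and both function names are hypothetical, edge labels are left out, and the empty string `""` stands for an empty node.

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    label: str                                  # "" stands for the empty string
    children: list = field(default_factory=list)

def is_dependency_tree(tree, words):
    """Every node is labeled with a word or the empty string."""
    ok = tree.label == "" or tree.label in words
    return ok and all(is_dependency_tree(c, words) for c in tree.children)

def is_phrase_structure_tree(tree, terminals, nonterminals):
    """All and only interior nodes carry nonterminals; leaves carry terminals or ""."""
    if not terminals.isdisjoint(nonterminals):
        return False
    if tree.children:
        return tree.label in nonterminals and all(
            is_phrase_structure_tree(c, terminals, nonterminals)
            for c in tree.children)
    return tree.label == "" or tree.label in terminals

words = {"Atif", "considered", "her", "stupid"}
nonterminals = {"S", "NP", "VP", "AdjP"}

ds = Tree("considered", [Tree("Atif"), Tree("stupid", [Tree("her")])])
ps = Tree("S", [
    Tree("NP", [Tree("Atif")]),
    Tree("VP", [Tree("considered"),
                Tree("S", [Tree("her"), Tree("AdjP", [Tree("stupid")])])]),
])

print(is_dependency_tree(ds, words), is_phrase_structure_tree(ps, words, nonterminals))
```

The same sentence yields a tree of each type, and neither tree satisfies the other type's definition, which is all the definitions require.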


Example: Small Clauses

• Hindi
  - आतिफ़ ने सीमा को बेवकूफ़ समझा
  - Atif ne Seema ko bewakuuf samjhaa
  - Atif Erg Seema Acc stupid consider.Pfv
  - 'Atif considered Seema stupid'
• English
  - Atif considered Seema stupid
  - Atif considered her stupid

What is the Phenomenon?

• Syntactically and semantically, consider takes a clausal complement
  - Atif considered [clause that she is stupid]
  - Atif considered [clause her stupid]
• But two problems:
  - No verb
  - her is semantically the subject of stupid, but has accusative case, which is unusual (subjects are usually nominative)
• So:
  - Atif considered [small clause her stupid]


What is the Representation Type?

• For this example, we will show dependency trees and phrase structure trees

Analysis 1a for Small Clauses: No Accusative Case Marking

• Structure represents her as subject, but not the accusative case marking of her

considers
  Subj: Atif
  Obj: stupid
    Subj: her


Analysis 1b for Small Clauses: Exceptional Case Marking

• Structure represents her as subject, and the accusative case marking through a node label

considers
  Subj: Atif
  Obj-ECM: stupid
    Subj: her

Analysis 1a for Small Clauses: No Accusative Case Marking

• Structure represents her as subject, but not the accusative case marking of her

DS:
considers
  Subj: Atif
  Obj: stupid
    Subj: her

PS:
[S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]


Analysis 1b for Small Clauses: Exceptional Case Marking

• Structure represents her as subject, and the accusative case marking through a node label

DS:
considers
  Subj: Atif
  Obj-ECM: stupid
    Subj: her

PS:
[S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]

Close to the analysis adopted in Chomsky (1981)

Note on DS and PS

• These analyses are intuitively very similar
• Formal notion: "consistency" (Fei Xia; see Bhatt, Rambow & Fei 2011)
  - Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa


Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject

considers
  Subj: Atif
  Obj: her
  Obj2: stupid

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using a node label

considers
  Subj: Atif
  Obj: her
  ObjPred: stupid


Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using a node label

considers
  k1: Atif
  k2: her
  k2s: stupid

Neo-Paninian analysis

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using a node label

समझा
  k1: आतिफ़ ने
  k2: सीमा को
  k2s: बेवकूफ़

Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank

101212

10

Analysis2aforSmallClausesGeneralMonoclausalAnalysis

bull  Structurerepresentsaccusa)vecasemarkingofher-(asobjectofmatrixverb)butnotherasseman)csubject

considers

A)f stupid

Subj Obj2

her

ObjNP

A)f

VP

considers her AdjP

stupid

S

Analysis2bforSmallClausesSyntac)cMonoclausalAnalysis

bull  Structurerepresentsaccusa)vecasemarkingofher-(asobjectofmatrixverb)andherasseman)csubjectusingnodelabel

considers

A)f stupid

k1 k2s

her

k2NP

A)f

VP

considers her

AdjP

stupid

S

SC

101212

11

Analysis 3 for Small Clauses: Raising to Object

• Structure represents accusative case marking of "her" and "her" as semantic subject, but requires an empty category

Dependency tree:
considers
  Subj: Atif
  Obj: her1
  Obj-Pred: stupid
    Subj: e1

[Figure: phrase structure tree for Analysis 3, with nodes S, NP (Atif), VP, considers, S, VP, AdjP (stupid), her1 and e1]

Analysis used for PS in the Hindi-Urdu Treebank

Comparison of Representations

• Less information:
  – Tree 1a: considers → Atif (Subj), stupid (Obj); stupid → her (Subj)
  – Tree 2a: considers → Atif (Subj), her (Obj), stupid (Obj2)
• Same information:
  – Tree 1b: considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)
  – Tree 2b: considers → Atif (Subj), her (Obj), stupid (ObjPred)
  – Tree 3: considers → Atif (Subj), her1 (Obj), stupid (Obj-Pred); stupid → e1 (Subj)

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design

Aren't DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants
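The convention that a DS node stands for the phrase made of the word plus all of its descendants can be illustrated with a small sketch. The encoding below (a list of head indices for dependency tree 1a) and the function name `constituent` are assumptions made for illustration, not part of the treebank tooling.

```python
# Hypothetical encoding of tree 1a: one head index per word, -1 for the root.
words = ["Atif", "considered", "her", "stupid"]
heads = [1, -1, 3, 1]  # Atif <- considered, her <- stupid, stupid <- considered

def constituent(i):
    """Return the sorted indices of word i and all of its descendants."""
    span = [i]
    for j, h in enumerate(heads):
        if h == i:
            span.extend(constituent(j))
    return sorted(span)

# Map each word to the phrase it heads.
phrases = {words[i]: [words[k] for k in constituent(i)] for i in range(len(words))}

for head, phrase in phrases.items():
    print(head, "->", " ".join(phrase))
```

Under this encoding, "stupid" heads the phrase "her stupid" (the small clause) and the root heads the whole sentence, which is the sense in which constituency is already present in the dependency tree.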


What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information

Comparison of DS, PB, PS (Sample)

         DS                                   PB                                       PS
How      Dependency                           Dependency                               Phrase structure
What     Distinguish unergative/unaccusative  Distinguish temporal/locative adjuncts   Distinguish unaccusative/transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics

Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic

Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique

Case

• Direct: nouns are in the nominative and are not followed by a postposition
  – Occur denoting subject and/or object

laRkaa aayaa 'the boy came'
<ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

laRkii aayii 'the girl came'
<ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

laRke ne roTii khaayii 'the boy ate bread'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrogative), kyaa (what) and kuch (some)

Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrogative), koii (someone) and kisii (someone)

Pl, direct: ye (these), ve (those), jo (who, pl)

Pl, oblique:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrogative), kinhiiM (who, indefinite)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indefinite)

Pronouns (Contd…)

• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

Case →      Direct              Oblique
Number →    Sg       Pl         Sg       Pl
Gender ↓
Masc        acchaa   acche      acche    acche
Fem         acchii   acchii     acchii   acchii
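The adjective table above can be read as a three-way rule, sketched below. This is a minimal illustration that assumes a declinable adjective whose masculine direct singular ends in -aa (like acchaa); the function name and the string values for gender, number and case are hypothetical, and adjectives that do not decline are out of scope.

```python
def inflect_adjective(stem, gender, number, case):
    """Inflect a declinable -aa adjective, e.g. stem 'acch' for acchaa 'good'.

    gender: 'masc' or 'fem'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if gender == "fem":
        return stem + "ii"   # feminine is invariant in the table
    if number == "sg" and case == "direct":
        return stem + "aa"   # only masculine direct singular keeps -aa
    return stem + "e"        # every other masculine cell

for gender in ("masc", "fem"):
    for case in ("direct", "oblique"):
        for number in ("sg", "pl"):
            print(gender, case, number, inflect_adjective("acch", gender, number, case))
```

The same rule accounts for acche laRke ne: the noun is oblique before ne, so the masculine adjective takes -e.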

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root        ro       'cry'
Infinitive  ronaa    'to cry'
Habitual    rotaa    'cry.hab'
Perfective  royaa    'cried'
Causative   rulaa    'cause someone to cry'
            rulvaa   'make someone cause someone to cry'

Auxiliaries

• Auxiliaries mark tense, aspect and modality information on verbs

(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    'The children are having a meal'

(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog ability be.pl.pres
    'The children can continue to have their meal'

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table abl food take refl.pst

• Hindi also has compound postpositions

Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl 'before'
    qabl az_ayn (qabl azeen): before from_this, 'before this'
    qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
    az raahe hamdardi: from way empathy, 'for the sake of courtesy'
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
    ta waqt: until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• Normally marks a genitive, but is not restricted to the genitive alone

EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'

Reduplication: A Morphological Process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category that is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X.

raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)

Partial Reduplication (Echo Words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
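The regular patterns of full reduplication (X-X) and echo-word formation (first consonant replaced by v-) can be sketched on romanized forms. The helper names are hypothetical, and the sketch assumes that everything before the first vowel letter counts as "the first consonant" (so the digraph kh is replaced as a unit) and that vowel-initial words simply gain v-. The irregular cases such as bhaagambhaag are not covered.

```python
VOWELS = set("aeiou")

def full_reduplicate(word):
    """X -> X-X, e.g. garam -> garam-garam."""
    return f"{word}-{word}"

def echo_word(word):
    """Replace the initial consonant (cluster) of a romanized word with v-."""
    i = 0
    while i < len(word) and word[i].lower() not in VOWELS:
        i += 1
    return "v" + word[i:]

def partial_reduplicate(word):
    """X -> X-vX', giving the sense 'X etc.'."""
    return f"{word}-{echo_word(word)}"

for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(partial_reduplicate(w))
```

For the four regular examples on the slide, this produces khaanaa-vaanaa, jaanaa-vaanaa, aaloo-vaaloo and aisaa-vaisaa.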

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Intransitive: Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Intransitive: Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1
मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister, who stays in Delhi, is coming tomorrow'

Relative Clause

rel-cl-2
मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'

rel-cl-3
मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'

rel-cl-4
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused Atif to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause Atif to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case

    raam ko bukhaar hai
    Ram dat fever be.pres

    raam ko caand dikhaa
    Ram dat moon see-unacc.pst

    raam ko dukh hai
    Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of the participatory verbs takes the se postposition

    raam ravi se carcaa karegaa
    Ram Ravi to discussion do.m.sg.fut

    siitaa raam se shaadi karegii
    Sita Ram with marriage do.f.sg.fut

    ravi mohan se milegaa
    Ravi Mohan with meet.m.sg.fut

References

• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.


Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization: Automatic

Issues:
• Compounds
• Punctuations

For example:

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
    · Create three tokens
    · Mark the hyphen as JOIN
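The tokenization decision for hyphenated compounds (three tokens, hyphen marked as JOIN) can be sketched as follows. This is an illustration under assumptions: the input is romanized (WX-style) text, the function name is hypothetical, and the slides do not say in which column the JOIN mark is stored, so it is returned here as a third tuple element.

```python
import re

def tokenize(sentence):
    """Split romanized text into (ADDR, TOKEN, MARK) rows.

    Every punctuation character becomes its own token; a hyphen is
    additionally marked 'JOIN' (how the treebank stores this mark is assumed).
    """
    rows = []
    for piece in re.findall(r"\w+|[^\w\s]", sentence):
        mark = "JOIN" if piece == "-" else None
        rows.append((len(rows) + 1, piece, mark))
    return rows

for addr, token, mark in tokenize("BAI-bahana ."):
    print(addr, token, mark or "")
```

Because `\w` does not cover Devanagari vowel signs in Python's `re`, a real tokenizer for native-script text would need a different character class; the sketch only covers the romanized examples shown on the slides.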

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaa,v,e'>
7              <fs as=&STOP,punc>
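Reading the af attribute back into named fields can be sketched as below. The sketch assumes the af value is a comma-separated list in the field order given above (the commas are lost in this transcript), and the field names `gend`, `num` and `pers` are borrowed from the expanded notation used later in the slides; `parse_af` itself is a hypothetical helper.

```python
import re

# Assumed field order, following the description of the af attribute.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam", "suffix")

def parse_af(fs):
    """Extract af='...' from a feature structure string and name its fields.

    Fields missing at the end of the list are simply left out of the result.
    """
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        return {}
    return dict(zip(AF_FIELDS, match.group(1).split(",")))

print(parse_af("<fs af='laDakaa,n,m,sg,3,o'>"))
```

For the token laDake this yields the root laDakaa with category n, gender m, number sg, person 3 and case o.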

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)

ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs as=&STOP,punc>
      ))
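The way chunking changes the ADDR column can be sketched as follows. The input structure (a list of chunk labels with their tagged tokens) and the function name `to_ssf` are hypothetical, and the dotted addresses such as 1.1 are an assumption about the layout, since the punctuation is lost in this transcript.

```python
def to_ssf(chunks):
    """Render chunks as SSF-like rows with chunk-relative addresses.

    chunks: list of (chunk_label, [(token, pos_tag), ...]).
    """
    lines = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{label}")              # chunk opens, gets address i
        for j, (token, pos) in enumerate(tokens, start=1):
            lines.append(f"{i}.{j}\t{token}\t{pos}")   # token j inside chunk i
        lines.append("\t))")                           # chunk closes
    return lines

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
print("\n".join(to_ssf(chunks)))
```

The token that was number 4 in the flat token list (kelA) becomes 2.1 once chunks are introduced, which is the renumbering the slide refers to.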

Rambow


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus:
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k

• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

"lock" becomes the karta

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

(t1) khaa
       bhaaii
       phala

(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the Verb

• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal root → activity, result]

karta-karma

• The boy opened the lock
  – k1: karta
  – k2: karma

open
  k1: boy
  k2: lock

• karta and karma sometimes correspond to agent and theme
  – Not always: "The door opened"
  – "The door" is karta
  – The sentence has no explicit karma

Sub-actions: Opening of Lock

Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)

Sub-actions: Opening of Lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1: karta (doer); k2: karma (affected); k3: karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: "Sabina opened the lock"
• However, the speaker may decide not to express the role of the agent. Hence:
  – "The key opened the lock": the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – "The lock opened": the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• It depends on which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph analysis
  – POS tagging
  – Chunking
  – Dependency relations
• Dependency scheme
• Relations in the dependency scheme
• Some Hindi constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions etc.

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph analysis
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
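The inter-chunk tree for this sentence can be held as a list of labeled edges between chunk heads. The representation below (triples of dependent, head, relation) and the helper names are assumptions chosen for illustration; they are not the treebank's storage format.

```python
# (dependent, head, relation) triples for "meraa badZaa bhaaii bahuta phala khaataa hai"
edges = [
    ("bhaaii", "khaataa", "k1"),  # karta
    ("phala", "khaataa", "k2"),   # karma
    ("meraa", "bhaaii", "r6"),    # genitive
]

def dependents(head):
    """Return (dependent, relation) pairs attached to the given chunk head."""
    return [(dep, rel) for dep, h, rel in edges if h == head]

def render(node, label=None, depth=0):
    """Return the subtree under node as indented text lines."""
    prefix = "  " * depth
    lines = [f"{prefix}{node}" if label is None else f"{prefix}{label}: {node}"]
    for dep, rel in dependents(node):
        lines.extend(render(dep, rel, depth + 1))
    return lines

print("\n".join(render("khaataa")))
```

Only chunk heads appear as nodes; badZaa and bahuta stay inside their chunks, in line with the decision to leave chunk-internal dependencies unspecified at this stage.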

Relations in Dependency Scheme

• There are 3 types of relations in the dependency scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly, but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  – karta: subject/agent/doer
  – karma: object/patient
  – karana: instrument
  – sampradaan: beneficiary
  – apaadaan: source
  – adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod_relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex predicates
• fragof: Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark it as k1
    · aayaa se has karana vibhakti, so mark it as k3
    · bacce ko has sampradan vibhakti, so mark it as k4

Causative Constructions (Contd…)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations:
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

101212

22

  Ex maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa lsquoMother fed the child with the spoon   Ex maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa Mother made the maid to feed the childlsquo   As there is morphological relatedness between the base verb khaa lsquoeatrsquo and

causative verb khilvaa lsquocause to eatrsquo we mark pk1 mk1 jk1 instead of k1 k3 k4 respectively

  For causatives our current decision Follow Possibility-II

Causative Constructions (Contd hellip)
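The relabelling chosen for causatives (pk1, mk1, jk1 in place of k1, k3, k4) can be sketched as a simple mapping. This is only an illustration: the function name and the list-of-pairs input are hypothetical, and a real annotator would still have to separate a mediator (aayaa se) from a true instrument (cammaca se), which keeps k3.

```python
# Hypothetical sketch: map the labels of Possibility-I onto the causative
# labels of Possibility-II. Labels not in the table (e.g. k2) are kept.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """arguments: list of (chunk, label) pairs attached to a causative verb."""
    return [(chunk, CAUSATIVE_RELABEL.get(label, label)) for chunk, label in arguments]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),   # a mediator here, not an instrument
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
```

The sketch deliberately leaves the decision "is this verb causative, and is this se-phrase a mediator" outside the function, since the slides make that decision on morphological grounds.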

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’

• Issue
  - Possibility-I
    - Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    - The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    - jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai ‘is standing’ is the root of the relative clause
  - The modifier of vaha ‘he’ in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
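A minimal sketch of how Possibility-II could be held in memory: the relative clause hangs off vaha via nmod__relc, and the link between jo ladZakaa and vaha is a node feature rather than an arc. The Node class is hypothetical, and the k1 labels are assumptions made for the illustration; the slides only fix the nmod__relc arc and the coref feature.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    form: str
    head: Optional["Node"] = None          # None for the root
    rel: str = ""                          # dependency label to the head
    feats: dict = field(default_factory=dict)


main_verb = Node("hai")                                      # root of the main clause
vaha = Node("vaha", head=main_verb, rel="k1")                # k1 is an assumption
rel_verb = Node("khadZaa hai", head=vaha, rel="nmod__relc")  # relative clause root
jo_ladZakaa = Node("jo ladZakaa", head=rel_verb, rel="k1",   # k1 is an assumption
                   feats={"coref": "vaha"})


def path_to_root(node):
    """Forms on the path from a node up to the root of the tree."""
    path = []
    while node is not None:
        path.append(node.form)
        node = node.head
    return path
```

Storing coref as a feature keeps the structure a tree: jo ladZakaa has exactly one head, the verb of its own clause.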


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd…)

• Ex-2: raam ne (agent) caaMd dekhaa [base verb]
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’

• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

[Figure: Possibility-I. The abstract node agar-to heads agar and to (paired-ccof); agar heads na hotii and to heads aatii (ccof)]
[Figure: Possibility-II]

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written letters every day, he tears them’

Participles (vmod) (Contd)

• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with another participle verb, likhakar ‘having written’

Participles (vmod) (Contd)

[Figure: PaadZataa hai heads vaha (k1), rojZa (k7t), patra (k2) and likhakar (vmod); likhakar shares vaha (k1) and patra (k2) semantically]

(7)Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

[Figure: maan luungii heads mai (k1) and NULL__NP (vo) (k2); NULL__NP heads kahoge (nmod__relc); kahoge heads tum (k1) and jo bhi (k2)]

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don’t listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Figure: NULL__CCP (aur) heads badZe_ho_gaye and nahii_sunate, each via ccof]
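The NULL-insertion step for the first ellipsis example can be sketched as an operation on a child-to-head map. The function name and the dictionary encoding are hypothetical; only the node names and labels come from the slide.

```python
def add_null_head(edges, null_id, head, rel, orphans):
    """Return a copy of `edges` (child -> (head, rel)) in which a null node is
    attached to `head` and each orphan (child, rel) is attached to the null node."""
    new_edges = dict(edges)
    new_edges[null_id] = (head, rel)
    for child, child_rel in orphans:
        new_edges[child] = (null_id, child_rel)
    return new_edges


# tum jo bhi kahoge (vo) mai maan luungii: without vo, kahoge has no head.
edges = {
    "mai": ("maan luungii", "k1"),
    "tum": ("kahoge", "k1"),
    "jo bhi": ("kahoge", "k2"),
}
tree = add_null_head(edges, "NULL__NP", "maan luungii", "k2",
                     orphans=[("kahoge", "nmod__relc")])
```

After the call every word except the root maan luungii has exactly one head, which is what the inserted NULL__NP is there to guarantee.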

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

[Figures: dependency trees for the coordinate conjunction (ccof) and subordinate conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is being modified by the adjective ek ‘one’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• It means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• It captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd)

[Figure: kiyaa heads maine (k1), usase (k2) and prashna (pof); prashna heads ek (nmod)]
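The chunking decision for conjunct verbs can be sketched as follows. The function and its tuple encoding are hypothetical; the chunk labels NP and VG and the pof and nmod relations are the ones used on the slides.

```python
def analyse_conjunct_verb(modifiers, noun, verb):
    """Split a modifier(s) + noun + verb conjunct verb into an NP chunk and a VG chunk.

    Returns (chunks, relations); each relation is (dependent, label, head).
    """
    chunks = [("NP", modifiers + [noun]), ("VG", [verb])]
    relations = [(noun, "pof", verb)]                        # noun is 'part of' the verb
    relations += [(mod, "nmod", noun) for mod in modifiers]  # intra-chunk modification
    return chunks, relations


chunks, relations = analyse_conjunct_verb(["ek"], "prashna", "kiyaa")
```

Because ek ends up inside the NP chunk, its relation to prashna is stated directly, while the pof arc still records that prashna kiyaa is one semantic unit.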


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Both candidate sentences match the words of the question “Who created the first effective polio vaccine?”

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
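The filtering step can be sketched in a few lines. The dictionaries stand in for the output of a semantic role labeller (hypothetical here; no real SRL system is called), and exact string matching on the theme is a simplification.

```python
# Hand-written role labels for the two candidate sentences from the slides.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

# "Who created the first effective polio vaccine?" asks for the agent.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}


def answer_who(question, candidates):
    """Return the agents of candidates whose predicate and theme match the question."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == question["predicate"] and c["theme"] == question["theme"]
    ]


answers = answer_who(question, candidates)
```

Word matching alone cannot separate the two sentences, since both contain every content word of the question; the role labels can.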

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1]                 (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]      (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
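A frame file can be pictured as a small lookup table from roleset IDs to role descriptions. The sketch below hard-codes the two rolesets for ring from the slide; the dictionary layout and the function name are illustrative assumptions, not the actual PropBank file format.

```python
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"ARG0": "Causer of ringing", "ARG1": "Thing rung", "ARG2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"ARG1": "Surrounding entity", "ARG2": "Surrounded entity"},
        },
    }
}


def describe(lemma, roleset_id, labelled_args):
    """Map each labelled argument (text, label) to its verb-specific description."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {text: roles[label] for text, label in labelled_args}


bell = describe("ring", "ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
lake = describe("ring", "ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
```

The same label (ARG1) means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset ID has to be chosen before the arguments can be interpreted.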

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent/experiencer       ARG0-MNS  Induced causer
ARG1  Theme/patient           ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
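The labelling rule for intransitives reduces to a lookup once each verb has been classified. The mini-lexicon below is hypothetical and lists only the three verbs mentioned on the slide; in practice the class would come from the diagnostic tests.

```python
# Hypothetical mini-lexicon: verb root -> intransitive class.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """PropBank label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
```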

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

[Figures: PropBank annotation for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

[Figures: PropBank annotation for causative-1 and causative-2]
[Figure: Causatives, classes]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. trust
• Such cases are handled using a noun frame for bharosaa
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); nodes VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); nodes VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); nodes VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); nodes VP-Pred, VP, NP1, V, with trace *CASE*1]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); nodes VP-Pred, VP, NP-P, NP1, V, with trace *CASE*1]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); nodes VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); nodes VP-Pred, VP, NP-P, NP1, NP2, V, with traces *CASE*1 and *SCR*2]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); nodes VP-Pred, VP, V-Aux, NP, NP-P, CP1, C (ki), with traces *EXTR*1 and *CASE*1]

Relative Clause

[Figure: PS tree for the sentence below; nodes VP-Pred, VP, V-Aux, NP, NP-P, CP, C (COMP), PRO, with traces *SCR*1 and *SCR*3]

  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

Complex Predicate

[Figure: PS tree for the sentence below; nodes VP-Pred, VP, V-Aux, V’, NP, NP-P, N, with trace *CASE*1]

  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’

Causative

What is a Syntactic Representation?

1 Syntactic phenomena (“what”), e.g.
  – Subject of a verb
  – Relative clause
  – Small clause
  Linguists tend to agree on what phenomena exist
2 Mathematical representation type (“basic how”), e.g.
  – Phrase structure tree
  – Dependency tree
  – Or something more complicated: graph, LFG, TAG, …
3 Formal syntactic description (“detailed how”)
  a Mapping from phenomena to representations (in particular type)
  b Chosen representation for a specific phenomenon, also called analysis
  c Phenomena extracted in representation are the interpretation
  d Formal description is a syntactic theory if it makes predictions

Representation Types: Dependency and Phrase Structure

• Dependency Tree (DS)
  – One label alphabet: words (= words in a sentence)
  – All nodes labeled with words or empty strings
• Phrase Structure Tree (PS)
  – Two disjoint label alphabets: terminals (= words in sentence) and nonterminals
  – All and only interior nodes are labeled with nonterminals
  – Leaves are labeled with terminals or empty strings
• Nothing else is part of the definition
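The two definitions can be turned into simple checks on labelled trees. This is a sketch under stated assumptions: a tree is encoded as a (label, children) pair, the function names are hypothetical, and the example trees are one plausible encoding of "Atif considers her stupid".

```python
def is_dependency_tree(tree, words):
    """Every node must be labelled with a word or the empty string."""
    label, children = tree
    return (label in words or label == "") and all(
        is_dependency_tree(child, words) for child in children
    )


def is_phrase_structure_tree(tree, terminals, nonterminals):
    """Interior nodes carry nonterminals; leaves carry terminals or empty strings."""
    if not terminals.isdisjoint(nonterminals):
        return False
    label, children = tree
    if children:
        return label in nonterminals and all(
            is_phrase_structure_tree(child, terminals, nonterminals) for child in children
        )
    return label in terminals or label == ""


WORDS = {"Atif", "considers", "her", "stupid"}
NONTERMINALS = {"S", "NP", "VP", "AdjP"}

ds = ("considers", [("Atif", []), ("stupid", [("her", [])])])

adjp = ("AdjP", [("stupid", [])])
small_clause = ("S", [("her", []), ("VP", [adjp])])
ps = ("S", [("NP", [("Atif", [])]), ("VP", [("considers", []), small_clause])])
```

Neither check says anything about arc labels, heads, or projections, which matches the last bullet: those belong to a particular analysis, not to the representation type.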

Example: Small Clauses

• Hindi
  – आतिफ़ ने सीमा को बेवकूफ़ समझा
  – Atif ne Seema ko bewakuuf samjhaa
  – Atif Erg Seema Acc stupid consider.Pfv
  – ‘Atif considered Seema stupid’
• English
  – Atif considered Seema stupid
  – Atif considered her stupid

What is the Phenomenon?

• Syntactically and semantically, consider takes a clausal complement
  – Atif considered [clause that she is stupid]
  – Atif considered [clause her stupid]
• But two problems:
  – No verb
  – her is semantically the subject of stupid but has accusative case, which is unusual (subjects are usually nominative)
• So:
  – Atif considered [small clause her stupid]

What is the Representation Type?

• For this example, we will show dependency trees and phrase structure trees

Analysis 1a for Small Clauses: No Accusative Case Marking

• Structure represents her as subject, but not the accusative case marking of her
  DS: considers heads Atif (Subj) and stupid (Obj); stupid heads her (Subj)
  PS: [S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]

Analysis 1b for Small Clauses: Exceptional Case Marking

• Structure represents her as subject, and accusative case marking through the node label
  DS: considers heads Atif (Subj) and stupid (Obj-ECM); stupid heads her (Subj)
  PS: [S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]
• Close to the analysis adopted in Chomsky (1981)

Note on DS and PS

• These analyses are intuitively very similar
• Formal notion: “consistency” (Fei Xia; see Bhatt, Rambow & Fei 2011)
  – Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa

Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject
  DS: considers heads Atif (Subj), her (Obj) and stupid (Obj2)
  PS: [S [NP Atif] [VP considers her [AdjP stupid]]]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
  DS: considers heads Atif (Subj), her (Obj) and stupid (ObjPred)
  DS, Neo-Paninian labels: considers heads Atif (k1), her (k2) and stupid (k2s)
  DS, Hindi: समझा heads आतिफ़ ने (k1), सीमा को (k2) and बेवकूफ़ (k2s)
  [Figure: PS tree with nodes S, NP, VP, SC, AdjP]
• Neo-Paninian analysis from IIIT Hyderabad; used for DS in the Hindi-Urdu Treebank

Analysis 3 for Small Clauses: Raising to Object

• Structure represents the accusative case marking of her, and her as semantic subject, but requires an empty category
  DS: considers heads Atif (Subj), her1 (Obj) and stupid (Obj-Pred); stupid heads e1 (Subj)
  [Figure: PS tree with nodes S, NP, VP, S, VP, AdjP, containing her1 and e1]
• Analysis used for PS in the Hindi-Urdu Treebank

Comparison of Representations

• Less information: Tree 1a, Tree 2a
• Same information: Tree 1b, Tree 2b, Tree 3

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design

Aren’t DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency

Aren’t DS and PS Representations Complementary? NO

• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendents
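The convention for reading constituents off a dependency tree (a node stands for the phrase made of itself and all its descendants) can be sketched as below. The child map and word order encode "Atif considers her stupid" under Analysis 1a; the function name is hypothetical.

```python
def constituent(head, children, order):
    """Words of the phrase headed by `head`: the head plus all its descendants,
    returned in sentence order."""
    words = [head]
    for child in children.get(head, []):
        words.extend(constituent(child, children, order))
    return sorted(words, key=order.index)


ORDER = ["Atif", "considers", "her", "stupid"]
CHILDREN = {"considers": ["Atif", "stupid"], "stupid": ["her"]}

small_clause = constituent("stupid", CHILDREN, ORDER)
sentence = constituent("considers", CHILDREN, ORDER)
```

Here the phrase headed by stupid comes out as her stupid, i.e. the small clause, without any phrase-structure node having been stored.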


What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information

Comparison of DS, PB, PS (Sample)

Columns: DS, PB, PS
How: Dependency; Phrase Structure
What: Distinguish unergative/unaccusative; Distinguish temporal/locative adjuncts; Distinguish unaccusative/transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT, Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  - Some facts about Hindi and Urdu
• Linguistic properties
  - Morphology
  - Some basic Syntax
  - Lexical semantics


Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic


Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact in almost all the Indian languages)


Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  Singular: pankhaa (fan), lataa (creeper), ghar (house)
  Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique


Case

• Direct: nouns are in the nominative and are not followed by a postposition
• Occur denoting subject and/or object

  laRkaa aayaa ‘the boy came’
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii ‘the girl came’
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii ‘the boy ate bread’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread from the floor’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

101212

6

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

Sg – Dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
Sg – Obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
Pl – Dir: ye (these), ve (those), jo (who, pl)
Pl – Obl:
  (a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  (b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd.)

• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form:
  hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)

Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below

Case →      Direct            Oblique
Number →    Sg       Pl       Sg       Pl
Gender ↓
Masc        acchaa   acche    acche    acche
Fem         acchii   acchii   acchii   acchii

Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu:

Root         ro       'cry'
Infinitive   ronaa    'to cry'
Habitual     rotaa    'cry.hab'
Perfective   royaa    'cried'
Causative    rulaa    'cause someone to cry'
             rulvaa   'make someone cause someone to cry'

Auxiliaries
• Auxiliaries mark tense, aspect and modality information on verbs

  (a) bacce khaanaa khaa rahe haiM
      children.d meal eat prog be.pl.pres
      'The children are having a meal'
  (b) bacce khaanaa khaate rah sakte haiM
      children.d meal eat.nf prog ability be.pl.pres
      'The children can continue to have their meal'

• Auxiliaries also carry the gender, number and person information

Postpositions
• Postpositions largely mark the case relations

  baccoM ne raat meM mez se khaanaa le liyaa
  children.obl erg night in table abl food take refl.pst

• Hindi also has compound postpositions

Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar     'according to'
ke alaavaa     'in addition to'
ke kaaran      'because of'
ke dvaaraa     'through'
ke saamne      'in front of'
ke liye        'for'
kii or/taraf   'towards'
kii tarah      'like'
kii jagah      'in place of'
se baahar      'out of'
se pahle       'before'

Urdu-Specific Features
• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions
• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

  (a) qabl 'before'
      qabl az_ayn (qabl azeen)        before from_this        'before this'
      qabl az_waqt                    before from_time        'before time'
  (b) dar 'in/inside/amidst'
      dar_ayn asnah (dariin asnah)    in_this moment
  (c) az 'from/since/for'
      az raahe hamdardi               from way empathy        'for the sake of courtesy'
      az sare nau                     from beginning new
      khaarij az bahes                beyond from discussion
  (d) ta 'to/until/till'
      ta waqt                         until time

ezafe in Urdu
• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone

  EZ N+N:      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj:    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N:    qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj:  qabil-e-qubul 'qualified-for-acceptance'

Reduplication: A Morphological Process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X:

raam-raam       'Ram-Ram' (proper noun)
baccaa-baccaa   'child-child' (common noun)
garam-garam     'hot-hot' (adjective)
dhiire-dhiire   'slowly-slowly' (adverb)
jaa-jaa         'go-go' (verb)
naa-naa         'not-not' (negative particle)
kyaa-kyaa       'what-what' (question word)
jaate-jaate     'going-going' (participle)

Partial Reduplication (Echo Words)
• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi:
  jaanaa 'going'    → jaanaa-vaanaa 'going etc.'
  aaloo 'potato'    → aaloo-vaaloo 'potato etc.'
  aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run'  → bhaagambhaag 'rush'
  jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
  dekh 'see'   → dekhaa-dekhii 'in imitation'
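The regular echo-word pattern can be sketched in a few lines of Python. This is only an illustration over the romanized forms used on the slide: the function names are hypothetical, the treatment of an aspirated digraph such as "kh" as a single consonant is an assumption, and the irregular forms (bhaagambhaag, jhuuTh-muuTh, dekhaa-dekhii) are deliberately not covered.

```python
VOWELS = set("aeiouAEIOU")

def echo_form(word: str) -> str:
    """Build the echo part of a romanized Hindi/Urdu word (assumed non-empty)."""
    if word[0] in VOWELS:
        # Vowel-initial words simply receive an initial v- (aaloo -> vaaloo).
        return "v" + word
    # Assumption: consonant + "h" (kh, ch, sh, ...) spells one consonant.
    cut = 2 if len(word) > 1 and word[1] == "h" else 1
    return "v" + word[cut:]

def partial_reduplicate(word: str) -> str:
    """Return the hyphenated 'X etc.' expression, e.g. khaanaa-vaanaa."""
    return f"{word}-{echo_form(word)}"

for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(partial_reduplicate(w))
```

A lexicon lookup for the irregular pairs would have to come before this rule in any real system.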

Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1
  मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  'My sister who stays in Delhi is coming tomorrow'

Relative Clause (Contd.)

rel-cl-2
  मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  'I have read the book which you gave me'

rel-cl-3
  मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  'I have read the book which you gave me'

rel-cl-4
  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case

    raam ko bukhaar hai          Ram dat fever be.pres
    raam ko caand dikhaa         Ram dat moon see-unacc.pst
    raam ko dukh hai             Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of participatory verbs takes the se postposition

    raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues:
  – Compounds
  – Punctuation
• For example:

  usa   ladake   ne    kelaa    khaayaa    thaa
  that  boy      erg   banana   eat-perf   past

Tokenization

Represented in SSF:

ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7

Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
     Create three tokens
     Mark the hyphen as JOIN
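The decision above (three tokens per hyphenated compound, with the hyphen marked as JOIN) can be sketched as follows. This is a minimal illustration, not the treebank's tokenizer: the function name, the tuple layout and the use of an empty string for "no mark" are assumptions made for the example.

```python
import re

def tokenize(text: str):
    """Split text into (address, token, mark) rows.

    Every punctuation character becomes its own token; a hyphen that sits
    between two word pieces of a compound is additionally marked JOIN.
    """
    rows = []
    for word in text.split():
        pieces = re.findall(r"\w+|[^\w\s]", word)
        for i, piece in enumerate(pieces):
            is_join = piece == "-" and 0 < i < len(pieces) - 1
            rows.append((piece, "JOIN" if is_join else ""))
    # Addresses are 1-based, as in the SSF table shown on the slide.
    return [(addr, tok, mark) for addr, (tok, mark) in enumerate(rows, start=1)]

for row in tokenize("bAlikA-vixyAlaya"):
    print(row)
```

Keeping the hyphen as a separate, marked token lets the two members of the compound receive their own morphological analysis while the compound can still be reassembled later.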

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), suffix.

ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7                 <fs as=&STOP,punc>
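To make the composite af attribute concrete, the sketch below unpacks a comma-separated af value into named fields. The field names and their order are an assumption taken from the list on the slide (root, category, gender, number, person, case, tam/vibhakti, suffix); the released treebank's exact layout may differ, and the function name is hypothetical.

```python
import re

# Assumed field order, following the slide's description of af.
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix"]

def parse_af(fs: str) -> dict:
    """Extract af='...' from a feature-structure string and name its fields."""
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        raise ValueError(f"no af attribute in {fs!r}")
    values = match.group(1).split(",")
    # Pad short analyses so every field name is present in the result.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))

print(parse_af("<fs af='laDakaa,n,m,sg,3,o'>"))
```

Missing trailing fields are returned as empty strings, so callers can test a feature without checking the length of the analysis first.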

POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags

ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7               SYM    <fs as=&STOP,punc>

Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e., the value of ADDR_ will change)

ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1              SYM    <fs as=&STOP,punc>
        ))
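The re-addressing that chunking causes can be sketched as below: tokens that were numbered 1, 2, 3, ... at the flat level receive addresses of the form chunk.position once they are grouped. The input structure and function name are hypothetical, and the chunk boundaries are supplied by hand rather than predicted.

```python
def to_ssf_rows(chunks):
    """Turn [(chunk_tag, [(token, pos), ...]), ...] into SSF-style rows."""
    rows = []
    for c, (tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(c), "((", tag))           # chunk opening row
        for t, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{c}.{t}", token, pos))  # nested token address
        rows.append(("", "))", ""))                # chunk closing row
    return rows

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
for row in to_ssf_rows(chunks):
    print("\t".join(row))
```

The feature-structure column is left out here to keep the focus on the addresses.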


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?
• Indian languages:
  – Rich morphology
  – Relatively flexible word order
• For example:
  a) baccaa phala khaataa hai   ('child' 'fruit' 'eat_hab' 'pres')
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
    k1 → Sabina
    k2 → lock (modifier: the)

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

opened
    k1 → Sabina
    k2 → lock (modifier: the)
    k3 → key (modifier: this)

k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
    k7t → Yesterday
    k1  → Sabina
    k2  → lock (modifier: the)
    k3  → key (modifier: this)
    k7p → home (modifier: my)

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
    k7t → yesterday
    k1  → lock (modifier: the)
    k3  → key (modifier: this)

Here the lock becomes the karta.
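The four analyses above can be written down as labeled (head, relation, dependent) triples, which makes it easy to see that the same noun carries different karaka labels in different sentences. The sketch below is only one possible encoding; the variable and function names are hypothetical, and modifiers such as "the" are omitted because the slides leave their labels unspecified.

```python
def roles(edges, head):
    """Map each dependent of `head` to its relation label."""
    return {dep: label for h, label, dep in edges if h == head}

# "Yesterday Sabina opened the lock with this key at my home"
agentive = [
    ("opened", "k7t", "Yesterday"),
    ("opened", "k1", "Sabina"),
    ("opened", "k2", "lock"),
    ("opened", "k3", "key"),
    ("opened", "k7p", "home"),
]

# "Yesterday the lock opened with this key"
agentless = [
    ("opened", "k7t", "yesterday"),
    ("opened", "k1", "lock"),
    ("opened", "k3", "key"),
]

print(roles(agentive, "opened")["lock"])   # label of "lock" when Sabina is present
print(roles(agentless, "opened")["lock"])  # label of "lock" when no agent is expressed
```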

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) khaa → bhaaii, phala
(t2) khaa → bhaaii (meraa, baDZaa), phala (bahuta)

Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the Verb
• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

Verbal root → activity, result

karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: "The door opened"
  – The door is karta
  – The sentence has no explicit karma

open
    k1 → boy
    k2 → lock

Sub-actions: Opening of a Lock

Opening of a lock consists of:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)

Sub-actions: Opening of a Lock (Contd.)

open
    k1 → boy
    k2 → lock
    k3 → key

open
    k1 → key
    k2 → lock

open
    k1 → lock

k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: "Sabina opened the lock"
• However, the speaker may decide not to express the role of the agent. Hence:
  – "The key opened the lock": the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – "The lock opened": the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), that is, which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly:
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example
• Example:
  meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd.)
• Morph Analysis
  – meraa    <fs af= root=meraa, cat=pron, gend=any, num=sg, pers=1, case=o>
  – badZaa   <fs af= root=badZaa, cat=adj, gend=m>
  – bhaaii   <fs af= root=bhaaii, cat=n, gend=m, num=sg, pers=3, case=d>
  – bahuta   <fs af= root=bahuta, cat=adj, gend=any>
  – phala    <fs af= root=phala, cat=n, gend=m, num=any, pers=3, case=d>
  – khaataa  <fs af= root=khaa, cat=v, gend=m, num=sg, pers=3, TAM=taa>
  – hai      <fs af= root=hai, cat=v, gend=any, num=any, pers=3>

An Example (Contd.)
• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)
• Dependency Relation

Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)
• Here, modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
    k1 → bhaaii
             r6 → meraa
    k2 → phala

Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark the participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic Karaka Relations
• Only six:
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations Other than Karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address

Relations which do not Fall under Dependency Relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
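The three groups of labels can be kept in a small lookup table, for instance to validate annotations. The sketch below is an illustration only: the table and function names are hypothetical, and the numeric tags for the karakas are an assumption based on the labels used elsewhere in these slides (k1, k2, k3, k4) plus k5 and k7, which the list above does not spell out.

```python
RELATIONS = {
    "karaka": {
        "k1": "karta (subject/agent/doer)",
        "k2": "karma (object/patient)",
        "k3": "karana (instrument)",
        "k4": "sampradaan (beneficiary)",
        "k5": "apaadaan (source)",            # assumed tag
        "k7": "adhikarana (location in place/time/other)",  # assumed tag
    },
    "non-karaka": {
        "r6": "genitive",
        "rt": "purpose",
        "rh": "reason",
        "nmod__relc": "relative clause",
        "rad": "address",
    },
    "non-dependency": {
        "ccof": "conjunction",
        "pof": "complex predicate",
        "fragof": "fragment of",
    },
}

def relation_type(label: str):
    """Return the group a label belongs to, or None if it is unknown."""
    for group, labels in RELATIONS.items():
        if label in labels:
            return group
    return None

print(relation_type("k1"), relation_type("r6"), relation_type("pof"))
```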

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions
• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
     khilvaa 'cause to eat' is the verb root
     maaz ne has karta vibhakti, so mark as k1
     aayaa se has karana vibhakti, so mark as k3
     bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)
• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations:
     prayojaka karta 'causer' (pk1): the causer in a causative construction
     prayojya karta 'causee' (jk1): the causee in a causative construction
     madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
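The move from the Possibility-I labels to the Possibility-II labels for the maid example can be sketched as a relabeling step. This is a simplified illustration, not the annotation procedure itself: the function name is hypothetical, and because a se-phrase may be a plain instrument (cammaca se stays k3 in the spoon example), the caller has to say which phrases are mediators.

```python
def relabel_causative(args, mediators=()):
    """Relabel (phrase, label) pairs of a causative verb.

    k1 -> pk1 (causer), k4 -> jk1 (causee), and k3 -> mk1 only for phrases
    listed in `mediators`; every other label is kept unchanged.
    """
    relabeled = []
    for phrase, label in args:
        if label == "k1":
            label = "pk1"
        elif label == "k4":
            label = "jk1"
        elif label == "k3" and phrase in mediators:
            label = "mk1"
        relabeled.append((phrase, label))
    return relabeled

possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(possibility_1, mediators={"aayaa se"}))
```

Deciding whether a se-phrase is a mediator or an instrument is a judgment the annotator makes; the sketch only records that decision.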

(2) Relative Clauses (nmod__relc)
• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
     Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
     The dependency of jo ladZakaa 'the boy' is on vaha 'he'
     jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)
• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a
• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)
• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs
• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals
• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
    paired-ccof → agar
                      ccof → naa hotii
    paired-ccof → to
                      ccof → aatii

Possibility-II

Conditionals (Contd.)
• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)
• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written letters, he tears them up'
• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
    k1   → vaha
    k7t  → rojZa
    k2   → pawra
    vmod → likhakar
               k1 → vaha (shared)
               k2 → pawra (shared)
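The argument sharing described above can be sketched as a small function that, given the syntactic edges of the main verb, lists the arguments a vmod participle inherits semantically. The edge encoding and function name are hypothetical, and treating exactly k1 and k2 as the shared relations is an assumption drawn from this single example.

```python
def shared_arguments(edges, main_verb, participle, shared=("k1", "k2")):
    """Return the (participle, label, dependent) triples inherited from main_verb."""
    attached = any(h == main_verb and label == "vmod" and dep == participle
                   for h, label, dep in edges)
    if not attached:
        return []  # the participle does not modify this verb
    return [(participle, label, dep)
            for h, label, dep in edges
            if h == main_verb and label in shared]

# Syntactic edges for: vaha rojZa pawra likhakar PaadZataa hai
edges = [
    ("PaadZataa hai", "k1", "vaha"),
    ("PaadZataa hai", "k7t", "rojZa"),
    ("PaadZataa hai", "k2", "pawra"),
    ("PaadZataa hai", "vmod", "likhakar"),
]
print(shared_arguments(edges, "PaadZataa hai", "likhakar"))
```

The syntactic tree stays a tree because the shared arguments are attached only once; the inherited triples are derived on demand.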

(7) Ellipsis
• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing, which becomes the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
    k1 → mai
    k2 → NULL__NP (vo)
             nmod__relc → kahoge
                              k1 → tum
                              k2 → jo bhi

Ellipsis (Contd.)
• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
    ccof → badZe_ho_gaye
    ccof → nahii_sunate

Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)
• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figures: Coordinate Conjunction (ccof); Subordinate Conjunction]

(2) Conjunct Verbs

  Ex: maine usase ek prashna kiyaa
      'I-erg' 'him-inst' 'one' 'question' 'did'
      'I asked him a question'
  The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
  The annotation scheme should be able to account for this relation in the dependency tree
  If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

  To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
  The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
  That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
  This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
  Conjunct verbs are chunked separately, but semantically they constitute a single unit
  The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two parts
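The chunking decision described above can be sketched as data. This is a minimal illustration, assuming a hypothetical dictionary layout for chunks and a triple format for inter-chunk arcs; only the chunk tags (NP, VG) and the relation names come from the slides.

```python
# Chunks for "maine usase ek prashna kiyaa"; the dictionary layout is hypothetical.
chunks = [
    {"id": 1, "words": ["maine"], "tag": "NP", "head": "maine"},
    {"id": 2, "words": ["usase"], "tag": "NP", "head": "usase"},
    {"id": 3, "words": ["ek", "prashna"], "tag": "NP", "head": "prashna"},
    {"id": 4, "words": ["kiyaa"], "tag": "VG", "head": "kiyaa"},
]

# Inter-chunk arcs as (dependent chunk id, relation, governing chunk id).
inter_chunk = [(1, "k1", 4), (2, "k2", 4), (3, "pof", 4)]


def intra_chunk_modifiers(chunk):
    """Words inside a chunk other than its head are treated as modifying the head."""
    return [(w, chunk["head"]) for w in chunk["words"] if w != chunk["head"]]


def conjunct_verbs(all_chunks, arcs):
    """Pair each pof-linked noun or adjective head with its verb."""
    by_id = {c["id"]: c for c in all_chunks}
    return [(by_id[d]["head"], by_id[g]["head"]) for d, rel, g in arcs if rel == "pof"]


print(intra_chunk_modifiers(chunks[2]))     # ek modifies prashna inside the NP chunk
print(conjunct_verbs(chunks, inter_chunk))  # prashna + kiyaa form the conjunct verb
```

Because ek and prashna share a chunk, their relation stays local to the NP, while the pof arc still records that prashna kiyaa is a single semantic unit.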

Conjunct Verbs (Contd)

[Dependency tree: kiyaa is the root, with children maine (k1), usase (k2) and prashna (pof); ek attaches to prashna (nmod)]

Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1  Motivation
2  Introducing PropBank
3  Frame file definition
4  Hindi PropBank
5  Linguistic Phenomena

Why is semantic information important?

•  Imagine an automatic question answering system
•  Who created the first effective polio vaccine?
•  Two possible choices:
   -  Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
   -  The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

•  Who created the first effective polio vaccine?
   -  Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
   -  The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

•  Who created the first effective polio vaccine?
   -  [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
   -  [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

•  Who created the first effective polio vaccine?
   -  [Becton Dickinson]agent created the [first disposable syringe]theme for use with the mass administration of the first effective polio vaccine
   -  [The first effective polio vaccine]theme was created in 1952 by [Jonas Salk]agent at the University of Pittsburgh

SRL gives us the right answer

•  We need semantic information to prefer the right answer
•  The theme of create should be 'the first effective polio vaccine'
•  The theme in the first sentence was 'the first disposable syringe'
•  We can filter out the wrong answer
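The filtering step can be sketched in a few lines. This is a toy illustration that assumes the two candidate sentences have already been labelled with an agent and a theme; the dictionary keys and the `answer` function are hypothetical, not part of any PropBank tool.

```python
def answer(question_theme, candidates):
    """Return the agents of 'create' events whose theme matches the question's theme."""
    return [c["agent"] for c in candidates
            if c["predicate"] == "create" and c["theme"] == question_theme]


# Role-labelled versions of the two candidate sentences (labels as on the slide).
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

print(answer("the first effective polio vaccine", candidates))
```

Both sentences match the question on surface words, but only the second has the vaccine as the theme of create, so only its agent survives the filter. The passive word order of that sentence does not matter once the roles are labelled.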

We need semantic information

•  To find out about events and their participants
•  To capture semantic information across syntactic variation

Semantic information

•  Semantic information about verbs and participants is expressed through semantic roles
•  Agent, Experiencer, Theme, Result, etc.
•  However, it is difficult to have a standard set of thematic roles

Proposition Bank

•  Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
•  A PropBank is a large annotated corpus of predicate-argument information
•  A set of semantic roles is defined for each verb
•  A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

•  PropBank defines semantic roles on a verb-by-verb basis
•  This is defined in a verb lexicon consisting of frame files
•  Each predicate will have a set of roles associated with a distinct usage
•  A polysemous predicate can have several rolesets within its frame file

An example

•  [John]ARG0 rings [the bell]ARG1  (ring.01)
•  [Tall aspen trees]ARG1 ring [the lake]ARG2  (ring.02)

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
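A frame file can be pictured as a lookup from roleset IDs to role descriptions. The sketch below assumes a plain Python dictionary; real PropBank frame files are stored differently, and the `FRAMES` layout and `describe` helper are hypothetical.

```python
# Rolesets for 'ring' exactly as listed on the slide.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"ARG0": "Causer of ringing",
                      "ARG1": "Thing rung",
                      "ARG2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"ARG1": "Surrounding entity",
                      "ARG2": "Surrounded entity"},
        },
    }
}


def describe(lemma, roleset_id, labelled_args):
    """Map each labelled argument to its verb-specific role description."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    unknown = [label for label in labelled_args if label not in roles]
    if unknown:
        raise ValueError(f"{roleset_id} has no roles {unknown}")
    return {text: roles[label] for label, text in labelled_args.items()}


print(describe("ring", "ring.01", {"ARG0": "John", "ARG1": "the bell"}))
print(describe("ring", "ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"}))
```

The same label can mean different things under different rolesets: ARG1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why the roleset ID has to be chosen before the arguments are labelled.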

Hindi PropBank

•  Annotating the Hindi PropBank involves three steps:
   -  Creation of frame files
   -  Empty argument insertion
   -  Semantic role labelling

Frame files for Hindi

•  Two types of frame files:
   -  Frames for simple verbs [385 frames, 703 predicates]
   -  Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

•  PropBank inserts 4 types of empty arguments:
   -  pro: dropped null arguments recoverable from discourse context
   -  PRO: empty subjects of non-finite complement and adjunct clauses
   -  RELPRO: gaps in participial relative clauses
   -  GAP: elided arguments in co-ordinated clauses
•  PRO and RELPRO are inserted automatically
•  GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments          Numbered arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

•  Simple transitive
•  Unaccusative and unergative
•  Existential
•  Dative subject
•  Ditransitive
•  Causatives
•  Complex predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Unaccusative & Unergative

•  Distinction between intransitive verbs:
   -  Unaccusatives, e.g. Kula (open), Puta (explode)
   -  Unergatives, e.g. haMsa (laugh)
•  The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
•  Diagnostic tests are used to distinguish unaccusatives from unergatives
   -  E.g. animacy test, cognate object test, among others
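The labelling rule for intransitives reduces to a lookup once each verb's class is known. The sketch below assumes a hand-filled class table containing only the three verbs named on the slide; the diagnostic tests themselves are not implemented, and `VERB_CLASS` and `sole_argument_label` are hypothetical names.

```python
# Verb classes for the slide's examples; a real lexicon would be filled in
# by applying the diagnostic tests, which this sketch does not attempt.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```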

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

•  We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives

•  Hindi has two ways of forming the causative
•  Add -aa
   -  (so → sulaa) 'sleep' → 'make someone sleep'
•  Add -vaa
   -  (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
•  We introduce the label ARGA to analyze causers
•  Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
•  ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

Complex predicates

•  These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
•  Such cases are handled using a noun frame for bharosaa:
   [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

•  Devised by Rajesh Bhatt, University of Massachusetts, Amherst
   -  Assisted by Annahita Farudi and Owen Rambow
•  Developed in conjunction with DS and PB
•  Inspired by the Chomskyan tradition

Background for PS

•  Chomskyan program
   -  Motivated by claims about language acquisition in children
   -  Develop a theory of syntax such that the syntax of a language can be explained by
      •  Language-universal principles
      •  Language-specific parameters
•  PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach

Basic Principles of PS

•  PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
•  These two levels are related by derivations
   -  Words and constituents move and leave traces
•  Transformational grammar
•  Monostratal representation
•  Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

•  Phrase structure
•  Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
   -  Nouns with postpositions
   -  Verbs with auxiliaries and complementizers (ki)
•  Binary branching
   -  Theoretical reasons
   -  To be different from DS

Basic Transitive Clause (1)

•  There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

•  The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

•  PS makes a distinction between unergative and unaccusative
•  In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

•  The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); the NP darvaazaa carries index 1 and is linked to a CASE trace with the same index]

Existentials

•  Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); the NP cuuhe carries index 1 and is linked to a CASE trace; the NP-P us kamre mein is attached at VP]

Ditransitive

•  The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); the NP caaMd (index 1) is linked to a CASE trace, and mujhko (index 2) is linked to an SCR trace]

•  dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
•  Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
•  The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); the CP headed by ki carries index 1 and is linked to an EXTR trace; node labels VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C]

Relative Clause

[Figure: PS tree for the example below; it includes SCR traces, PRO, and a CP with C (COMP); node labels VP-Pred, VP, NP, NP-P, V, V-Aux]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Figure: PS tree for the example below; node labels VP-Pred, VP, NP, NP-P, V', V, V-Aux, N; the noun yaad is linked to a CASE trace with index 1]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Example: Small Clauses

•  Hindi
   -  आतिफ़ ने सीमा को बेवकूफ़ समझा
   -  Atif-ne Seema-ko bewakuuf samjhaa
   -  Atif-Erg Seema-Acc stupid consider-Pfv
   -  'Atif considered Seema stupid'
•  English
   -  Atif considered Seema stupid
   -  Atif considered her stupid

What is the Phenomenon?

•  Syntactically and semantically, consider takes a clausal complement
   -  Atif considered [clause that she is stupid]
   -  Atif considered [clause her stupid]
•  But two problems:
   -  No verb
   -  her is semantically the subject of stupid but has accusative case, which is unusual (subjects are usually nominative)
•  So:
   -  Atif considered [small clause her stupid]

What is the Representation Type?

•  For this example, we will show dependency trees and phrase structure trees

Analysis 1a for Small Clauses: No Accusative Case Marking

•  Structure represents her as subject, but not the accusative case marking of her

[Dependency tree: considers → Atif (Subj), stupid (Obj); stupid → her (Subj)]
[Phrase structure tree: [S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]]

Analysis 1b for Small Clauses: Exceptional Case Marking

•  Structure represents her as subject, and the accusative case marking through a node label

[Dependency tree: considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)]
[Phrase structure tree: [S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]]

Close to the analysis adopted in Chomsky (1981)

Note on DS and PS

•  These analyses are intuitively very similar
•  Formal notion: "consistency" (Fei Xia; see Bhatt, Rambow & Fei 2011)
   -  Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa
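To give a feel for how mechanical such a transformation can be, here is a minimal sketch of one direction, DS to PS. It is not the consistency-based algorithm referred to above: it simply lets every head project one phrase containing its dependents, assumes all dependents precede the head, and produces flat rather than binary-branching structure. The `PHRASE` table and the tuple format are hypothetical.

```python
# Hypothetical projection table: which phrase label each category projects.
PHRASE = {"V": "VP", "N": "NP", "Adj": "AdjP"}


def ds_to_ps(node):
    """Turn a dependency node (word, category, dependents) into a bracketed phrase.

    Dependents are emitted before the head, so the sketch only suits
    head-final configurations.
    """
    word, category, dependents = node
    parts = [ds_to_ps(d) for d in dependents] + [f"[{category} {word}]"]
    return f"[{PHRASE[category]} {' '.join(parts)}]"


# DS for "Atif kitab paRhegaa": the verb heads both nouns.
ds = ("paRhegaa", "V", [("Atif", "N", []), ("kitab", "N", [])])
print(ds_to_ps(ds))
```

Going the other way amounts to picking the head child of each phrase and attaching its sisters to it, which is why the two representations can carry the same information when they are designed consistently.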

Analysis 2a for Small Clauses: General Monoclausal Analysis

•  Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject

[Dependency tree: considers → Atif (Subj), her (Obj), stupid (Obj2)]
[Phrase structure tree: [S [NP Atif] [VP considers her [AdjP stupid]]]]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

•  Structure represents the accusative case marking of her (as object of the matrix verb) and her as semantic subject, using a node label

[Dependency tree: considers → Atif (Subj), her (Obj), stupid (ObjPred)]
[Dependency tree, Neo-Paninian labels: considers → Atif (k1), her (k2), stupid (k2s)]
[Dependency tree, Hindi: समझा → आतिफ़ ने (k1), सीमा को (k2), बेवकूफ़ (k2s)]
[Phrase structure tree: node labels S, NP, VP, SC, AdjP]

Neo-Paninian analysis from IIIT Hyderabad; used for DS in the Hindi-Urdu Treebank

11

Analysis3forSmallClausesRaisingtoObject

bull  Structurerepresentsaccusa)vecasemarkingofher-andherasseman)csubjectbutrequiresemptycategory

considers

A)f stupid

Subj ObjXPred

her1

Obj

e1

Subj

Analysis3forSmallClausesRaisingtoObject

bull  Structurerepresentsaccusa)vecasemarkingofher-andherasseman)csubjectbutrequiresemptycategoryconsiders

A)f stupid

Subj ObjXPred

her1

Obj

e1

Subj

S

NP

A)f

VP

considers S

VP

AdjPstupid

her1

e1

AnalysisusedforPSinHindiXUrduTreebank

101212

12

ComparisonofRepresenta)ons

bull  LessInforma)on bull  Sameinforma)on

considers

A)f stupid

Subj Obj

her

Subj

considers

A)f stupid

Subj Obj2

her

Obj

considers

A)f stupid

Subj ObjXPred

her1

Obj

e1Subj

considers

A)f stupid

Subj ObjPred

her

Obj

considers

A)f stupid

Subj ObjXECM

her

Subj

Tree1a

Tree2a

Tree1b

Tree2b

Tree3

SummarySyntac)cPhenomenaRepresenta)onTypesAnalyses

bull  Syntac)cphenomenaaretheempiricaldataofsyntaxaspartofthescienceoflanguagendash Canbeverysimilaracrosslanguages

bull  Therecanbeseveralpossibleanalysesndash Somehavelessinforma)onndash Buttherecanbedifferentanalysesthatrepresentthesameinforma)ondifferently

bull  TheanalysescanbesimilarinDSandPSbull  Lotsofchoicesintreebankdesign

Aren't DS and PS Representations Complementary? NO

•  Syntactic dependency can be encoded in PS, and typically is
•  Usual convention: attachment in projection shows the type of dependency
•  Syntactic constituency is represented in DS
•  Usual convention: each node is the word and the head of the phrase containing it and all its descendants
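The second convention can be made concrete: the phrase headed by a word is that word plus everything below it in the dependency tree. The sketch below assumes a dependency tree given as a list of head indices (with -1 for the root), which is a common but here hypothetical encoding, and uses Analysis 1a of the small clause example.

```python
def constituents(heads):
    """Return, for each word index, the sorted indices of the phrase it heads.

    heads[i] is the index of word i's head, or -1 if word i is the root.
    """
    children = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    def span(i):
        # A word's phrase is the word itself plus the phrases of its dependents.
        out = [i]
        for c in children[i]:
            out.extend(span(c))
        return sorted(out)

    return {i: span(i) for i in range(len(heads))}


# Analysis 1a: considered -> Atif (Subj), stupid (Obj); stupid -> her (Subj).
words = ["Atif", "considered", "her", "stupid"]
heads = [1, -1, 3, 1]

for i, idxs in constituents(heads).items():
    print(words[i], "heads:", " ".join(words[j] for j in idxs))
```

Under this analysis the node for stupid yields the constituent "her stupid", i.e. the small clause, without any phrase structure tree being drawn.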

What Does This Mean for NLP?

•  Treebanks are not naturally occurring data
•  The guidelines are painstakingly produced by linguists and represent a formal description of the language
•  Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
•  Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
•  These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
•  There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

•  DS: dependency, annotated by hand
•  PB: annotated by hand on top of DS; adds information about lexical semantics
   -  Does not change trees
   -  Adds labels to arcs and features to nodes
•  PS: phrase structure, derived automatically from DS+PB
   -  Contains less information than DS+PB
   -  DS and PS contain different information

Comparison of DS, PB, PS (Sample)

Columns: DS, PB, PS
How:   Dependency; Phrase Structure
What:  Distinguish unergative/unaccusative; distinguish temporal/locative adjuncts; distinguish unaccusative/transitive with empty agent

Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT Hyderabad, India

Dec 8 2012

COLING 2012

Outline

  Introduction
  Some facts about Hindi and Urdu
  Linguistic properties
  Morphology
  Some basic Syntax
  Lexical semantics

Hindi: Some facts

  A major language of the Indo-Aryan family
  Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
  Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
  Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
  A large population in India speaks Hindi as a second language
  Script: Devanagari, a syllabic script

Urdu: Some facts

  An Indo-Aryan language
  Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
  Significant borrowings from Arabic and Persian
  It was also known as rekhta (mixed language)
  Official language of Pakistan
  Official language of states of India
  Also spoken in Fiji, Bangladesh, etc.
  Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
  Script: Perso-Arabic

Hindi-Urdu (Hindustani)

  Hindi and Urdu are mutually intelligible
  Linguists consider them two registers of the same language
  Similar in grammatical structures
  Differ in vocabulary, particularly in the formal written varieties
  A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

  Hindi/Urdu have relatively free word order
  The unmarked word order in both languages is subject-object-verb (SOV)
  Auxiliary verbs follow the main verb
  Nouns are followed by postpositions
  Adjectives precede the nouns they modify
  In Urdu, adjectives sometimes follow the noun (ezafe constructions)
  Extensive use of participles, complex predicates and causatives
  Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact, in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties
  Grammatical gender: masculine and feminine
  Number: singular and plural
  Person: first, second and third
  Case: direct, oblique and vocative
  Adjectives inflect for number, gender and case
    - Some adjectives do not decline

Nouns

  Nouns in Hindi/Urdu are inflected for number and case
  Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
  Number:
    Singular: pankhaa (fan), lataa (creeper), ghar (house)
    Plural: pankhe (fans), lataeM (creepers), ghar (houses)
  Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique

Case

  Direct: nouns are in the nominative and are not followed by a postposition
  They occur as subject and/or object

    laRkaa aayaa 'the boy came'
    <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

    laRkii aayii 'the girl came'
    <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

  Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

    laRke ne roTii khaayii 'the boy ate bread'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

    laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, pronouns also inflect for number and case

  Sg, dir: yaha (this), vaha (that), jo (who/which), kaun (who, interrog.), kyaa (what) and kuch (some)
  Sg, obl: isa (this), usa (that), jisa (which), kisa (which, interrog.), koii (someone) and kisii (someone)
  Pl, dir: ye (these), ve (those), jo (who, pl.)
  Pl, obl:
    a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrog.), kinhiiM (who, indef.)
    b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef.)

Pronouns (Contd.)

  Before ne, maiM (I) and tuu (you, sg) do not change. For example: maiMne (I.erg) and tuune (you.erg)
  Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
  Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) do not change form:
    hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
  maiM (I), tuu (you), ham (we) and tum (you) do not attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
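The rules for first and second person pronouns can be collected into one small function. This is a sketch that covers only the five pronouns discussed above; the `attach` function and its tables are hypothetical, and the regular form aapkaa for aap with kaa is an assumption not stated on the slide.

```python
# Oblique stems used before postpositions other than 'ne'.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitive forms that replace pronoun + 'kaa'.
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach(pronoun, postposition):
    """Combine one of maiM, tuu, ham, tum, aap with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition
    # ham, tum and aap never change; maiM and tuu stay unchanged before 'ne'.
    return pronoun + postposition


for pronoun, postposition in [("maiM", "ne"), ("maiM", "ko"), ("tuu", "ko"),
                              ("ham", "ko"), ("tum", "kaa"), ("aap", "ne")]:
    print(pronoun, "+", postposition, "=", attach(pronoun, postposition))
```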

Adjectives

  Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
  Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
  The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

            Direct            Oblique
            Sg      Pl        Sg      Pl
  Masc      acchaa  acche     acche   acche
  Fem       acchii  acchii    acchii  acchii
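The paradigm above follows a compact rule, sketched below. The sketch assumes that every adjective ending in -aa declines like acchaa and that all others are invariant; this is a simplification, and the `inflect` function is hypothetical.

```python
def inflect(adjective, gender, number, case):
    """Inflect an adjective for gender ('masc'/'fem'), number ('sg'/'pl')
    and case ('direct'/'oblique'), following the acchaa paradigm."""
    if not adjective.endswith("aa"):
        return adjective  # treated as a non-declining adjective
    stem = adjective[:-2]
    if gender == "fem":
        return stem + "ii"   # one feminine form throughout
    if number == "sg" and case == "direct":
        return stem + "aa"   # masculine singular direct
    return stem + "e"        # all other masculine cells


for gender in ("masc", "fem"):
    for case in ("direct", "oblique"):
        for number in ("sg", "pl"):
            print(gender, case, number, inflect("acchaa", gender, number, case))
```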

Verbs

  Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person
  Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of verb forms from Hindi/Urdu

  Root        ro      'cry'
  Infinitive  ronaa   'to cry'
  Habitual    rotaa   'cry.hab'
  Perfective  royaa   'cried'
  Causative   rulaa   'cause someone to cry'
              rulvaa  'make someone cause someone to cry'

Auxiliaries

  Auxiliaries mark Tense, Aspect and Modality information on verbs

    (a) bacce khaanaa khaa rahe haiM
        children_d meal eat prog be.pl.pres
        'The children are having a meal'

    (b) bacce khaanaa khaate rah sakte haiM
        children_d meal eat_nf prog ability be.pl.pres
        'The children can continue to have their meal'

  Auxiliaries also carry the gender, number and person information

Postpositions

  Postpositions largely mark the case relations
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst
  Hindi also has compound postpositions

Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows

  ke anusaar    'according to'
  ke alaavaa    'in addition to'
  ke kaaran     'because of'
  ke dvaaraa    'through'
  ke saamne     'in front of'
  ke liye       'for'
  kii or/taraf  'towards'
  kii tarah     'like'
  kii jagah     'in place of'
  se baahar     'out of'
  se pahle      'before'

Urdu Specific Features

  Prepositions in Urdu
  ezafe in Urdu

Urdu has Prepositions

  Unlike Hindi, Urdu has prepositions as well
  Some Urdu examples with prepositions are:

  (a) qabl 'before'
        qabl az_ayn (qabl azeen)   before from_this    'before this'
        qabl az_waqt               before from_time    'before time'
  (b) dar 'in/inside/amidst'
        dar_ayn asnah (dariin asnah)   in_this moment
  (c) az 'from/since/for'
        az raahe hamdardi   from way empathy          'for the sake of courtesy'
        az sare nau         from beginning new
        khaarij az bahes    beyond from discussion
  (d) ta 'to/until/till'
        ta waqt   'until time'

ezafe in Urdu

  Urdu has what is referred to as ezafe
  It normally marks a genitive but is not restricted to the genitive alone

  EZ N+N      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'

Reduplication: A morphological process

  Words belonging to various categories can be reduplicated
  These expressions are often hyphenated
  Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
    Nouns: it adds the sense of 'every'
    Verbs: it brings the sense of an adverbial participle
    Adjectives and adverbs: it adds intensity
  Hindi has three types of reduplication: full, partial and redundant
  Reduplication is highly productive in these languages

Full Reduplication

If the word is X then its reduplicated form is X-X

  raam-raam      'Ram-Ram' (proper noun)
  baccaa-baccaa  'child-child' (common noun)
  garam-garam    'hot-hot' (adjective)
  dhiire-dhiire  'slowly-slowly' (adverb)
  jaa-jaa        'go-go' (verb)
  naa-naa        'not-not' (negative particle)
  kyaa-kyaa      'what-what' (question word)
  jaate-jaate    'going-going' (participle)

Partial Reduplication (Echo words)

  In partial reduplication an expression X is repeated partially
  Only a part of a given word is repeated, which gives the meaning of 'X etc.'
  In Hindi/Urdu the first consonant of X is replaced by v-
    For example: khaanaa-vaanaa 'food etc.'
  vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
  jaanaa 'going' → jaanaa-vaanaa 'going etc.'
  aaloo 'potato' → aaloo-vaaloo 'potato etc.'
  aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run' → bhaagambhaag 'rush'
  jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
  dekh 'see' → dekhaa-dekhii 'in imitation'
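Both regular patterns are easy to state as string operations on the romanized forms. The sketch below is an approximation: it treats every leading non-vowel letter as part of the initial consonant (so that kh counts as one consonant) and prefixes v- to vowel-initial words; the function names are hypothetical, and the irregular cases such as bhaagambhaag are deliberately not covered.

```python
VOWELS = set("aeiouAEIOU")


def full_reduplication(x):
    """X -> X-X."""
    return f"{x}-{x}"


def echo_word(x):
    """X -> X-vX', where X' is X without its initial consonant letters.

    Vowel-initial words simply get v- prefixed in the echo part.
    """
    i = 0
    while i < len(x) and x[i] not in VOWELS:
        i += 1
    return f"{x}-v{x[i:]}"


print(full_reduplication("garam"))
for word in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(echo_word(word))
```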

Some Basic Syntax

  Hindi and Urdu are both relatively free word order SOV languages

  For case marking Hindi primarily uses postpositions   The verb agrees either with subject or with object   Adjective agrees with the noun it modifies

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          'My sister who stays in Delhi is coming tomorrow'

101212

17

Relative Clause

rel-cl-2
मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me.'

rel-cl-3
मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me.'

rel-cl-4
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me.'

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi.'

compl-pred-2
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi.'

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep.'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep.'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case.

    raam ko bukhaar hai
    Ram dat fever be.pres

    raam ko caand dikhaa
    Ram dat moon see-unacc.pst

    raam ko dukh hai
    Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of participatory verbs takes the se postposition.

    raam ravi se carcaa karega
    Ram Ravi to discussion do.m.sg.fut

    siitaa raam se shaadi karegii
    Sita Ram with marriage do.f.sg.fut

    ravi mohan se milegaa
    Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues:
  – Compounds
  – Punctuation

For example:

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – They are productive
  – Morphological analysis of the members of the compound is needed
  – The issue: whether to create a single token
  – Decision:
    – Create three tokens
    – Mark the hyphen as JOIN

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
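The composite af attribute can be unpacked mechanically. The following is a minimal sketch, assuming the fields are comma-separated and appear in the order listed above; the function name `parse_af`, the field keys, and the sample values are illustrative assumptions, not part of the treebank tools.

```python
# Field order follows the slide: root, category, gender, number,
# person, case, tam/vibhakti, suffix. The key names are assumptions.
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix"]


def parse_af(af_value):
    """Split a comma-separated af value into a dict of named fields.

    Missing trailing fields are filled with empty strings.
    """
    parts = [p.strip() for p in af_value.split(",")]
    if len(parts) > len(AF_FIELDS):
        raise ValueError(f"too many fields in af value: {af_value!r}")
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))


# Hypothetical af value for the token laDake ('boy', oblique form).
analysis = parse_af("laDakaa,n,m,sg,3,o")
print(analysis["root"], analysis["cat"], analysis["case"])
```

Keeping the field order in a single list makes it easy to adjust the sketch if the actual attribute order differs.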

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e., the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
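The way chunking changes the ADDR_ values can be sketched in a few lines. This is only an illustration, assuming that chunk-internal addresses are written as dotted pairs (1.1, 1.2, ...); the function name `ssf_rows` and the tuple layout are hypothetical and not taken from the SSF tools.

```python
def ssf_rows(chunks):
    """Return (address, token, category) rows for a list of chunks.

    Each chunk is (chunk_tag, [(token, pos_tag), ...]). The chunk gets
    the address i, and its j-th token gets the address i.j.
    """
    rows = []
    for i, (chunk_tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", chunk_tag))      # chunk opening row
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))   # token inside chunk
        rows.append(("", "))", ""))                 # chunk closing row
    return rows


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
for addr, tok, cat in ssf_rows(chunks):
    print(f"{addr}\t{tok}\t{cat}")
```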


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma
IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

• Indian languages have:
  – Rich morphology
  – Relatively flexible word order

For example:

a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): the locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this

k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

Here the lock becomes the karta.
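The two analyses of 'open' can be written down as labelled head-dependent edges. This is a minimal sketch of one possible encoding; the `Edge` tuple and the `role_of` helper are illustrative names, not part of the annotation scheme.

```python
from collections import namedtuple

# One labelled dependency edge: head word, karaka label, dependent word.
Edge = namedtuple("Edge", ["head", "label", "dependent"])

# 'Yesterday Sabina opened the lock with this key at my home'
with_agent = [
    Edge("opened", "k7t", "Yesterday"),
    Edge("opened", "k1", "Sabina"),
    Edge("opened", "k2", "lock"),
    Edge("opened", "k3", "key"),
    Edge("opened", "k7p", "home"),
]

# 'Yesterday the lock opened with this key'
without_agent = [
    Edge("opened", "k7t", "yesterday"),
    Edge("opened", "k1", "lock"),
    Edge("opened", "k3", "key"),
]


def role_of(edges, word):
    """Return the label on the edge whose dependent is `word`, or None."""
    for edge in edges:
        if edge.dependent == word:
            return edge.label
    return None


print(role_of(with_agent, "lock"))     # k2 (karma)
print(role_of(without_agent, "lock"))  # k1 (karta)
```

The same word receives different karaka labels in the two sentences, which is the point the slides make about the lock becoming the karta.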


Levels of Analysis

L1: Semantic relations (karakas), e.g. raama: karta
L2: Morphosyntactic (vibhakti), e.g. raama: prathamaa
L3: Morphological representation (abstract vibhakti markers), e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4: Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

(t1)
khaa
  bhaaii
  phala

(t2)
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes:
  – the activity
  – the result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal Root, branching into activity and result]

karta-karma

• The boy opened the lock
  – k1: karta
  – k2: karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of a lock

Opening of lock
  – Inserting and turning a key (action 1)
  – Key pressing and turning the lever (action 2)
  – Latch moving and lock opening (action 3)

Sub-actions: Opening of a lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1: karta (doer); k2: karma (affected); k3: karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri.
  – The lock opened. The karma is raised to the role of karta (doer): karma-kartri.
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), that is, on which sub-action the speaker wants to focus.

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly:
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'

An Example (contd.)

• Morph analysis

meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa   <fs af= root=badZaa cat=adj gend=m>
bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta   <fs af= root=bahuta cat=adj gend=any>
phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai      <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (contd.)

• POS tagging

meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

• Chunking

((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
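Since dependency relations are marked between chunk heads, each chunk needs a head. The sketch below uses a simple rule that is an assumption made for illustration (the last noun or pronoun heads an NP, the main verb VM heads a VG); the slides do not spell out the head rules, and `HEAD_TAGS` and `chunk_head` are hypothetical names.

```python
# Assumed head rule: POS tags that may head each chunk type.
HEAD_TAGS = {"NP": ("NN", "PRP"), "VG": ("VM",)}


def chunk_head(chunk_tag, tokens):
    """Return the head token of a chunk.

    Scans the chunk from right to left for a token whose POS tag is a
    head tag for this chunk type; falls back to the last token.
    """
    wanted = HEAD_TAGS.get(chunk_tag, ())
    for token, pos in reversed(tokens):
        if pos in wanted:
            return token
    return tokens[-1][0]


chunks = [
    ("NP", [("meraa", "PRP")]),
    ("NP", [("baDzaa", "JJ"), ("bhaaii", "NN")]),
    ("NP", [("bahuta", "QF"), ("phala", "NN")]),
    ("VG", [("khaataa", "VM"), ("hai", "VAUX")]),
]
heads = [chunk_head(tag, toks) for tag, toks in chunks]
print(heads)
```

Under this rule the heads are meraa, bhaaii, phala and khaataa, which are the words the inter-chunk relations connect.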

An Example (contd.)

• Dependency relations

[Figure: dependency tree of the example sentence]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit'
  – khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks

Dependency Scheme (contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  – karta: subject/agent/doer
  – karma: object/patient
  – karana: instrument
  – sampradaan: beneficiary
  – apaadaan: source
  – adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex predicates
• fragof: Fragment of

Dependency Relation Types
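The three groups of relation labels can be collected into lookup tables, for instance to validate annotated edges. This is a sketch under stated assumptions: the table and function names are illustrative, and the label k5 for apaadaan is assumed here because the slides name the relation but do not show its label.

```python
# Karaka relations (participants directly involved in the action).
KARAKA = {
    "k1": "karta (doer)",
    "k2": "karma (locus of result)",
    "k3": "karana (instrument)",
    "k4": "sampradaan (beneficiary)",
    "k5": "apaadaan (source)",  # label assumed, not shown on the slides
    "k7t": "adhikarana (location in time)",
    "k7p": "adhikarana (location in place)",
}

# Relations other than karakas.
OTHER = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}

# Relations that are not dependency relations in the strict sense.
NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}


def relation_group(label):
    """Return which of the three groups a relation label belongs to."""
    groups = (("karaka", KARAKA), ("other", OTHER),
              ("non-dependency", NON_DEPENDENCY))
    for group, table in groups:
        if label in table:
            return group
    raise ValueError(f"unknown relation label: {label}")


print(relation_group("k2"), relation_group("r6"), relation_group("pof"))
```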


Some Hindi Constructions

(1) Causative Constructions

maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

• Issue
  – Possibility-I: Go by syntactic analysis
    – khilvaa 'cause to eat' is the verb root
    – maaz ne has karta vibhakti, so mark it as k1
    – aayaa se has karana vibhakti, so mark it as k3
    – bacce ko has sampradan vibhakti, so mark it as k4

Causative Constructions (contd.)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations:
    – prayojaka karta 'causer' (pk1): the causer in a causative construction
    – prayojya karta 'causee' (jk1): the causee in a causative construction
    – madhyastha karta 'mediator causer' (mk1): the mediator-causer in a causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
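The difference between the two possibilities amounts to a relabelling of three arguments. A minimal sketch, assuming the caller has already established that the se-marked phrase is a mediator and not an instrument (as in the cammaca se example, where k3 must stay k3); `CAUSATIVE_MAP` and `relabel_causative` are hypothetical names.

```python
# Possibility-I label -> Possibility-II label for causative arguments.
CAUSATIVE_MAP = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map (phrase, label) pairs from Possibility-I to Possibility-II.

    Labels without a causative counterpart (e.g. k2) are kept as they are.
    """
    return [(phrase, CAUSATIVE_MAP.get(label, label))
            for phrase, label in args]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_1))
```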

(2) Relative Clauses (nmod__relc)

Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'

• Issue
  – Possibility-I
    – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
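The Possibility-II analysis can be sketched as a small tree in which the coref feature points at the same node that the relative clause modifies. The dictionary layout, the node names, and the k1 labels are assumptions made for illustration; only the nmod__relc attachment and the coref feature come from the slides.

```python
# Each node records its head (parent), the relation to the head, and
# optional features. "main_verb" stands for the main-clause verb.
tree = {
    "vaha": {"head": "main_verb", "rel": "k1"},             # label assumed
    "khadZaa_hai": {"head": "vaha", "rel": "nmod__relc"},
    "jo_ladZakaa": {"head": "khadZaa_hai", "rel": "k1",     # label assumed
                    "coref": "vaha"},
}


def modified_noun(tree, node):
    """Walk up from `node` to the nearest nmod__relc edge.

    Returns the head of that edge (the element the relative clause
    modifies), or None if `node` is not inside a relative clause.
    """
    current = node
    while current in tree:
        entry = tree[current]
        if entry["rel"] == "nmod__relc":
            return entry["head"]
        current = entry["head"]
    return None


print(modified_noun(tree, "jo_ladZakaa"))
print(tree["jo_ladZakaa"]["coref"])
```

In this encoding the structural path and the coref feature agree: both lead from jo ladZakaa to vaha.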


(3) anubhava karta: k4a

Ex-1: mujhko dukh hai
'I-Dat' 'unhappy' 'is'
'I am unhappy'

• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta: k4a (contd.)

Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'

Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram-Dat' 'moon' 'appeared'
'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran: rs

Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'

Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'

• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. the karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'

• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II

[Figure: dependency tree for Possibility-II]

Conditionals (contd.)

• Possibility-I is ruled out because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a (main) verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (contd.)

Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Having written a letter every day, he tears it up'

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)

(7) Ellipsis

• How do we show dependencies when the head is missing?

Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'

• In the above example vo 'that', which is the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (contd.)

Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'

• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof: Conjunction
• pof: Complex predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (contd.)

• Coordinate conjunction
  Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
• Subordinate conjunction
  Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'

[Figures: dependency trees for the coordinate and subordinate conjunction examples]

(2) Conjunct Verbs

Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'

• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
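Because the noun and the verb sit in separate chunks, the conjunct verb has to be recovered from the pof edge. A minimal sketch, assuming edges are stored as (head, label, dependent) triples; the function name `complex_predicates` is hypothetical.

```python
# Edges of the tree for 'maine usase ek prashna kiyaa'.
edges = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
    ("prashna", "nmod", "ek"),
]


def complex_predicates(edges):
    """Return a 'noun verb' string for every pof edge in the tree."""
    return [f"{dep} {head}" for head, label, dep in edges if label == "pof"]


print(complex_predicates(edges))  # ['prashna kiyaa']
```

The modifier ek stays attached to prashna, so the noun-internal relation and the conjunct verb are both recoverable from the same tree.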


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya
University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
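The filtering step can be sketched as a comparison of role-labelled candidates against the question. This is only an illustration of the idea on the two sentences above; the dictionary layout and the function name `answer` are assumptions, and a real system would obtain the labels from a semantic role labeller.

```python
# The question asks for the agent of 'create' with a given theme.
question = {"predicate": "create",
            "theme": "the first effective polio vaccine"}

# Role-labelled candidate sentences (labels as shown on the slide).
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(question, candidates):
    """Return the agents of candidates whose predicate and theme match."""
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"].lower() == question["theme"].lower()]


print(answer(question, candidates))  # ['Jonas Salk']
```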


We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
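A frame file with two rolesets can be pictured as a nested lookup table. The sketch below is an illustration only: the dictionary layout and the `describe` helper are assumptions, and the roleset IDs are written as ring.01 and ring.02 on the assumption that the dot was lost in the slide text.

```python
# Rolesets for the polysemous predicate 'ring', as given on the slide.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument text to its role description.

    Raises KeyError if a label is not defined for the roleset.
    """
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}


print(describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
print(describe("ring.02", [("Arg1", "Tall aspen trees"),
                           ("Arg2", "the lake")]))
```

The same label (Arg1) means something different in each roleset, which is why roles are interpreted relative to the roleset ID.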

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
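The four empty-argument types and their insertion mode can be summarised in a small table. This is a sketch for illustration; the dictionary layout and the helper `inserted_by` are hypothetical and only restate what the slide lists.

```python
# Empty argument types, with how each one is inserted.
EMPTY_ARGUMENTS = {
    "pro": {"description": "dropped null argument recoverable from "
                           "discourse context",
            "insertion": "manual"},
    "PRO": {"description": "empty subject of a non-finite complement "
                           "or adjunct clause",
            "insertion": "automatic"},
    "RELPRO": {"description": "gap in a participial relative clause",
               "insertion": "automatic"},
    "GAP": {"description": "elided argument in a co-ordinated clause",
            "insertion": "manual"},
}


def inserted_by(mode):
    """Return the sorted empty-argument types inserted in the given mode."""
    return sorted(kind for kind, info in EMPTY_ARGUMENTS.items()
                  if info["insertion"] == mode)


print(inserted_by("automatic"))
print(inserted_by("manual"))
```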


PropBank Tagset

Numbered arguments            Numbered arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent/experiencer       ARG0-MNS  Induced causer
ARG1  Theme/patient           ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book.'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book.'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book.'

Unaccusative & Unergative

• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open.'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened.'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open.'

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept.'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep.'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room.'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds.'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds.'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
Ram will give a book to Mohan

ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
Ram gave a book to Mohan

Causatives

• Hindi has two ways of forming the causative
  • Add -aa
    - (so, sulaa): sleep, make someone sleep
  • Add -vaa
    - (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers


Causatives

unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep

causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused the child to sleep’

causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’

Causatives

[Figures: PropBank annotation of causative-1 and causative-2]


Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa


Complex Predicate

compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’

compl-pred-2 राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
‘Ram regrettably fought with Ravi’


Complement Clause

compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
Ram knows that Sita will arrive late


Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition


Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा, with node labels VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]


Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी, with node labels VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा, with node labels VP-Pred, VP, NP (Atif), V (soyegaa)]


Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा, with node labels VP-Pred, VP, NP1 (darvaazaa), NP (CASE1), V (khulegaa)]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct

उस कमरे में चूहे हैं

[Figure: PS tree with node labels VP-Pred, VP, NP1 (cuuhe), NP (CASE1), V (hain), and adjoined NP-P (us kamre mein)]


Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी

[Figure: PS tree with node labels VP-Pred, NP-P (Ram-ne), VP, NP (kitaab), V (dii), and NP-P (Mohan-ko) adjoined to VP-Pred]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Figure: PS tree with node labels VP-Pred, VP, NP1 (caaMd), NP (CASE1), V (dikhaa), NP-P (baadaloM mein), NP (kal raat), NP2 (mujhko), NP (SCR2)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere


Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Figure: PS tree with node labels VP-Pred, VP, NP (Ram), NP (EXTR1), V (jaantaa), V-Aux (hai), CP1, C (ki), NP1 (Sita), NP (CASE1), NP-P (der se), V (aayegi)]

Relative Clause

[Figure: PS tree with node labels VP-Pred, VP, NP-P (tumne), NP (SCR1), V (dii), NP (PRO), NP1 (jo kitaab), CP, C (COMP), V-Aux (thii), NP-P (maine), V (paRhii), NP (SCR3), NP (vah)]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’


Complex Predicate

[Figure: PS tree with node labels including VP-Pred, VP, NP (Raam), NP-P (Ravi-ko), V (kar), V-Aux (rahaa), V-Aux (thaa), V’, NP (CASE1), N (yaad)]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
Ram was remembering Ravi

Causative


What is the Representation Type?

• For this example, we will show dependency trees and phrase structure trees

Analysis 1a for Small Clauses: No Accusative Case Marking

• Structure represents her as subject, but not accusative case marking of her

[Figure: dependency tree. considers → Atif (Subj), stupid (Obj); stupid → her (Subj)]


Analysis 1b for Small Clauses: Exceptional Case Marking

• Structure represents her as subject, and accusative case marking through node label

[Figure: dependency tree. considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)]

Analysis 1a for Small Clauses: No Accusative Case Marking

• Structure represents her as subject, but not accusative case marking of her

[Figure: dependency tree (considers → Atif (Subj), stupid (Obj); stupid → her (Subj)) beside the phrase structure tree [S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]]


Analysis 1b for Small Clauses: Exceptional Case Marking

• Structure represents her as subject, and accusative case marking through node label

[Figure: dependency tree (considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)) beside the phrase structure tree [S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]]

Close to the analysis adopted in Chomsky (1981)

Note on DS and PS

• These analyses are intuitively very similar
• Formal notion “consistency” (Fei Xia; see Bhatt, Rambow & Fei 2011)
  - Intuition: a very simple and general algorithm can transform consistent DS to PS and vice versa


Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject

[Figure: dependency tree. considers → Atif (Subj), her (Obj), stupid (Obj2)]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using node label

[Figure: dependency tree. considers → Atif (Subj), her (Obj), stupid (ObjPred)]


Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using node label

[Figure: dependency tree. considers → Atif (k1), her (k2), stupid (k2s)]

Neo-Paninian analysis

[Figure: the same tree for the Hindi sentence. समझा → आतिफ़ (k1), सीमा को (k2), बेवकूफ़ (k2s)]

Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank


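Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject

[Figure: dependency tree (considers → Atif (Subj), her (Obj), stupid (Obj2)) beside a phrase structure tree with node labels S, NP (Atif), VP (considers, her), AdjP (stupid)]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using node label

[Figure: dependency tree (considers → Atif (k1), her (k2), stupid (k2s)) beside a phrase structure tree with node labels S, NP (Atif), VP (considers, her), SC, AdjP (stupid)]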


Analysis 3 for Small Clauses: Raising to Object

• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category

[Figure: dependency tree (considers → Atif (Subj), her1 (Obj), stupid (Obj-Pred); stupid → e1 (Subj)) beside a phrase structure tree with node labels S, NP (Atif), VP (considers, her1), S (e1), VP, AdjP (stupid)]

Analysis used for PS in the Hindi-Urdu Treebank


Comparison of Representations

• Less information
  - Tree 1a: considers → Atif (Subj), stupid (Obj); stupid → her (Subj)
  - Tree 2a: considers → Atif (Subj), her (Obj), stupid (Obj2)
• Same information
  - Tree 1b: considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)
  - Tree 2b: considers → Atif (Subj), her (Obj), stupid (ObjPred)
  - Tree 3: considers → Atif (Subj), her1 (Obj), stupid (Obj-Pred); stupid → e1 (Subj)

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language
  - Can be very similar across languages
• There can be several possible analyses
  - Some have less information
  - But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design


Aren’t DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants
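The convention that each DS node heads the phrase made of itself and all its descendants can be sketched in a few lines. This is an illustrative sketch only: the dictionary encoding and the function name are assumptions, and the tree is the Analysis 1a dependency tree for "Atif considers her stupid".

```python
# Dependency tree for "Atif considers her stupid" (Analysis 1a).
# Each word maps to (head, label); the root has head None.
DEPS = {
    "considers": (None, "root"),
    "Atif": ("considers", "Subj"),
    "stupid": ("considers", "Obj"),
    "her": ("stupid", "Subj"),
}


def phrase_of(head, deps):
    """Return the words of the phrase headed by `head`:
    the head itself plus all of its descendants."""
    phrase = {head}
    for word, (parent, _label) in deps.items():
        if parent == head:
            phrase |= phrase_of(word, deps)
    return phrase
```

Here `phrase_of("stupid", DEPS)` yields the small clause {"stupid", "her"}, which is the constituent the phrase structure tree labels S.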


What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  - Does not change trees
  - Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  - Contains less information than DS+PB
  - DS and PS contain different information


Comparison of DS, PB, PS (Sample)

Columns: DS, PB, PS

How: Dependency | Phrase Structure

What: Distinguish unergative/unaccusative | Distinguish temporal/locative adjuncts | Distinguish unaccusative/transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT, Hyderabad, India

Dec 8, 2012
COLING 2012

Outline

• Introduction
  - Some facts about Hindi and Urdu
• Linguistic properties
  - Morphology
  - Some basic Syntax
  - Lexical semantics


Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic


Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)


Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  - Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  - Singular: pankhaa (fan), lataa (creeper), ghar (house)
  - Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique


Case

• Direct: nouns are in the nominative and are not followed by a postposition
• Occur denoting subject and/or object

laRkaa aayaa ‘the boy came’
<ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

laRkii aayii ‘the girl came’
<ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

laRke ne roTii khaayii ‘the boy ate bread’
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread from the floor’
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>


Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrog.), kyaa (what) and kuch (some)
Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrog.), koii (someone) and kisii (someone)
Pl, direct: ye (these), ve (those), jo (who, pl)
Pl, oblique:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrog.), kinhiiM (who, indef.)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef.)

Pronouns (Contd…)

• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don’t change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)

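The pronoun rules above amount to a small decision procedure. The following is a minimal sketch under stated assumptions: the function name is hypothetical, forms are in the slides' romanization, and only the masculine singular genitive forms (meraa, teraa, ...) are produced.

```python
# Oblique stems and irregular genitives, as listed on the slide.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}


def attach_postposition(pronoun: str, postposition: str) -> str:
    """Combine a personal pronoun with a postposition."""
    # Irregular genitives replace pronoun + kaa.
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    # maiM and tuu take their oblique stem, except before ne.
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition
    # ham, tum and aap do not change form.
    return pronoun + postposition
```

For example, `attach_postposition("maiM", "ko")` gives mujhko, while `attach_postposition("maiM", "ne")` gives maiMne.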

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below

Case →     Direct            Oblique
Gender ↓   Sg      Pl        Sg      Pl
Masc       acchaa  acche     acche   acche
Fem        acchii  acchii    acchii  acchii
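The paradigm in the table can be written as a three-way rule. This sketch is illustrative only: it assumes a declinable adjective given as a bare stem (e.g. "acch" for acchaa), and the function name and the string values for gender, number and case are assumptions. Adjectives that do not decline are not covered.

```python
def inflect_adjective(stem: str, gender: str, number: str, case: str) -> str:
    """Inflect a declinable adjective stem, following the acchaa paradigm."""
    if gender == "fem":
        # Feminine forms are identical in every cell of the table.
        return stem + "ii"
    if number == "sg" and case == "direct":
        return stem + "aa"
    # Masculine plural direct, and all masculine oblique forms.
    return stem + "e"
```

For example, `inflect_adjective("acch", "masc", "sg", "oblique")` gives acche, the form seen in acche laRke ne.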

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root        ro       cry
Infinitive  ronaa    to cry
Habitual    rotaa    cry.hab
Perfective  royaa    cried
Causative   rulaa    cause someone to cry
            rulvaa   make someone cause someone to cry


Auxiliaries

• Auxiliaries mark tense, aspect and modality information on verbs

(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    The children are having a meal

(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog abilit be.pl.pres
    The children can continue to have their meal

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

baccoM ne raat meM mez se khaanaa le liyaa
children_obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions


Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar ‘according to’
ke alaavaa ‘in addition to’
ke kaaran ‘because of’
ke dvaaraa ‘through’
ke saamne ‘in front of’
ke liye ‘for’
kii or/taraf ‘towards’
kii tarah ‘like’
kii jagah ‘in place of’
se baahar ‘out of’
se pahle ‘before’

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu


Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl ‘before’
    qabl az_ayn (qabl azeen)    before from_this    ‘before this’
    qabl az_waqt                before from_time    ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah)    in_this moment
(c) az ‘from/since/for’
    az raahe hamdardi    from way empathy
    az sare nau          from beginning new
    khaarij az bahes     beyond from discussion
    ‘for the sake of courtesy’
(d) ta ‘to/until/till’
    ta waqt    until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• Normally marks a genitive, but is not restricted to the genitive alone

EZ N+N: daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
EZ N+Adj: nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
EZ Adj+N: qabil-e-rahem ‘qualified for sympathy’
EZ Adj+Adj: qabil-e-qubul ‘qualified-for-acceptance’


Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  - Nouns: it adds the sense of ‘every’
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X:

raam-raam ‘Ram-Ram’ (proper noun)
baccaa-baccaa ‘child-child’ (common noun)
garam-garam ‘hot-hot’ (adjective)
dhiire-dhiire ‘slowly-slowly’ (adverb)
jaa-jaa ‘go-go’ (verb)
naa-naa ‘not-not’ (negative particle)
kyaa-kyaa ‘what-what’ (question word)
jaate-jaate ‘going-going’ (participle)


Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of ‘X etc.’
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa ‘food etc.’
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’

There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
bhaag ‘run’ → bhaagambhaag ‘rush’
jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
dekh ‘see’ → dekhaa-dekhii ‘in imitation’
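The regular patterns of full and partial reduplication can be sketched as two string functions. This is a rough illustration over the romanized forms used in the slides, not an analysis of Devanagari text; the function names and the vowel set are assumptions. The irregular cases (bhaagambhaag, jhuuTh-muuTh, dekhaa-dekhii) are not covered and would need a lexicon.

```python
VOWELS = set("aeiou")


def full_reduplication(word: str) -> str:
    """X becomes X-X."""
    return f"{word}-{word}"


def echo_word(word: str) -> str:
    """X becomes X-vX', where X' is X without its initial consonant(s)."""
    # Skip the initial consonant, including a digraph such as "kh".
    # Vowel-initial words (aaloo, aisaa) simply get v- prefixed.
    i = 0
    while i < len(word) and word[i].lower() not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"
```

For example, `echo_word("khaanaa")` gives khaanaa-vaanaa and `echo_word("aisaa")` gives aisaa-vaisaa.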

Some Basic Syntax

  Hindi and Urdu are both relatively free word order SOV languages

  For case marking Hindi primarily uses postpositions   The verb agrees either with subject or with object   Adjective agrees with the noun it modifies


Simple Transitive

trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
Atif will read the book

trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
Atif read the book

trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
Atif had to read the book

Intransitive Unergative

unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep

unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
Atif slept

unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
Atif will have to sleep


Intransitive Unaccusative

unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
The door will open

unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
The door opened

unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
The door will have to open

Existential

exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
There are rats in that room


Dative Subject

unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
Yesterday night the moon was seen behind the clouds

dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
Yesterday night I saw the moon behind the clouds

Ditransitive

ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
Ram will give a book to Mohan

ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
Ram gave a book to Mohan


Complement Clause

compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
Ram knows that Sita will arrive late

Relative Clause

rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
My sister who stays in Delhi is coming tomorrow


Relative Clause

rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
I have read the book which you gave me

rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
I have read the book which you gave me

rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
I have read the book which you gave me

Complex Predicate

compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
Ram was waiting for Ravi

compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
Ram was remembering Ravi


Causatives

unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep

causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused the child to sleep’

causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  - The experiencer argument takes dative case

  raam ko bukhaar hai
  Ram dat fever be.pres

  raam ko caand dikhaa
  Ram dat moon see-unacc.pst

  raam ko dukh hai
  Ram dat sorrow be.pres

• Participatory verbs
  - The second argument of the participatory verbs takes the se postposition

  raam ravi se carcaa karega
  Ram Ravi to discussion do.m.sg.fut

  siitaa raam se shaadi karegii
  Sita Ram with marriage do.f.sg.fut

  ravi mohan se milegaa
  Ravi Mohan with meet.m.sg.fut




Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization: Automatic

Issues:
• Compounds
• Punctuations

For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past


Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)
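The two-column listing above can be produced from a token list by numbering tokens from 1. This is a minimal sketch of the addressing only: real SSF files carry further columns and markup, and the function names and the tab-separated layout are assumptions made for illustration.

```python
def to_ssf_rows(tokens):
    """Pair each token with a 1-based ADDR value."""
    return [(addr, token) for addr, token in enumerate(tokens, start=1)]


def format_ssf(tokens):
    """Render the ADDR/TOKEN listing as tab-separated text."""
    lines = ["ADDR\tTOKEN"]
    lines += [f"{addr}\t{token}" for addr, token in to_ssf_rows(tokens)]
    return "\n".join(lines)
```

Calling `to_ssf_rows(["usa", "laDake", "ne", "kelA", "khAyA", "thA"])` assigns the addresses 1 to 6 shown in the slide.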

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision
    • Create three tokens
    • Mark the hyphen as JOIN

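The decision for hyphenated compounds (three tokens, with the hyphen marked as JOIN) can be sketched as follows. How the JOIN mark is actually stored in the treebank files is not shown in the slides, so the (token, mark) pair used here is an assumption, as is the function name.

```python
def split_compound(token: str):
    """Split a hyphenated compound into member, hyphen, member.

    Returns (token, mark) pairs; the hyphen carries the mark "JOIN".
    Tokens without exactly one internal hyphen are returned unchanged.
    """
    parts = token.split("-")
    if len(parts) != 2 or not all(parts):
        return [(token, None)]
    left, right = parts
    return [(left, None), ("-", "JOIN"), (right, None)]
```

For example, `split_compound("bAlikA-vixyAlaya")` yields the three tokens bAlikA, - (JOIN) and vixyAlaya, so that each member can be morphologically analysed on its own.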

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality)/vibhakti (case marker), suffix

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
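Reading an af value back into named fields can be sketched as below. This assumes the fields are comma-separated (the separators appear to have been lost in this transcript) and follow the order given in the definition above; the field names and the function name are illustrative, not taken from the SSF specification.

```python
# Field order as described on the slide; the names are illustrative.
AF_FIELDS = [
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
]


def parse_af(af: str) -> dict:
    """Split a comma-separated af value into named fields.

    Missing trailing fields are filled with empty strings.
    """
    values = af.split(",")
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))
```

For example, `parse_af("laDakaa,n,m,sg,3,o,,")` returns a dictionary whose "root" is laDakaa and whose "case" is o (oblique).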

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))

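The change in ADDR_ values under chunking can be sketched as a renumbering step: each chunk gets an integer address and its tokens get addresses of the form chunk.position. The sketch assumes that reading of the addresses (the dots appear to have been lost in this transcript); the input structure and function name are hypothetical.

```python
def renumber(chunks):
    """Produce (ADDR, TKN, CAT) rows for a list of chunks.

    `chunks` is a list of (chunk_tag, [(token, pos_tag), ...]) pairs.
    """
    rows = []
    for i, (chunk_tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", chunk_tag))       # chunk opens
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))    # token inside chunk
        rows.append(("", "))", ""))                  # chunk closes
    return rows


CHUNKS = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
```

With this input, usa moves from address 1 to 1.1, kelA from 4 to 2.1, and thA from 6 to 3.2.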


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT, Hyderabad
<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala


Panini’s Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini’s Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

[Figure: dependency tree. opened → Sabina (k1), lock (k2); lock → the]

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

[Figure: dependency tree. opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]

k3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

[Figure: dependency tree. opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]

k7t (kaladhikaraNa): time
k7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Figure: dependency tree. opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]

lock becomes the karta
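The two trees above can be written down as labelled head-dependent arcs, which makes the shift of karta easy to check. This is an illustrative encoding only; the dictionary layout and the function name are assumptions, and determiners are omitted.

```python
# Labelled arcs (head, dependent) -> karaka label, determiners omitted.
WITH_AGENT = {
    ("opened", "Yesterday"): "k7t",
    ("opened", "Sabina"): "k1",
    ("opened", "lock"): "k2",
    ("opened", "key"): "k3",
    ("opened", "home"): "k7p",
}

WITHOUT_AGENT = {
    ("opened", "yesterday"): "k7t",
    ("opened", "lock"): "k1",
    ("opened", "key"): "k3",
}


def karta(tree):
    """Return the dependent labelled k1, or None if there is none."""
    for (_head, dependent), label in tree.items():
        if label == "k1":
            return dependent
    return None
```

In the first tree `karta` returns Sabina; in the second, where no agent is expressed, it returns lock.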


Levels of Analysis

L1: Semantic relations (karakas), e.g. raama: karta
L2: Morphosyntactic (vibhakti), e.g. raama: prathamaa
L3: Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4: Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Figure (t1): inter-chunk tree. khaa → bhaaii, phala]
[Figure (t2): expanded tree. khaa → bhaaii, phala; bhaaii → meraa, baDZaa; phala → bahuta]


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb’s semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb

• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal Root, branching into activity and result]

karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: "The door opened"
  - The door is karta
  - The sentence has no explicit karma

[Figure: dependency tree. open -k1-> boy; open -k2-> lock]

Sub-actions: Opening of lock

Opening of lock:
• action1: inserting and turning a key
• action2: key pressing and turning the lever
• action3: latch moving and lock opening

Sub-actions: Opening of lock

[Figure: three dependency trees. (1) open -k1-> boy, -k2-> lock, -k3-> key; (2) open -k1-> key, -k2-> lock; (3) open -k1-> lock]

k1: karta (doer); k2: karma (affected); k3: karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: "Sabina opened the lock".
• However, the speaker may decide not to express the role of the agent. Hence: "The key opened the lock". The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• "The lock opened". The karma is raised to the role of karta (doer: karma-kartri).
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), that is, which sub-action the speaker wants to focus on.

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' is assigned karta in (b) and karana in (a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena in Hindi such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m >
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any >
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3 >
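The morph analysis attaches a flat feature structure to each word. A minimal sketch of reading one such structure into a dictionary follows; it assumes the angle-bracketed `<fs key=value ...>` form with unquoted values, and the function name `parse_fs` is hypothetical.

```python
import re


def parse_fs(fs):
    """Parse '<fs key=value ...>' into a dict; a missing value becomes ''."""
    body = fs.strip()
    if body.startswith("<fs"):
        body = body[3:]
    if body.endswith(">"):
        body = body[:-1]
    # Each feature is NAME=VALUE, where VALUE may be empty (as in 'af=').
    return dict(re.findall(r"(\w+)=(\S*)", body))


features = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(features["root"], features["cat"], features["TAM"])
```

Keeping the features as a plain dictionary makes it easy to check agreement values (gend, num, pers) in later steps.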

An Example (Contd.)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
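The tagged and chunked notation above is regular enough to read mechanically. The sketch below assumes exactly the `((word_POS ...))_CHUNK` layout shown on the slide; the function name `parse_chunks` is hypothetical and this is not the treebank's own reader.

```python
import re

# One chunk: double parentheses around the tokens, then '_' and the chunk tag.
CHUNK = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return [(chunk_tag, [(word, pos), ...]), ...] for '((w_POS ...))_TAG' text."""
    chunks = []
    for body, tag in CHUNK.findall(text):
        # Split each token on its last underscore: word on the left, POS on the right.
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((tag, tokens))
    return chunks


example = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
for tag, tokens in parse_chunks(example):
    print(tag, tokens)
```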

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit' and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

[Figure: dependency tree. khaataa -k1-> bhaii; khaataa -k2-> phala; bhaii -r6-> meraa]

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod_relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions
• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  - Possibility I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

Causative Constructions (Contd.)

• Possibility II
  - The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd.)

• Possibility II
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility II
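The difference between the two possibilities for the mother/maid/child example can be sketched as a relabelling step. This is only an illustration: the mapping table and the function name `relabel_causative` are assumptions, and the input is assumed to contain only the causer, mediator, causee and object of a causative verb. A true instrument such as cammaca se 'with the spoon' keeps k3 and must not be passed through this mapping.

```python
# Possibility I label -> Possibility II label, for causative participants only.
# Illustrative assumption: the caller has already identified these participants.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map vibhakti-based labels to causative labels; other labels are kept."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(possibility_1))
```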

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause: Possibility I

Relative Clauses (nmod__relc)

• Possibility II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Possibility II

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility II
• In Possibility II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta - k4a (Contd.)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd.)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  - Possibility I: Abstract node
  - Possibility II: One clause depends on the other clause

Possibility I

[Figure: dependency tree. agar-to -paired-ccof-> agar; agar-to -paired-ccof-> to; agar -ccof-> naa hotii; to -ccof-> aatii]

Possibility II

Conditionals (Contd.)

• Possibility I is ruled out because the head of its tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility II
• In Possibility II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written letters every day, he tears them up'

Participles (vmod) (Contd.)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)

[Figure: dependency tree. PaadZataa hai -k1-> vaha; -k7t-> rojZa; -k2-> patra; -vmod-> likhakar; likhakar -k1-> vaha and -k2-> patra (shared arguments)]

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

[Figure: dependency tree. maan luungii -k1-> mai; maan luungii -k2-> NULL__NP (vo); NULL__NP -nmod__relc-> kahoge; kahoge -k1-> tum; kahoge -k2-> jo bhi]

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  "The children have grown up, they don't listen to anyone"
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Figure: dependency tree. NULL_CCP (aur) -ccof-> badZe_ho_gaye; NULL_CCP (aur) -ccof-> nahii_sunate]

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

[Figure: dependency tree. kiyaa -k1-> maine; kiyaa -k2-> usase; kiyaa -pof-> prashna; prashna -nmod-> ek]


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
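The filtering step can be sketched in a few lines. The example below assumes the two candidate sentences have already been labelled with a predicate, an agent and a theme; the dictionary layout and the function name `answer` are illustrative assumptions, not the output format of any particular SRL system.

```python
def answer(candidates, predicate, theme):
    """Return the agents of `predicate` in candidates whose theme matches `theme`."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]


# Hand-labelled stand-ins for the two candidate sentences.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]
print(answer(candidates, "create", "the first effective polio vaccine"))  # ['Jonas Salk']
```

Both sentences contain the words of the question, so word matching alone cannot separate them; only the role labels do.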

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
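A frame file can be thought of as a lookup table from roleset IDs to role descriptions. The sketch below encodes the two rolesets of "ring" from the slide as a Python dictionary; the dictionary layout and the function name `describe` are assumptions for illustration, not the actual PropBank frame file format.

```python
# Rolesets for 'ring', transcribed from the example; the layout is an assumption.
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"Arg0": "causer of ringing", "Arg1": "thing rung", "Arg2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"Arg1": "surrounding entity", "Arg2": "surrounded entity"},
    },
}


def describe(roleset, labelled_args):
    """Pair each (phrase, label) with the role description from its roleset."""
    roles = FRAMES[roleset]["roles"]
    return [(phrase, label, roles[label]) for phrase, label in labelled_args]


print(describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")]))
print(describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")]))
```

The same label (Arg1) means "thing rung" under one roleset and "surrounding entity" under the other, which is why the roleset ID has to be chosen before the arguments are interpreted.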

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  - pro: dropped (null) arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments
  ARGA: Causer
  ARG0: Agent/experiencer
  ARG1: Theme/patient
  ARG2: Recipient
  ARG3: Instrument

Numbered Arguments with function tags
  ARGA-MNS: Indirect causer
  ARG0-MNS: Induced causer
  ARG0-GOL: Causee with a 'recipient' role
  ARG2-ATR: Attribute
  ARG2-GOL: Goal
  ARG2-SOU: Source
  ARG2-LOC: Location
  ARG2-DIR: Direction

PropBank Tagset: Modifier Arguments
  ARGM-TMP: Temporal
  ARGM-MNR: Manner
  ARGM-LOC: Location
  ARGM-PRP: Purpose
  ARGM-CAU: Cause
  ARGM-DIS: Discourse
  ARGM-ADV: Adverb
  ARGM-MNS: Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. the animacy test and the cognate object test, among others

Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so -> sulaa) sleep -> make someone sleep
• Add -vaa
  - (sulaa -> sulvaa) make someone sleep -> cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Causatives

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

Complex predicate

compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    · language-universal principles
    · language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा, with nodes VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी, with nodes VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative there simply is no object

[Figure: PS tree for आतिफ़ सोएगा, with nodes VP-Pred, VP, NP (Atif), V (soyegaa)]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा, with nodes VP-Pred, VP, NP1 (darvaazaa), a CASE trace coindexed 1, V (khulegaa)]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं, with nodes VP-Pred, VP, NP-P (us kamre mein), NP1 (cuuhe), a CASE trace coindexed 1, V (hain)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी, with nodes VP-Pred (twice), VP, NP-P (Ram-ne), NP-P (Mohan-ko), NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा, with nodes VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko) with an SCR trace coindexed 2, NP1 (caaMd) with a CASE trace coindexed 1, V (dikhaa)]

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी, with nodes VP-Pred, VP, NP (Ram), V (jaantaa), V-Aux (hai), an EXTR trace coindexed 1, CP1, C (ki), NP1 (Sita) with a CASE trace coindexed 1, NP-P (der se), V (aayegi)]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Figure: PS tree with nodes CP, C (COMP), VP-Pred, VP, NP1 (jo kitaab), NP-P (tumne), PRO, V (dii), V-Aux (thii), NP (vah), NP-P (maine), V (paRhii), and SCR traces coindexed 1 and 3]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Figure: PS tree with nodes VP-Pred, VP, NP (Raam), NP-P1 (Ravi-ko), V' over N (yaad) with a CASE trace coindexed 1, V (kar), V-Aux (rahaa), V-Aux (thaa)]

Causative

Analysis 1a for Small Clauses: No Accusative Case Marking

• Structure represents "her" as subject, but not the accusative case marking of "her"

[Figure: dependency tree. considers -Subj-> Atif; considers -Obj-> stupid; stupid -Subj-> her]
[Figure: PS tree with nodes S, NP (Atif), VP, considers, S, her, VP, AdjP (stupid)]

Analysis 1b for Small Clauses: Exceptional Case Marking

• Structure represents "her" as subject, and the accusative case marking through the node label

[Figure: dependency tree. considers -Subj-> Atif; considers -Obj-ECM-> stupid; stupid -Subj-> her]
[Figure: PS tree with nodes S, NP (Atif), VP, considers, SC, her, VP, AdjP (stupid)]

Close to the analysis adopted in Chomsky (1981)

Note on DS and PS

• These analyses are intuitively very similar
• Formal notion: "consistency" (Fei Xia; see Bhatt, Rambow & Fei 2011)
  - Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa

Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents the accusative case marking of "her" (as object of the matrix verb), but not "her" as semantic subject

[Figure: dependency tree. considers -Subj-> Atif; considers -Obj-> her; considers -Obj2-> stupid]
[Figure: PS tree with nodes S, NP (Atif), VP, considers, her, AdjP (stupid)]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents the accusative case marking of "her" (as object of the matrix verb) and "her" as semantic subject, using the node label

[Figure: dependency tree. considers -Subj-> Atif; considers -Obj-> her; considers -ObjPred-> stupid]
[Figure: the same tree with Neo-Paninian labels. considers -k1-> Atif; considers -k2-> her; considers -k2s-> stupid]
[Figure: the same tree for Hindi. समझा -k1-> आतिफ़ ने; समझा -k2-> सीमा को; समझा -k2s-> बेवकूफ़]
[Figure: PS tree with nodes S, NP (Atif), VP, considers, her, SC, AdjP (stupid)]

Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank.

Analysis 3 for Small Clauses: Raising to Object

• Structure represents the accusative case marking of "her" and "her" as semantic subject, but requires an empty category

[Figure: dependency tree. considers -Subj-> Atif; considers -Obj-> her1; considers -Obj-Pred-> stupid; stupid -Subj-> e1]
[Figure: PS tree with nodes S, NP (Atif), VP, considers, her1, S, e1, VP, AdjP (stupid)]

Analysis used for PS in the Hindi-Urdu Treebank

Comparison of Representations

• Less information: Tree 1a, Tree 2a
• Same information: Tree 1b, Tree 2b, Tree 3

[Figure: the five dependency trees for "Atif considers her stupid". Tree 1a (Subj, Obj; her as Subj of stupid), Tree 2a (Subj, Obj, Obj2), Tree 1b (Subj, Obj-ECM; her as Subj of stupid), Tree 2b (Subj, Obj, ObjPred), Tree 3 (Subj, Obj her1, Obj-Pred; e1 as Subj of stupid)]

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax, as part of the science of language
  - Can be very similar across languages
• There can be several possible analyses
  - Some have less information
  - But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design

Aren't DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in the projection shows the type of dependency

Aren't DS and PS Representations Complementary? NO

• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all its descendants
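The convention that a DS node stands for the phrase made of the word and all its descendants can be made concrete. The sketch below reads the phrase headed by any word off a dependency tree; the index-based encoding and the function name `phrase` are assumptions for illustration, and the tree is the chunk-head tree of the fruit-eating example (khaataa heads bhaaii and phala; bhaaii heads meraa).

```python
def phrase(head, children):
    """Return the positions of `head` and all its descendants, in sentence order."""
    nodes = [head]
    for child in children.get(head, []):
        nodes.extend(phrase(child, children))
    return sorted(nodes)


words = ["meraa", "bhaaii", "phala", "khaataa"]
# head position -> positions of its direct dependents
children = {3: [1, 2], 1: [0]}

print([words[i] for i in phrase(1, children)])  # ['meraa', 'bhaaii']
print([words[i] for i in phrase(3, children)])  # ['meraa', 'bhaaii', 'phala', 'khaataa']
```

Each call yields one constituent, so the set of all such yields is the constituency information that the dependency tree already carries.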

101212

14

What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information


Comparison of DS, PB, PS (Sample)

Columns: DS | PB | PS

How:  Dependency | Phrase Structure

What: Distinguish unergative/unaccusative | Distinguish temporal/locative adjuncts | Distinguish unaccusative/transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT, Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics


Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic


Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact in almost all the Indian languages)


Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns
• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique


Case
• Direct: nouns are in the nominative and are not followed by a postposition
  – Occur denoting subject and/or object

    laRkaa aayaa 'the boy came'
    <laRkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

    laRkii aayii 'the girl came'
    <laRkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

    laRke ne roTii khaayii 'the boy ate bread'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

    laRke ne roTii ko zamiin se uThaayaa 'the boy picked up the bread from the floor'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>


Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

Sg, dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)

Sg, obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)

Pl, dir: ye (these), ve (those), jo (who, pl)

Pl, obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd. …)

• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form:
  hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
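
The pronoun rules above can be sketched as a small lookup. This is only an illustration covering the five personal pronouns named on the slide; the function name `attach` and the two tables are hypothetical and not part of the treebank tooling.

```python
# Hypothetical helper illustrating the pronoun + postposition rules above.
# Oblique stems used before postpositions other than ne.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitive forms that replace pronoun + kaa.
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}

def attach(pronoun: str, postposition: str) -> str:
    """Combine a personal pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]          # irregular form, kaa is not attached
    if postposition == "ne":
        return pronoun + "ne"             # no stem change before ne
    # maiM and tuu switch to their oblique stems; ham, tum, aap stay unchanged
    return OBLIQUE.get(pronoun, pronoun) + postposition

for pron, post in [("maiM", "ne"), ("maiM", "ko"), ("ham", "ko"), ("tum", "kaa")]:
    print(pron, "+", post, "->", attach(pron, post))
```

Third-person and relative pronouns follow the separate direct/oblique table and are not handled by this sketch.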


Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

Case →       Direct            Oblique
Number →     Sg      Pl        Sg      Pl
Gender ↓
Masc         acchaa  acche     acche   acche
Fem          acchii  acchii    acchii  acchii
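
The table above reduces to three cases, which the following sketch makes explicit. It assumes a declinable adjective given by its stem (for example `acch` for acchaa); the function name and the feature values `"m"`, `"f"`, `"sg"`, `"pl"`, `"direct"`, `"oblique"` are illustrative choices, not treebank conventions.

```python
def inflect_adjective(stem: str, gender: str, number: str, case: str) -> str:
    """Inflect a declinable -aa adjective according to the table above."""
    if gender == "f":
        return stem + "ii"                # feminine: same form in every cell
    if number == "sg" and case == "direct":
        return stem + "aa"                # masculine singular direct
    return stem + "e"                     # all other masculine cells

for g in ("m", "f"):
    for c in ("direct", "oblique"):
        for n in ("sg", "pl"):
            print(g, c, n, inflect_adjective("acch", g, n, c))
```

Adjectives that do not decline, mentioned earlier in the section, would simply bypass this function.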

Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu:

Root         ro       'cry'
Infinitive   ronaa    'to cry'
Habitual     rotaa    'cry.hab'
Perfective   royaa    'cried'
Causative    rulaa    'cause someone to cry'
             rulvaa   'make someone cause someone to cry'


Auxiliaries
• Auxiliaries mark Tense, Aspect and Modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      'The children are having a meal'
  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog ability be.pl.pres
      'The children can continue to have their meal'
• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions


Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar      'according to'
ke alaavaa      'in addition to'
ke kaaran       'because of'
ke dvaaraa      'through'
ke saamne       'in front of'
ke liye         'for'
kii or/taraf    'towards'
kii tarah       'like'
kii jagah       'in place of'
se baahar       'out of'
se pahle        'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu


Urdu has Prepositions
• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl 'before'
    qabl az_ayn (qabl azeen)       qabl az_waqt
    before from_this               before from_time
    'before this'                  'before time'

(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah)
    in_this moment

(c) az 'from/since/for'
    az raahe hamdardi              az sare nau            khaarij az bahes
    from way empathy               from beginning new     beyond from discussion
    'for the sake of courtesy'

(d) ta 'to/until/till'
    ta waqt
    until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive, but is not restricted to the genitive alone

EZ N+N       daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj     nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N     qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj   qabil-e-qubul 'qualified-for-acceptance'


Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X then its reduplicated form is X-X

raam-raam         'Ram-Ram'         (proper noun)
baccaa-baccaa     'child-child'     (common noun)
garam-garam       'hot-hot'         (adjective)
dhiire-dhiire     'slowly-slowly'   (adverb)
jaa-jaa           'go-go'           (verb)
naa-naa           'not-not'         (negative particle)
kyaa-kyaa         'what-what'       (question word)
jaate-jaate       'going-going'     (participle)


Partial Reduplication (Echo words)
• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
  jaanaa 'going'      → jaanaa-vaanaa 'going etc.'
  aaloo 'potato'      → aaloo-vaaloo 'potato etc.'
  aisaa 'like this'   → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run'         → bhaagambhaag 'rush'
  jhuuTh 'lie'        → jhuuTh-muuTh 'just like that (without meaning it)'
  dekh 'see'          → dekhaa-dekhii 'in imitation'
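
The two regular patterns (X-X and the v- echo) are mechanical enough to sketch in code. The following is a minimal illustration over romanized forms, assuming that the "first consonant" is everything before the first lowercase vowel letter, so that kh in khaanaa counts as one consonant and vowel-initial words such as aaloo simply gain a v-. The function names are hypothetical.

```python
import re

def full_reduplicate(word: str) -> str:
    """X -> X-X, e.g. garam -> garam-garam."""
    return f"{word}-{word}"

def echo_word(word: str) -> str:
    """X -> X-vX', where X' is X without its leading consonant(s)."""
    # Strip everything before the first vowel letter, then prefix v.
    rest = re.sub(r"^[^aeiou]+", "", word)
    return f"{word}-v{rest}"

for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(echo_word(w))
```

Irregular pairs such as bhaagambhaag or dekhaa-dekhii are lexical and fall outside this rule.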

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies


Simple Transitive

trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitaab paRhegaa
          Atif book.f read.m.sg.fut
          'Atif will read the book'

trans-2   आतिफ़ ने किताब पढ़ी
          Atif ne kitaab paRhii
          Atif erg book.f read.f.sg.pst
          'Atif read the book'

trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif ko kitaab paRhnii paRii
          Atif dat book.f read.f.inf compel.f.pst
          'Atif had to read the book'

Intransitive Unergative

Unerg-1 आतफ़ सोएगा

Atif soyegaa Atif sleepmsgfut Atif will sleep

unerg-2 आतफ़ सोया

Atif ne soyaa Atif erg sleepmsgpst Atif slept

unerg-3 आतफ़ को सोना पड़गा

Atif ko sonaa paRegaa Atif dat sleepinf compelfut Atif will have to sleep


Intransitive Unaccusative

unacc-1 दरवाज़ा ख9गा darvaazaa khulegaa doormsgd openmsgfut The door will open

unacc-2 दरवाज़ ख9ला darvaaze ne khulaa doormsgobl erg openpst The door opened

unacc-3 दरवाज़ को ख9लना पड़गा

darvaaze ko khulnaa paRegaa doormsgobl dat openinf compelfut The door will have to open

Existential

exist-1 उस कमgt चAB C us kamre meM cuuhe haiM that room in rats beprespl There are rats in that room


Dative Subject

unacc-4 कल रात बादलD चाEद दखा kal raat baadaloM meM caaMd dikhaa yesterday night clouds in moon see(unacc)pst Yesterday night the moon was seen behind the clouds

dat-subj-1 कल रात बादलD म9झको चाEद दखा

kal raat baadaloM meM mujhko caaMd dikhaa yesterday night clouds in medat moon see(unacc)pst Yesterday night I saw the moon behind the clouds

Ditransitive

ditrans-1   राम मोहन को किताब देगा
            raam mohan ko kitaab degaa
            Ram Mohan dat book.f give.m.sg.fut
            'Ram will give a book to Mohan'

ditrans-2   राम ने मोहन को किताब दी
            raam ne mohan ko kitaab dii
            Ram erg Mohan dat book.f give.f.sg.pst
            'Ram gave a book to Mohan'


Complement Clause

compl-cl-1 राम जानता J क सीता Hर K आएगी raam jaantaa hai ki siita der se aayegii Ram knowhabmsg besgpres that Sita late part comefsgfut

Ram knows that Sita will arrive late

Relative Clause

rel-cl-1 Lरी बहन जो दMली रहती J कल आ रही J merii bahan jo dillii meM rahtii hai kal aa My sister who Delhi in stayhabfsg besgpres tomorrow come

rahii hai progfsg besgpres

My sister who stays in Delhi is coming tomorrow


Relative Clause

rel-cl-2 N वह कताब जो त9म दी थी पढ़ ली maiMne vah kitaab jo tumne dii thii paRh lii Ierg that bookf which youerg givefsgpst befsgpst read reflfsgpst I have read the book which you gave me

rel-cl-3 N वह कताब पढ़ ली जो त9म दी थी

maiMne vah kitaab paRh lii jo tumne dii thii Ierg that bookf read reflfsgpst which youerg givefsgpst befsgpst

I have read the book which you gave me rel-cl-4 जो कताब त9म दी थी वह N पढ़ ली

jo kitaab tumne dii thii vah maiMne paRh lii which bookf youerg givefsgpst befsgpst that Ierg read reflfsgpst I have read the book which you gave me

Complex Predicate

compl-pred-1 राम रव की PतीQा कर रहा था raam ravi kii pratikshaa kar rahaa thaa Ram Ravi gen wait do progmsg bemsgpst Ram was waiting for Ravi

compl-pred-2 राम रव को याद कर रहा था

raam ravi ko yaad kar rahaa thaa Ram Ravi acc remember do progmsg bemsgpst Ram was remembering Ravi


Causatives Unerg-1 आतफ़ सोएगा

Atif soyegaa Atif sleepmsgfut Atif will sleep

causative-1 आया आतफ़ को स9लाया

aayaa ne Atif ko sulaayaa maid erg Atif acc sleepcauspst lsquoThe maid caused the child to sleeprsquo

causative-2 माE आया K आतफ़ को स9लवाया

maaN ne aayaa se Atif ko sulvaayaa mother erg maid by Atif acc sleepcauspst lsquoThe mother made the maid to cause the child to sleeprsquo

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
      raam ko bukhaar hai           Ram dat fever be.pres
      raam ko caand dikhaa          Ram dat moon see-unacc.pst
      raam ko dukh hai              Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of participatory verbs takes the se postposition
      raam ravi se carcaa karega    Ram Ravi to discussion do.m.sg.fut
      siitaa raam se shaadi karegii Sita Ram with marriage do.f.sg.fut
      ravi mohan se milegaa         Ravi Mohan with meet.m.sg.fut


References
• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.


Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

usa    ladake   ne    kelaa    khaayaa    thaa
that   boy      erg   banana   eat-perf   past


Tokenization

Represented in SSF:

ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision
    · Create three tokens
    · Mark the hyphen as JOIN
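
The decision for hyphenated compounds (three tokens, hyphen marked as JOIN) can be sketched as below. The slide does not say how the JOIN mark is stored, so the pair representation `(token, mark)` and the function name are assumptions made for illustration only.

```python
def tokenize_compound(word: str):
    """Split a hyphenated compound; every hyphen becomes its own token marked JOIN."""
    tokens = []
    for i, part in enumerate(word.split("-")):
        if i > 0:
            tokens.append(("-", "JOIN"))   # the hyphen itself is a token
        tokens.append((part, None))        # members are left unmarked here
    return tokens

print(tokenize_compound("BAI-bahana"))
```

Keeping the members as separate tokens is what allows each of them to receive its own morphological analysis.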


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality)/vibhakti (case marker), suffix.

ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7                 <fs as=&STOP,punc>
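
Because af is a positional, comma-separated attribute, reading it back is a matter of splitting and naming the fields. The sketch below assumes the eight-field order listed on the slide (with tam/vibhakti treated as one slot) and straight single quotes around the value; the field names, the function name and the sample string are illustrative assumptions, not an official SSF reader.

```python
import re

# Assumed field order, following the description of af above.
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix"]

def parse_af(fs: str) -> dict:
    """Extract the af attribute from a feature structure string and name its fields."""
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        return {}                           # no af attribute present
    values = match.group(1).split(",")
    # zip stops at the shorter sequence, so abbreviated af values still parse
    return dict(zip(AF_FIELDS, values))

print(parse_af("<fs af='laDakaa,n,m,sg,3,o,0,0'>"))
print(parse_af("<fs af='ne,psp'>"))
```

Abbreviated values such as `'ne,psp'` yield only the fields that are present.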

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7               SYM    <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1              SYM    <fs as=&STOP,punc>
        ))


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT, Hyderabad

<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky, 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

opened

k1 k2

Sabina lock

the

K1 (Karta) the doer of the action (the locus of activity) K2 (Karma) locus of result

Sabina opened the lock with this key

opened

sabina lock key

the this

k1 k2

k3

K3 (karaNa) instrument


Yesterday Sabina opened the lock with this key at my home

opened

Yesterday Sabina lock key home

the this my

k7t k1 k2 k3

k7p

K7t (kaalaadhikaraNa): time; K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened

yesterday lock key

the this

k7t k1

k3

lock becomes the karta


Levels of Analysis

Level 1 – Semantic relations: karakas, e.g. raama: karta
Level 2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
Level 3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
Level 4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

(t1) khaa -> bhaaii, phala
(t2) khaa -> bhaaii (-> meraa, baDZaa), phala (-> bahuta)


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
    Sabina opened the lock with the key
    The key opened the lock
    The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

Verbal Root -> activity, result


karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

(open(

boy( lock(

k1 k2

SubLac3onsLOpeningoflock

Openingoflock

Inser3ngandkeypressing latchmovingturningakeyandturningandlockopening(ac3on1) thelever(ac3on3)

(ac3on2)


SubLac3onsLOpeningoflock

open(

boy( lock( key(

k1k2 k3

open(

open(

lock(

lock(key(

k1

k1 k2

k1ndashkarta(doer)rlmk2ndashkarma(affected)rlmk3ndashkarana(instrument)rlm

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer, karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer, karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
      (a) I opened the lock with this key
      (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)


Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions etc.


An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m>
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any>
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
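
The bracketed notation used on these slides, `((word_TAG ...))_LABEL`, is easy to read back into a data structure. The sketch below is one way to do that with a regular expression; it handles only this slide notation (not full SSF), and the function name and return shape are assumptions for illustration.

```python
import re

# One chunk: "((" tokens "))" followed by "_" and the chunk label.
CHUNK = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text: str):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK.findall(text):
        # Split each token on its last underscore into word and POS tag.
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, tokens))
    return chunks

sentence = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
for label, tokens in parse_chunks(sentence):
    print(label, tokens)
```

With the chunks in this form, picking a head per chunk (for instance the NN or VM token) is the natural next step before marking inter-chunk relations.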

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks


Dependency Scheme (Contd.)

khaataa
  k1 -> bhaaii
          r6 -> meraa
  k2 -> phala
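
The tree above can be held as a list of labeled head-to-head edges. The following sketch stores the three edges from the example and prints the tree by walking down from the root; the triple layout `(dependent, head, relation)` and the helper names are illustrative assumptions rather than the treebank's storage format.

```python
# (dependent, head, relation) triples between chunk heads, as in the tree above.
EDGES = [
    ("bhaaii", "khaataa", "k1"),   # karta
    ("phala", "khaataa", "k2"),    # karma
    ("meraa", "bhaaii", "r6"),     # genitive
]

def find_root(edges):
    """The root is the only head that is never a dependent."""
    heads = {head for _, head, _ in edges}
    dependents = {dep for dep, _, _ in edges}
    (root,) = heads - dependents
    return root

def show(node, edges, relation="root", depth=0):
    """Return indented lines for the subtree rooted at node."""
    lines = ["  " * depth + f"{node} ({relation})"]
    for dep, head, rel in edges:
        if head == node:
            lines.extend(show(dep, edges, rel, depth + 1))
    return lines

print("\n".join(show(find_root(EDGES), EDGES)))
```

Because only relations are stored, the same edge list describes every word-order variant of the sentence.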

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address


Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd hellip)


Causative Constructions (Contd. …)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations then the root will be khaa 'eat'


• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II

Causative Constructions (Contd hellip)
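
The relabeling under Possibility-II amounts to a fixed mapping from the surface labels of Possibility-I to the causative labels. The sketch below applies that mapping to a list of labeled arguments; it assumes the caller has already decided which se-phrase is the mediator (a true instrument such as cammaca se 'with the spoon' keeps k3 and should not be passed through it). The names used are hypothetical.

```python
# Surface label -> causative label, as stated above (k1, k3, k4 -> pk1, mk1, jk1).
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(arguments):
    """arguments: list of (phrase, label) pairs for a causative verb's participants."""
    return [(phrase, CAUSATIVE_LABELS.get(label, label)) for phrase, label in arguments]

surface = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(surface))
```

Labels outside the mapping, such as k2 for khaanaa, pass through unchanged.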

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd. …)

• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'


anubhava karta ndash k4a (Contdhellip)

 Ex-2

  Ex-3

anubhava karta ndash k4a (Contdhellip)


(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…): Ex-1

[Dependency tree diagram for Ex-1]

Relation samanadhikaran – rs (Contd…): Ex-2

[Dependency tree diagram for Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’
• Issue
  – Possibility-I: abstract node
  – Possibility-II: one clause depends on the other clause

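Possibility - I

[Dependency tree: the abstract node agar-to is the root; agar and to attach to it with paired-ccof; naa hotii attaches to agar with ccof, and aatii attaches to to with ccof]

Possibility - II

[Dependency tree diagram]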

Conditionals (Contd…)

• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd…)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day, having written letters, he tears them up’

Participles (vmod) (Contd…)

• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd…)

[Dependency tree: PaadZataa hai is the root, with vaha (k1), rojZa (k7t), patra (k2) and likhakar (vmod) as dependents; likhakar semantically shares vaha (k1) and patra (k2)]

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd…)

[Dependency tree: maan luungii is the root, with mai (k1) and NULL__NP (vo) (k2) as dependents; kahoge attaches to NULL__NP with nmod__relc and has tum (k1) and jo bhi (k2) as dependents]

Ellipsis (Contd…)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don't listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) is the root, with badZe_ho_gaye and nahii_sunate attached by ccof]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn’t reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)

[Dependency tree diagram]

Subordinate Conjunction

[Dependency tree diagram]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies only the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd…)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation)
• That is, a noun or an adjective in a conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
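A minimal sketch of how the pof relation lets a treebank user recover conjunct verbs while keeping the noun's own modifiers. The arc list mirrors the labels in the tree for this example (k1, k2, pof, nmod); the tuple layout and the function names are assumptions made for illustration only.

```python
# (dependent, head, relation) triples for: maine usase ek prashna kiyaa
ARCS = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),   # noun is 'part of' the conjunct verb
    ("ek", "prashna", "nmod"),     # adjective modifies the noun only
]


def conjunct_verbs(arcs):
    """Return (noun, verb) pairs that are linked by the pof relation."""
    return [(dep, head) for dep, head, rel in arcs if rel == "pof"]


def modifiers_of(word, arcs):
    """Return the dependents of a word, excluding pof parts."""
    return [dep for dep, head, rel in arcs if head == word and rel != "pof"]
```

Because prashna stays a separate node, `modifiers_of("prashna", ARCS)` can still return ek, which would be impossible if prashna kiyaa were a single verb chunk.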


Conjunct Verbs (Contd…)

[Dependency tree: kiyaa is the root, with maine (k1), usase (k2) and prashna (pof) as dependents; ek attaches to prashna with nmod]


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
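The filtering step can be sketched in a few lines. The role labels below are the ones shown on the slide; the dictionary layout and the `answer_who` function are hypothetical, and a real system would obtain the labelled frames from a semantic role labeller rather than by hand.

```python
# Hand-labelled frames for the two candidate sentences (labels from the slide).
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, frames):
    """Return the agents of frames whose predicate and theme match the question."""
    wanted = theme.lower()
    return [f["agent"] for f in frames
            if f["predicate"] == predicate and f["theme"].lower() == wanted]


answers = answer_who("create", "the first effective polio vaccine", CANDIDATES)
```

Both sentences contain every word of the question, so word matching cannot separate them; only the theme comparison removes the Becton Dickinson sentence.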


We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
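One way to picture a frame file is as a lookup from roleset IDs to role descriptions. The sketch below is an assumption made for illustration (real PropBank frame files are XML documents); the roleset IDs and role descriptions are the ones listed for ring above, and `describe_args` is a hypothetical helper.

```python
# A frame file for the lemma 'ring' with its two rolesets.
RING_FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe_args(roleset_id, labelled_args, frames=RING_FRAMES):
    """Map each labelled span to the verb-specific meaning of its label."""
    roles = frames[roleset_id]["roles"]
    return {span: roles[label] for span, label in labelled_args.items()}


bell = describe_args("ring.01", {"John": "Arg0", "the bell": "Arg1"})
lake = describe_args("ring.02", {"Tall aspen trees": "Arg1", "the lake": "Arg2"})
```

The same label (Arg1) means "Thing rung" in one roleset and "Surrounding entity" in the other, which is why the roleset ID has to be chosen before the numbered arguments can be interpreted.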

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments              Numbered Arguments with function tags
ARGA   Causer                   ARGA-MNS   Indirect causer
ARG0   Agent, experiencer       ARG0-MNS   Induced causer
ARG1   Theme, patient           ARG0-GOL   Causee with a ‘recipient’ role
ARG2   Recipient                ARG2-ATR   Attribute
ARG3   Instrument               ARG2-GOL   Goal
                                ARG2-SOU   Source
                                ARG2-LOC   Location
                                ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
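The labelling rule for intransitives can be stated as a small lookup. This is a sketch under the assumption that each verb's class has already been decided by the diagnostic tests; the `VERB_CLASS` table holds only the three verbs named on the slide, and `sole_argument_label` is a hypothetical helper, not part of any PropBank tool.

```python
# Verb classes as given on the slide (transliterated roots).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]   # raises KeyError for verbs not yet classified
    return "ARG1" if verb_class == "unaccusative" else "ARG0"
```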

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

[Annotated trees for unacc-4 and dat-subj-1]

• The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

Causatives

[Annotated trees for causative-1 and causative-2]

Causatives: classes

[Table of causative classes]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1    राम जानता है कि सीता देर से आएगी
              raam jaantaa hai ki siitaa der se aayegii
              Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
              ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

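Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]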

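Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Phrase structure tree for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1 (darvaazaa), NP (CASE1), V (khulegaa)]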

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

[Phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1 (cuuhe), NP (CASE1), V (hain), VP, NP-P (us kamre mein)]

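Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position

[Phrase structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, NP-P (Ram-ne), VP, NP (kitaab), V (dii), VP-Pred, NP-P (Mohan-ko), VP]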

Putting it All Together: Dative Subjects

[Phrase structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, NP1 (caaMd), NP (CASE1), V (dikhaa), VP, NP-P (baadaloM mein), NP (kal raat), VP-Pred, NP (SCR2), NP2 (mujhko)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

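Complement Clauses with ki

[Phrase structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP (Ram), NP (EXTR1), V (jaantaa), V-Aux (hai), CP1, C (ki), VP-Pred, NP1 (Sita), NP (CASE1), NP-P (der se), V (aayegi)]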

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Phrase structure tree; node labels: VP-Pred, NP-P (tumne), NP (SCR1), V (dii), V-Aux (thii), NP (PRO), NP1 (jo kitaab), CP, C (COMP), VP-Pred, NP-P (maine), V (paRhii), NP (SCR3), NP (vah)]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Phrase structure tree; node labels: VP-Pred, VP, NP (Raam), NP-P1 (Ravi-ko), V (kar), V-Aux (rahaa), V-Aux (thaa), V’, NP (CASE1), N (yaad)]

Causative

[Phrase structure tree diagram]

Analysis 1b for Small Clauses: Exceptional Case Marking

• Structure represents her as subject, but not the accusative case marking of her

[Dependency tree: considers is the root, with Atif (Subj) and stupid (Obj-ECM) as dependents; her attaches to stupid as Subj]

[Phrase structure tree; node labels: S, NP (Atif), VP (considers), SC (her), VP, AdjP (stupid)]

Close to the analysis adopted in Chomsky (1981)

Note on DS and PS

• These analyses are intuitively very similar
• Formal notion of “consistency” (Fei Xia; see Bhatt, Rambow & Fei 2011)
  – Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa

Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject

[Dependency tree: considers is the root, with Atif (Subj), her (Obj) and stupid (Obj2) as dependents]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb) and her as semantic subject, using the node label

[Dependency tree: considers is the root, with Atif (Subj), her (Obj) and stupid (ObjPred) as dependents]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb) and her as semantic subject, using the node label

[Dependency tree: considers is the root, with Atif (k1), her (k2) and stupid (k2s) as dependents]

Neo-Paninian analysis

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb) and her as semantic subject, using the node label

[Dependency tree: समझा is the root, with आतिफ़ ने (k1), सीमा को (k2) and बेवकूफ़ (k2s) as dependents]

Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank

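Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject

[Dependency tree: considers is the root, with Atif (Subj), her (Obj) and stupid (Obj2) as dependents]

[Phrase structure tree; node labels: S, NP (Atif), VP (considers, her), AdjP (stupid)]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents the accusative case marking of her (as object of the matrix verb) and her as semantic subject, using the node label

[Dependency tree: considers is the root, with Atif (k1), her (k2) and stupid (k2s) as dependents]

[Phrase structure tree; node labels: S, NP (Atif), VP (considers, her), SC, AdjP (stupid)]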

Analysis 3 for Small Clauses: Raising to Object

• Structure represents the accusative case marking of her and her as semantic subject, but requires an empty category

[Dependency tree: considers is the root, with Atif (Subj), her1 (Obj) and stupid (Obj-Pred) as dependents; the empty category e1 attaches to stupid as Subj]

[Phrase structure tree; node labels: S, NP (Atif), VP (considers), her1, S, e1, VP, AdjP (stupid)]

Analysis used for PS in the Hindi-Urdu Treebank

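Comparison of Representations

• Less information
• Same information

Tree 1a: considers is the root, with Atif (Subj) and stupid (Obj) as dependents; her attaches to stupid as Subj
Tree 1b: considers is the root, with Atif (Subj) and stupid (Obj-ECM) as dependents; her attaches to stupid as Subj
Tree 2a: considers is the root, with Atif (Subj), her (Obj) and stupid (Obj2) as dependents
Tree 2b: considers is the root, with Atif (Subj), her (Obj) and stupid (ObjPred) as dependents
Tree 3: considers is the root, with Atif (Subj), her1 (Obj) and stupid (Obj-Pred) as dependents; e1 attaches to stupid as Subj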

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design

Aren’t DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency

Aren’t DS and PS Representations Complementary? NO

• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants
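The convention that a DS node stands for the phrase made of itself and all its descendants can be sketched as follows. The example reuses the conjunct-verb sentence from the DS part of the tutorial (maine usase ek prashna kiyaa); the `HEAD` dictionary layout and the `phrase` function are assumptions made for illustration.

```python
# Surface order of the sentence and the head of each word (None marks the root).
WORDS = ["maine", "usase", "ek", "prashna", "kiyaa"]
HEAD = {"maine": "kiyaa", "usase": "kiyaa", "ek": "prashna",
        "prashna": "kiyaa", "kiyaa": None}


def phrase(word, words=WORDS, head=HEAD):
    """Return the word plus all of its descendants, in surface order."""
    members = {word}
    grew = True
    while grew:                      # keep adding children until nothing changes
        grew = False
        for w in words:
            if w not in members and head[w] in members:
                members.add(w)
                grew = True
    return [w for w in words if w in members]
```

Here `phrase("prashna")` yields the noun phrase ek prashna, and `phrase("kiyaa")` yields the whole clause, so the constituents are recoverable from the dependency tree without any phrase labels being stored.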


What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information

Comparison of DS, PB, PS (Sample)

DS    PB    PS

How:
  Dependency
  Phrase Structure

What:
  Distinguish unergative / unaccusative
  Distinguish temporal, locative adjuncts
  Distinguish unaccusative / transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics

Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari – a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic

Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan.masc), lataa (creeper.fem), ghar (house.masc)
• Number:
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique

Case

• Direct: nouns are in the nominative and are not followed by a postposition
• They occur denoting subject and/or object

  laRkaa aayaa ‘the boy came’
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii ‘the girl came’
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii ‘the boy ate bread’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread from the floor’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

Sg – Dir: yaha (this), vaha (that), jo (who/which), kaun (who.interro), kyaa (what) and kuch (some)
Sg – Obl: isa (this), usa (that), jisa (which), kisa (which.interro), koii (someone) and kisii (someone)
Pl – Dir: ye (these), ve (those), jo (who.pl)
Pl – Obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who.interro), kinhiiM (who.indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people.indef)

Pronouns (Contd…)

• Before ‘ne’, maiM (I) and tuu (you.sg) don’t change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don’t change form:
  hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), humko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below

          Direct            Oblique
          Sg       Pl       Sg       Pl
Masc      acchaa   acche    acche    acche
Fem       acchii   acchii   acchii   acchii
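The table reduces to a three-way rule, sketched below for declinable adjectives of the acchaa type. The function name and the string codes for gender, number and case are assumptions made for illustration; adjectives that do not decline are outside this sketch.

```python
def inflect_adjective(stem, gender, number, case):
    """Inflect an -aa adjective stem (e.g. 'acch') following the table above.

    gender: 'masc' or 'fem'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if gender == "fem":
        return stem + "ii"        # feminine is invariant across number and case
    if number == "sg" and case == "direct":
        return stem + "aa"        # masculine singular direct
    return stem + "e"             # all other masculine cells


# acche laRke ne: masculine singular oblique before the postposition ne
oblique_form = inflect_adjective("acch", "masc", "sg", "oblique")
```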

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root         ro       ‘cry’
Infinitive   ronaa    ‘to cry’
Habitual     rotaa    ‘cry.hab’
Perfective   royaa    ‘cried’
Causative    rulaa    ‘cause someone to cry’
             rulvaa   ‘make someone cause someone to cry’

Auxiliaries

• Auxiliaries mark tense, aspect and modality information on verbs

  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      ‘The children are having a meal’

  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog ability be.pl.pres
      ‘The children can continue to have their meal’

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions

Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar     ‘according to’
ke alaavaa     ‘in addition to’
ke kaaran      ‘because of’
ke dvaaraa     ‘through’
ke saamne      ‘in front of’
ke liye        ‘for’
kii or/taraf   ‘towards’
kii tarah      ‘like’
kii jagah      ‘in place of’
se baahar      ‘out of’
se pahle       ‘before’

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

  (a) qabl ‘before’
      qabl az_ayn (qabl azeen)   before from_this   ‘before this’
      qabl az_waqt               before from_time   ‘before time’
  (b) dar ‘in/inside/amidst’
      dar_ayn asnah (dariin asnah)   in_this moment
  (c) az ‘from/since/for’
      az raahe hamdardi   from way empathy   ‘for the sake of courtesy’
      az sare nau         from beginning new
      khaarij az bahes    beyond from discussion
  (d) ta ‘to/until/till’
      ta waqt   until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone

  EZ N+N:      daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
  EZ N+Adj:    nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
  EZ Adj+N:    qabil-e-rahem ‘qualified for sympathy’
  EZ Adj+Adj:  qabil-e-qubul ‘qualified-for-acceptance’

Reduplication A morphological processes

  Words belonging to various categories can be reduplicated   These expressions are often hyphenated   Reduplication has various morphological functions depending on the

lexical category which is reduplicated For example

  Nouns it adds the sense of every   Verbs it brings the sense of adverbial participle   Adjectives and adverbs it adds intensity

  Hindi has three types of reduplication full partial and redundant   Reduplication is highly productive in these languages

Full Reduplication

If the word is X then its reduplicated form is X-X

raam-raam lsquoRam-Ramrsquo (proper noun) baccaa-baccaa child-child (common noun) garam-garam lsquohot-hotrsquo (adjective) dhiire-dhiire lsquoslowly-slowlyrsquo (adverb) jaa-jaa lsquogo-gorsquo (verb) naa-naa lsquonot-notrsquo (negative particle) kyaa-kyaa lsquowhat-whatrsquo (question word) jaate-jaate lsquogoing-goingrsquo (participle)


Partial Reduplication (Echo words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of ‘X etc.’
• In Hindi/Urdu: the first consonant of X is replaced by v-

For example: khaanaa-vaanaa ‘food-etc.’

• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
bhaag ‘run’ → bhaagambhaag ‘rush’
jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
dekh ‘see’ → dekhaa-dekhii ‘in imitation’
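The regular v- pattern can be sketched in a few lines of Python. This is only an illustration over the romanized forms used on the slide: the function name `echo_word` is hypothetical, and treating every letter before the first vowel as "the first consonant" (so that the digraph kh counts as one unit) is an assumption of this sketch. The irregular forms such as bhaagambhaag are not covered.

```python
# Minimal sketch of regular echo-word formation on romanized input.
# Assumption: everything before the first vowel letter is the initial consonant.
VOWELS = set("aeiouAEIOU")


def echo_word(word: str) -> str:
    """Return 'X-vX', where the initial consonant of X (if any) is replaced by v-."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    # word[i:] is the part of X from its first vowel onwards
    return f"{word}-v{word[i:]}"


if __name__ == "__main__":
    for x in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
        print(x, "->", echo_word(x))
```

Vowel-initial words such as aaloo simply receive the v- prefix, which matches the aaloo-vaaloo and aisaa-vaisaa examples.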

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies


Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’

trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
‘Atif read the book’

trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’

Intransitive Unergative

unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’

unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
‘Atif slept’

unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
‘Atif will have to sleep’


Intransitive Unaccusative

unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’

unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
‘The door opened’

unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
‘The door will have to open’

Existential

exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’


Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Yesterday night the moon was seen behind the clouds’

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Yesterday night I saw the moon behind the clouds’

Ditransitive

ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’

ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’


Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
‘Ram knows that Sita will arrive late’

Relative Clause

rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
‘My sister who stays in Delhi is coming tomorrow’


Relative Clause

rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
‘I have read the book which you gave me’

rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
‘I have read the book which you gave me’

rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’

compl-pred-2: राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’


Causatives

unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’

causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused the child to sleep’

causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  - The experiencer argument takes dative case
    raam ko bukhaar hai: Ram dat fever be.pres
    raam ko caand dikhaa: Ram dat moon see-unacc.pst
    raam ko dukh hai: Ram dat sorrow be.pres
• Participatory verbs
  - The second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega: Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii: Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa: Ravi Mohan with meet.m.sg.fut




Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  - Compounds
  - Punctuations

For example:

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past


Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     .

Tokenization Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision:
    - Create three tokens
    - Mark the hyphen as JOIN
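The tokenization decision can be sketched as follows. This is a minimal illustration, not the treebank's actual tokenizer: the function names and the idea of storing JOIN as a third column are assumptions made for this example, since the slides do not show how the JOIN mark is recorded.

```python
import re


def tokenize(sentence: str) -> list:
    """Split on whitespace and separate every punctuation mark into its own token.

    A hyphenated compound such as 'BAI-bahana' therefore yields three tokens.
    """
    tokens = []
    for piece in sentence.split():
        # runs of word characters, or single punctuation characters
        tokens.extend(re.findall(r"\w+|[^\w\s]", piece))
    return tokens


def to_rows(tokens: list) -> list:
    """Number the tokens from 1 and mark each hyphen as JOIN (hypothetical column)."""
    return [(addr, tok, "JOIN" if tok == "-" else "")
            for addr, tok in enumerate(tokens, start=1)]


if __name__ == "__main__":
    for row in to_rows(tokenize("BAI-bahana")):
        print(row)
```

Keeping the two members of the compound as separate tokens is what allows each of them to receive its own morphological analysis.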


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense, aspect, modality) / vibhakti (case marker), suffix

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='thaa,v,e'>
7      .        <fs af='&STOP,punc'>
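The composite af attribute can be unpacked into named fields. The sketch below assumes the field order given in the description above and a comma-separated value inside single quotes; the field names, the helper `parse_af`, and the trailing `0,0` values in the usage example are illustrative assumptions, not part of the slides.

```python
import re

# Field order as described for `af`; the key names are chosen for this sketch.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")


def parse_af(fs: str) -> dict:
    """Extract af='...' from a feature structure string and name its fields."""
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        raise ValueError("no af attribute found in: " + fs)
    values = match.group(1).split(",")
    # pad short values so that every field name is present
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


if __name__ == "__main__":
    print(parse_af("<fs af='laDakaa,n,m,sg,3,o,0,0'>"))
```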

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='thA,verb,…'>
7     .       SYM   <fs af='&STOP,punc'>


Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='thA,verb,…'>
       ))
4      ((      BLK
4.1    .       SYM   <fs af='&STOP,punc'>
       ))
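A chunked table of this kind can be read back into a list of chunks. The reader below is a sketch under stated assumptions: columns are whitespace-separated, a chunk opens with `ADDR (( TAG` and closes with `))` on its own line, and the feature structures are omitted from the sample for brevity. The function name `read_chunks` is hypothetical.

```python
SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""


def read_chunks(lines):
    """Group token rows into chunks delimited by '((' and '))'."""
    chunks, current = [], None
    for line in lines:
        parts = line.split(None, 3)
        if not parts:
            continue
        if len(parts) >= 3 and parts[1] == "((":
            # chunk opening row: address, '((', chunk tag
            current = {"addr": parts[0], "tag": parts[2], "tokens": []}
        elif parts[0] == "))":
            if current is not None:
                chunks.append(current)
            current = None
        elif current is not None and len(parts) >= 3:
            current["tokens"].append({
                "addr": parts[0],
                "token": parts[1],
                "pos": parts[2],
                "fs": parts[3] if len(parts) > 3 else "",
            })
    return chunks


if __name__ == "__main__":
    for chunk in read_chunks(SAMPLE.splitlines()):
        print(chunk["tag"], [t["token"] for t in chunk["tokens"]])
```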


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai (‘child’ ‘fruit’ ‘eat_hab’ ‘pres’)
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations: such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

opened
  k1: Sabina
  k2: lock (the)

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)

k3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)

lock becomes the karta


Levels of Analysis

L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) Chunk-level tree: khaa, with dependents bhaaii and phala
(t2) Expanded tree: khaa, with dependents bhaaii (which has meraa and baDZaa) and phala (which has bahuta)


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb

• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root, branching into activity and result]


karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta, karma sometimes correspond to agent, theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)


Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus

The action of opening normally requires an agentive participant. So: Sabina opened the lock.

However, the speaker may decide not to express the role of the agent. Hence:
• The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened: the karma is raised to the role of karta (doer: karma-kartri)

Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), that is, which sub-action the speaker wants to focus on.


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - ‘key’ gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)


Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.


An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa ‘my’
  - bhaaii ‘brother’
  - phala ‘fruit’, and
  - khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks


Dependency Scheme (Contd)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
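A chunk-level dependency tree like this one can be held in a very small data structure. The sketch below is illustrative only: the `Node` class and the `edges` helper are names invented for this example, and each node stores just the chunk head and the label of the relation to its parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    head: str                      # head word of the chunk
    relation: str = "root"         # label of the edge to the parent
    children: list = field(default_factory=list)

    def add(self, head: str, relation: str) -> "Node":
        """Attach a dependent chunk under this node and return it."""
        child = Node(head, relation)
        self.children.append(child)
        return child


def edges(node: Node, parent: Node = None) -> list:
    """List (parent head, relation, dependent head) triples, depth first."""
    out = []
    if parent is not None:
        out.append((parent.head, node.relation, node.head))
    for child in node.children:
        out.extend(edges(child, node))
    return out


# Tree for: meraa badZaa bhaaii bahuta phala khaataa hai
root = Node("khaataa")
bhaaii = root.add("bhaaii", "k1")
root.add("phala", "k2")
bhaaii.add("meraa", "r6")

if __name__ == "__main__":
    for triple in edges(root):
        print(triple)
```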

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address


Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
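The three groups of labels can be collected in a small lookup table, which is handy when validating annotations. This is a sketch, not the official tagset file: the dictionary and function names are invented here, only labels that appear on the slides are listed, and `k5` for apaadaan is an assumption since that label is not shown on the slides.

```python
# Karaka labels (k5 is an assumption; the slides name apaadaan but not its label).
KARAKA = {
    "k1": "karta",
    "k2": "karma",
    "k3": "karana",
    "k4": "sampradaan",
    "k5": "apaadaan",
    "k7t": "adhikarana (time)",
    "k7p": "adhikarana (place)",
}
NON_KARAKA = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}


def relation_type(label: str) -> str:
    """Return which of the three groups a relation label belongs to."""
    if label in KARAKA:
        return "karaka"
    if label in NON_KARAKA:
        return "non-karaka"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    raise KeyError("unknown relation label: " + label)


if __name__ == "__main__":
    for label in ("k1", "r6", "pof"):
        print(label, relation_type(label))
```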

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa ‘cause to eat’ is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd hellip)


Causative Constructions (Contd …)

• Possibility-II
  - The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
  - The Paninian framework provides the relations:
    - prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    - prayojya karta ‘causee’ (jk1): the causee in a causative construction
    - madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa ‘eat’


• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II

Causative Constructions (Contd …)

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  - Possibility-I
    - Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    - The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    - jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai ‘is standing’ is the root of the relative clause
  - The modifier of vaha ‘he’ in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd …)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
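Possibility-II can be pictured as a few attachment records plus a coref feature. The sketch below is only an illustration: the dictionary layout and the helper `modified_head` are invented for this example, the label `k1` on jo ladZakaa is an assumption (the slide does not name it), and the attachment of vaha inside the main clause is left out.

```python
# Attachments stated for Possibility-II, as parent pointers with edge labels.
nodes = {
    # label k1 is an assumption; the coref feature points at the main-clause chunk
    "jo ladZakaa": {"parent": "khadZaa hai", "rel": "k1", "coref": "vaha"},
    # the whole relative clause modifies vaha via nmod__relc
    "khadZaa hai": {"parent": "vaha", "rel": "nmod__relc"},
    # attachment of vaha within the main clause is omitted in this sketch
    "vaha": {"parent": None, "rel": None},
}


def modified_head(nodes: dict, name: str):
    """Climb from an element of a relative clause to the chunk that clause modifies."""
    current = name
    while current is not None:
        node = nodes[current]
        if node["rel"] == "nmod__relc":
            return node["parent"]
        current = node["parent"]
    return None


if __name__ == "__main__":
    head = modified_head(nodes, "jo ladZakaa")
    # the structural route and the coref feature should agree
    print(head, head == nodes["jo ladZakaa"]["coref"])
```

The check at the end shows the design point: the tree gives the clause-level attachment, while coref records the link between the relative pronoun phrase and its antecedent.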


(3) anubhava karta: k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta: k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’


anubhava karta: k4a (Contd …)

[Dependency trees for Ex-2 and Ex-3]


(4) Relation samanadhikaran: rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran- rs (Contdhellip)

  Ex-1


Relation samanadhikaran: rs (Contd), Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause


Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II


Conditionals (Contd)

• Possibility-I is not possible because agar-to is the head of the tree, which is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically


Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written letters every day, he tears them up’


Participles (vmod) (Contd)

• The arguments vaha ‘he’ and pawra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: pawra
  vmod: likhakar
    k1: vaha
    k2: pawra


(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’ is missing, which becomes the parent node for the relative clause ‘tum jo bhi kahoge’
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi


Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don't listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of


(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• In a subordinating conjunct, it would take the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd …)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’


Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is being modified by the adjective ek ‘one’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now be part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation


Conjunct Verbs (Contd)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder


Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh


Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh


Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson (agent)] created the [first disposable syringe (theme)] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine (theme)] was created in 1952 by [Jonas Salk (agent)] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
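The filtering step can be sketched with hand-labelled candidates. This is a toy illustration rather than a working SRL system: the role labels are copied from the slides, and the record layout and the function `answer_who` are assumptions made for this example.

```python
# Candidate sentences with the role labels shown on the slides (hand-written here).
CANDIDATES = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate: str, theme: str, candidates: list) -> list:
    """Return the agents of candidates whose predicate and theme match the question."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == predicate and c["theme"] == theme
    ]


if __name__ == "__main__":
    # Question: Who created the first effective polio vaccine?
    print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Both candidates contain the question's words, so word matching alone cannot separate them; only the theme comparison removes the syringe sentence.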


We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation


Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information


PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for


An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

An example

• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity


An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
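A frame file with several rolesets maps naturally onto a nested dictionary. The sketch below uses the two rolesets for ring from the slides; the dictionary layout and the helper `describe` are illustrative assumptions and do not reflect the actual frame file format.

```python
# Rolesets for 'ring' as given on the slides; the layout is invented for this sketch.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {
                "Arg0": "Causer of ringing",
                "Arg1": "Thing rung",
                "Arg2": "Ring for",
            },
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {
                "Arg1": "Surrounding entity",
                "Arg2": "Surrounded entity",
            },
        },
    }
}


def describe(lemma: str, roleset: str, labelled: list) -> list:
    """Pair each labelled span with the role description from the chosen roleset."""
    roles = FRAMES[lemma][roleset]["roles"]
    return [(span, arg, roles[arg]) for arg, span in labelled]


if __name__ == "__main__":
    print(describe("ring", "ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
    print(describe("ring", "ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same numbered label (Arg1) receives a different description in each roleset, which is the point of defining roles verb by verb and sense by sense.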

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling


Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually


PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
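A tool that reads these labels typically needs to separate the base argument from its function tag. The following is a minimal sketch under stated assumptions: the hyphenated spelling (e.g. ARG2-GOL) is reconstructed from the slide, and `split_label` together with the three lookup tables are hypothetical helpers, not part of any PropBank distribution.

```python
# Labels taken from the two tagset slides; spelling with "-" is assumed.
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
NUMBERED_WITH_TAG = {
    "ARGA-MNS", "ARG0-MNS", "ARG0-GOL",
    "ARG2-ATR", "ARG2-GOL", "ARG2-SOU", "ARG2-LOC", "ARG2-DIR",
}
MODIFIER_TAGS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def split_label(label):
    """Return (kind, base, function_tag) or raise ValueError for unknown labels."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        if tag not in MODIFIER_TAGS:
            raise ValueError(f"unknown modifier label: {label}")
        return ("modifier", base, tag)
    if not tag:
        if base not in NUMBERED:
            raise ValueError(f"unknown numbered argument: {label}")
        return ("numbered", base, None)
    if label not in NUMBERED_WITH_TAG:
        raise ValueError(f"combination not in the tagset: {label}")
    return ("numbered", base, tag)


print(split_label("ARG2-GOL"))
print(split_label("ARGM-TMP"))
```

Validating against the listed combinations, rather than any base with any tag, keeps a label such as ARG1-TMP from being silently accepted.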


Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotations for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so, sulaa) sleep, make someone sleep
• Add -vaa
  – (sulaa, sulvaa) make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Causatives

[Figures: PropBank annotations for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree for आतिफ़ किताब पढ़ेगा. Node labels: VP-Pred, VP, NP, V. Leaves: Atif, kitab, paRhegaa]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree for आतिफ़ ने किताब पढ़ी. Node labels: VP-Pred, VP, NP-P, NP, V. Leaves: Atif-ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree for आतिफ़ सोएगा. Node labels: VP-Pred, VP, NP, V. Leaves: Atif, soyegaa]

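Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree for दरवाज़ा खुलेगा. Node labels: VP-Pred, VP, NP (index 1), NP, V. Trace: CASE-1. Leaves: darvaazaa, khulegaa]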

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Tree for उस कमरे में चूहे हैं. Node labels: VP-Pred, VP, NP (index 1), NP, NP-P, V. Trace: CASE-1. Leaves: us kamre mein, cuuhe, hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position

[Tree for राम ने मोहन को किताब दी. Node labels: VP-Pred, VP, NP-P, NP, V. Leaves: Ram-ne, Mohan-ko, kitaab, dii]

Putting it All Together: Dative Subjects

[Tree for कल रात बादलों में मुझको चाँद दिखा. Node labels: VP-Pred, VP, NP (indices 1 and 2), NP-P, V. Traces: CASE-1, SCR-2. Leaves: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]

• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree for राम जानता है कि सीता देर से आएगी. Node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, CP (index 1), C. Traces: EXTR-1, CASE-1. Leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

[Tree for rel-cl-4. Node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C. Empty elements: PRO, COMP, SCR-1, SCR-3. Leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Tree for compl-pred-2. Node labels: VP-Pred, VP, NP, NP-P, V', V, V-Aux, N. Trace: CASE-1. Leaves: Raam, Ravi-ko, yaad, kar, rahaa, thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject

[Dependency tree: considers → Atif (Subj), her (Obj), stupid (Obj2)]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using node label

[Dependency tree: considers → Atif (Subj), her (Obj), stupid (ObjPred)]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis (Neo-Paninian labels)

• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using node label

[Dependency tree: considers → Atif (k1), her (k2), stupid (k2s)]

[Dependency tree, Hindi: समझा → आतिफ़ ने (k1), सीमा को (k2), बेवकूफ़ (k2s)]

Neo-Paninian analysis from IIIT Hyderabad; used for DS in the Hindi-Urdu Treebank

Analysis 2a for Small Clauses: General Monoclausal Analysis (DS and PS side by side)

[Dependency tree: considers → Atif (Subj), her (Obj), stupid (Obj2)]
[Phrase structure tree. Node labels: S, NP, VP, AdjP. Leaves: Atif, considers, her, stupid]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis (DS and PS side by side)

[Dependency tree: considers → Atif (k1), her (k2), stupid (k2s)]
[Phrase structure tree. Node labels: S, NP, VP, AdjP, SC. Leaves: Atif, considers, her, stupid]

Analysis 3 for Small Clauses: Raising to Object

• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category

[Dependency tree: considers → Atif (Subj), her-1 (Obj), stupid (Obj-Pred); stupid → e-1 (Subj)]
[Phrase structure tree. Node labels: S, NP, VP, AdjP. Leaves: Atif, considers, her-1, e-1, stupid]

Analysis used for PS in the Hindi-Urdu Treebank

Comparison of Representations

• Less information
• Same information

[Dependency trees compared:
  Tree 1a: considers → Atif (Subj), stupid (Obj); stupid → her (Subj)
  Tree 2a: considers → Atif (Subj), her (Obj), stupid (Obj2)
  Tree 1b: considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)
  Tree 2b: considers → Atif (Subj), her (Obj), stupid (ObjPred)
  Tree 3: considers → Atif (Subj), her-1 (Obj), stupid (Obj-Pred); stupid → e-1 (Subj)]

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design

Aren't DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendents

What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information

Comparison of DS, PB, PS (Sample)

Columns: DS, PB, PS
How: Dependency; Phrase Structure
What: Distinguish unergative/unaccusative; distinguish temporal/locative adjuncts; distinguish unaccusative/transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics

Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of some states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic

Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them as two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique

Case

• Direct: nouns are in the nominative and are not followed by a postposition
  – Occur denoting subject and/or object

  laRkaa aayaa 'the boy came'
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii 'the girl came'
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii 'the boy ate bread'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

Sg, dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
Sg, obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
Pl, dir: ye (these), ve (those), jo (who, pl)
Pl, obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (contd.)

• Before 'ne', maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don't change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
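The four rules for personal pronouns can be read as a small decision procedure. The sketch below encodes only the forms listed on the slide; the function name `attach` and the two tables are hypothetical, and only the masculine singular genitive forms (meraa etc.) are covered.

```python
# Oblique stems used before postpositions other than "ne".
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitives that replace pronoun + "kaa".
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}


def attach(pronoun, postposition):
    """Combine a personal pronoun with a postposition (transliterated)."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]          # irregular genitive form
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition   # maiM -> mujh, tuu -> tujh
    return pronoun + postposition         # no change of form


for pron, post in [("maiM", "ne"), ("maiM", "ko"), ("ham", "ko"), ("tum", "kaa")]:
    print(pron, post, "->", attach(pron, post))
```

The genitive check comes first because it overrides the oblique rule for maiM and tuu.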


Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy (erg)'
• The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below

          Direct             Oblique
          Sg       Pl        Sg       Pl
Masc      acchaa   acche     acche    acche
Fem       acchii   acchii    acchii   acchii
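The paradigm collapses to three endings, which a few lines of code make explicit. This is a sketch for declinable adjectives of the acchaa type only; treating "acch" as the stem and the name `inflect_adjective` are assumptions made for illustration.

```python
def inflect_adjective(stem, gender, number, case):
    """Return the agreeing form of a declinable -aa adjective.

    gender: "masc" or "fem"; number: "sg" or "pl"; case: "direct" or "oblique".
    """
    if gender == "fem":
        return stem + "ii"      # feminine is invariant in the table
    if number == "sg" and case == "direct":
        return stem + "aa"      # only masc sg direct keeps -aa
    return stem + "e"           # all other masculine cells


for gender in ("masc", "fem"):
    for case in ("direct", "oblique"):
        for number in ("sg", "pl"):
            print(gender, case, number, inflect_adjective("acch", gender, number, case))
```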

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root: ro 'cry'
Infinitive: ronaa 'to cry'
Habitual: rotaa 'cry.hab'
Perfective: royaa 'cried'
Causative: rulaa 'cause someone to cry'
           rulvaa 'make someone cause someone to cry'

Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs

(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    'The children are having a meal'

(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog ability be.pl.pres
    'The children can continue to have their meal'

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions

Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl 'before'
    qabl az_ayn (qabl azeen): before from_this, 'before this'
    qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
    az raahe hamdardi: from way empathy, 'for the sake of courtesy'
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
    ta waqt: until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive, but is not restricted to the genitive alone

EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X.

raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)

Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
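The regular echo pattern is mechanical enough to sketch in code. The function below works on the romanised forms used on the slide and rests on two assumptions that are not in the source: aspirates such as "kh" count as one consonant, and a vowel-initial word simply gets v- prefixed (as in aaloo-vaaloo). The name `echo_word` is hypothetical, and the irregular cases (bhaagambhaag, jhuuth-muuTh, dekhaa-dekhii) are deliberately not handled.

```python
import re

# One leading consonant: an aspirate/"sh" digraph, or any single non-vowel letter.
# The whole group is optional so that vowel-initial words match the empty string.
FIRST_CONSONANT = re.compile(r"^(?:[kgcjTDtdpbs]h|[^aeiouAEIOU])?")


def echo_word(word):
    """Return the 'X-vX' echo compound meaning roughly 'X etc.'."""
    echo = FIRST_CONSONANT.sub("v", word, count=1)
    return f"{word}-{echo}"


for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_word(w))
```

Because the echo half is generated rather than looked up, it need not be a word of the language, which matches the observation that vaanaa has no meaning of its own.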

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1
मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister, who stays in Delhi, is coming tomorrow'

Relative Clause

rel-cl-2
मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'

rel-cl-3
मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'

rel-cl-4
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
    raam ko bukhaar hai (Ram dat fever be.pres)
    raam ko caand dikhaa (Ram dat moon see-unacc.pst)
    raam ko dukh hai (Ram dat sorrow be.pres)
• Participatory verbs
  – The second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega (Ram Ravi to discussion do.m.sg.fut)
    siitaa raam se shaadi karegii (Sita Ram with marriage do.f.sg.fut)
    ravi mohan se milegaa (Ravi Mohan with meet.m.sg.fut)



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     .

Tokenization: Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision
    • Create three tokens
    • Mark the hyphen as JOIN
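The decision for compounds (three tokens, hyphen marked as JOIN) can be sketched as a tiny tokenizer. This is not the treebank's tokenizer; `tokenize`, the regular expression and the `(token, mark)` output shape are assumptions made for illustration, applied to the romanised examples from the slide.

```python
import re

# A token is either a run of word characters or one punctuation character.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Return (token, mark) pairs; mark is "JOIN" for a compound-internal hyphen."""
    pieces = [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(sentence)]
    result = []
    for i, (text, start, end) in enumerate(pieces):
        mark = None
        if text == "-" and 0 < i < len(pieces) - 1:
            prev_text, _, prev_end = pieces[i - 1]
            next_text, next_start, _ = pieces[i + 1]
            # A hyphen glued to a word on both sides joins a compound.
            glued = prev_end == start and next_start == end
            if glued and prev_text[-1].isalnum() and next_text[0].isalnum():
                mark = "JOIN"
        result.append((text, mark))
    return result


print(tokenize("bAlikA-vixyAlaya"))
print(tokenize("usa laDake ne kelA khAyA thA ."))
```

Keeping the hyphen as its own marked token lets each member of the compound receive its own morphological analysis while the compound can still be reassembled later.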


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOP,punc>
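Reading an af value back into named fields might look like the sketch below. It assumes a comma-separated value with eight slots in the order listed on the slide (the extraction stripped the separators, so the comma is a reconstruction), pads truncated values with empty strings, and uses the hypothetical names `AF_FIELDS` and `parse_af`.

```python
# Slot order as described on the slide; "tam_or_vibhakti" is one shared slot
# here, which is an assumption about the layout.
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)


def parse_af(value):
    """Split an af string into a dict of named morphological features."""
    parts = value.split(",")
    if len(parts) > len(AF_FIELDS):
        raise ValueError(f"too many af slots: {value!r}")
    # Slides abbreviate the value; missing trailing slots become empty strings.
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))


analysis = parse_af("laDakaa,n,m,sg,3,o")
print(analysis["root"], analysis["case"])
```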

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
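How the addresses change once tokens are grouped can be shown with a short sketch. It assumes the dotted address scheme (1, 1.1, 1.2, …) suggested by the slide, whose dots were lost in extraction; `chunk_addresses` and the row layout are hypothetical and omit the feature-structure column.

```python
def chunk_addresses(chunks):
    """Turn [(chunk_tag, [(token, pos), ...]), ...] into SSF-like rows.

    Each row is (address, token, category); chunk boundaries are "((" and "))".
    """
    rows = []
    for i, (tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", tag))            # chunk opens at address i
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))   # token j inside chunk i
        rows.append(("", "))", ""))                 # chunk closes, no address
    return rows


sentence = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
    ("BLK", [(".", "SYM")]),
]
for row in chunk_addresses(sentence):
    print("\t".join(row))
```

Before chunking, thA sits at address 6; afterwards it is 3.2, the second token of the third chunk, which is the renumbering the slide refers to.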


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad <dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai ('child' 'fruit' 'eat_hab' 'pres')
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Paninis(Grammar((

 Datedaround500BC Seekstoprovideacompletemaximallyconciseandtheore3callyconsistentanalysisofSanskritgramma3calstructure BasedonspokenformltKiparsky1993gt Focusesonlanguageasameansofcommunica3on

PaninisGrammarcontd

 TreatsasentenceasaseriesofmodifierLmodifiedrela3ons Everysentencehasaprimarymodified(generallyaverb) Rela3onsbetweenverbsandtheirpar3ciapantscalledlsquokarakarsquo Otherrela3onsndashsuchasreasonpruposegeni3veetc Therela3onsareexpressedthroughexplicitmarkerscalledvibhak3

Sabina opened the lock

opened
    k1: Sabina
    k2: lock
        the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
    k1: Sabina
    k2: lock
        the
    k3: key
        this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
    k7t: Yesterday
    k1: Sabina
    k2: lock
        the
    k3: key
        this
    k7p: home
        my

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
    k7t: yesterday
    k1: lock
        the
    k3: key
        this

lock becomes the karta
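The labeled trees above can be held in a very small data structure. The following is only a sketch under stated assumptions: the `Node` class, the `build` helper, and the `"root"` label are hypothetical names chosen for illustration, not part of the treebank tooling, and only the karaka-labeled dependents of the verb are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One word in a dependency tree; `label` is its relation to the head."""
    word: str
    label: str = "root"  # hypothetical label for the top node
    children: list = field(default_factory=list)

    def add(self, word, label):
        child = Node(word, label)
        self.children.append(child)
        return child

    def dependents(self, label):
        """Words attached to this node with the given relation label."""
        return [c.word for c in self.children if c.label == label]


def build(verb, **karakas):
    """Attach each labeled participant directly to the verb."""
    root = Node(verb)
    for label, word in karakas.items():
        root.add(word, label)
    return root


# "Yesterday Sabina opened the lock with this key at my home"
full = build("opened", k7t="Yesterday", k1="Sabina", k2="lock", k3="key", k7p="home")
# "Yesterday the lock opened with this key"
no_agent = build("opened", k7t="yesterday", k1="lock", k3="key")

print(full.dependents("k1"))      # ['Sabina']
print(no_agent.dependents("k1"))  # ['lock']
```

In the second tree the lock carries k1 and no k2 dependent exists, which mirrors the slide's remark that the lock becomes the karta.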

Levels of Analysis

L1 - Semantic relations (karakas), e.g. raama: karta
L2 - Morphosyntactic (vibhakti), e.g. raama: prathamaa
L3 - Morphological representation (abstract vibhakti markers), e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1)
khaa
    bhaaii
    phala

(t2)
khaa
    bhaaii
        meraa
        baDZaa
    phala
        bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb

• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal Root, branching into activity and result]

karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: "The door opened"
  - The door is karta
  - The sentence has no explicit karma

open
    k1: boy
    k2: lock

Sub-actions: Opening of lock

Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)

Sub-actions: Opening of lock

open
    k1: boy
    k2: lock
    k3: key

open
    k1: key
    k2: lock

open
    k1: lock

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
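The shift of the karta role described above can be sketched as a small labeling function. This is an illustration under assumptions, not the annotation procedure: the thematic names `agent`, `theme`, and `instrument`, the `DEFAULT_KARAKA` table, and `assign_karakas` are hypothetical. The speaker's choice (vivaksha) is passed in explicitly as `karta`, since it cannot be derived from the list of participants alone.

```python
# Default labels when the agent is expressed and chosen as karta (assumption).
DEFAULT_KARAKA = {"agent": "k1", "theme": "k2", "instrument": "k3"}


def assign_karakas(expressed, karta):
    """Label the expressed participants.

    expressed: dict mapping each word to its (hypothetical) thematic role.
    karta: the participant the speaker chooses to present as the doer.
    The chosen karta gets k1; every other participant keeps its default label.
    """
    if karta not in expressed:
        raise ValueError("the karta must be an expressed participant")
    return {
        word: "k1" if word == karta else DEFAULT_KARAKA[role]
        for word, role in expressed.items()
    }


# Sabina opened the lock with this key
print(assign_karakas({"Sabina": "agent", "lock": "theme", "key": "instrument"}, "Sabina"))
# The key opened the lock
print(assign_karakas({"key": "instrument", "lock": "theme"}, "key"))
# The lock opened
print(assign_karakas({"lock": "theme"}, "lock"))
```

With `{"lock": "theme", "key": "instrument"}` and `karta="lock"` the function yields lock = k1 and key = k3, matching "Yesterday the lock opened with this key".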

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly:
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>
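A morph analysis of this kind is easy to read into a dictionary. The sketch below assumes the feature structure is written as space-separated attribute=value pairs inside `<fs ...>`, as on the slide; the exact punctuation of the real treebank files may differ, and `parse_fs` is a hypothetical helper name.

```python
import re


def parse_fs(fs):
    """Turn '<fs af= root=khaa cat=v ...>' into a dict of attribute values."""
    # Each attribute is a word followed by '=', then a (possibly empty)
    # value that stops at whitespace or the closing '>'.
    return dict(re.findall(r"(\w+)=([^\s>]*)", fs))


feats = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(feats["root"], feats["cat"], feats["TAM"])  # khaa v taa
```

Attributes with no value, such as `af=` here, come back as empty strings rather than being dropped.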

An Example (Contd.)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
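The bracketed chunk notation can be read mechanically. The following sketch parses the `((word_TAG ...))_CHUNK` strings shown above and picks a head for each chunk. The head rule (last noun or pronoun in an NP, main verb in a VG) and the names `parse_chunks`, `head_of`, and `HEAD_TAGS` are assumptions made for this illustration; in the treebank, heads are identified as part of annotation.

```python
import re

CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

# Assumed head categories per chunk type (illustrative only).
HEAD_TAGS = {"NP": {"NN", "PRP"}, "VG": {"VM"}}


def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos), ...]) pairs."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks


def head_of(chunk_tag, tokens):
    """Pick the last token whose POS tag can head this chunk type."""
    candidates = [w for w, pos in tokens if pos in HEAD_TAGS.get(chunk_tag, set())]
    return candidates[-1] if candidates else tokens[-1][0]


text = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
heads = [head_of(tag, toks) for tag, toks in parse_chunks(text)]
print(heads)  # ['meraa', 'bhaaii', 'phala', 'khaataa']
```

These four heads are the nodes between which the inter-chunk dependency relations are then marked.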

An Example (Contd)

  Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
    k1: bhaaii
        r6: meraa
    k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod__relc - Relative clause
• rad - Address

Relations which do not fall under dependency relation

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

Dependency Relation Types
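The three groups of relations listed on the preceding slides can be sketched as lookup tables. The labels k1 to k4, k7t/k7p, k4a, r6, rt, rh, rad, nmod__relc, ccof, pof, and fragof appear in this tutorial; the pairing of apaadaan with k5 follows the usual numbering of the scheme and is stated here as an assumption, as are the names `KARAKA`, `NON_KARAKA`, `NON_DEPENDENCY`, and `relation_type`.

```python
KARAKA = {
    "k1": "karta", "k2": "karma", "k3": "karana",
    "k4": "sampradaan", "k5": "apaadaan", "k7": "adhikarana",
}
NON_KARAKA = {
    "r6": "genitive", "rt": "purpose", "rh": "reason",
    "nmod__relc": "relative clause", "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction", "pof": "complex predicate", "fragof": "fragment of",
}


def relation_type(label):
    """Classify a label into one of the three relation groups."""
    if label in NON_DEPENDENCY:
        return "non-dependency"
    if label in NON_KARAKA:
        return "non-karaka"
    # Subtypes such as k7t, k7p, or k4a share the two-character karaka prefix.
    if label[:2] in KARAKA:
        return "karaka"
    return "unknown"


for label in ("k1", "k7t", "k4a", "r6", "pof"):
    print(label, relation_type(label))
```

Labels introduced later for causatives (pk1, jk1, mk1) are deliberately left out of the tables, so they are reported as `unknown` by this sketch.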

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa 'cause to eat' is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations:
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
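The substitution stated above (pk1, mk1, jk1 instead of k1, k3, k4) can be sketched as a relabeling step over the Possibility-I analysis of the maid example. The names `CAUSATIVE_RELABEL` and `to_possibility_2` are hypothetical. The sketch does not decide whether a se-marked noun is a mediator (aayaa 'maid') or a true instrument (cammaca 'spoon'); that decision is left to the annotator.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def to_possibility_2(arcs):
    """Relabel (word, label) pairs of a causative clause; other labels stay."""
    return [(word, CAUSATIVE_RELABEL.get(label, label)) for word, label in arcs]


# maaz ne aayaa se bacce ko khaanaa khilvaayaa, labeled as in Possibility-I
possibility_1 = [("maaz", "k1"), ("aayaa", "k3"), ("bacce", "k4"), ("khaanaa", "k2")]
print(to_possibility_2(possibility_1))
# [('maaz', 'pk1'), ('aayaa', 'mk1'), ('bacce', 'jk1'), ('khaanaa', 'k2')]
```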

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta - k4a (Contd.)

[Figure: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd.)

[Figure: dependency tree for Ex-1]

[Figure: dependency tree for Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
    paired-ccof: agar
        ccof: naa hotii
    paired-ccof: to
        ccof: aatii

[Figure: Possibility-II]

Conditionals (Contd.)

• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
    k1: vaha
    k7t: rojZa
    k2: patra
    vmod: likhakara
        k1: vaha
        k2: patra

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
    k1: mai
    k2: NULL__NP (vo)
        nmod__relc: kahoge
            k1: tum
            k2: jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  "The children have grown up; they don't listen to anyone"
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate

Non-dependency Relations

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'


Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• This means a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation
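Because the noun and the verb are linked by pof, the conjunct verb can be recovered from the arcs even though the two words sit in separate chunks. The sketch below assumes arcs are stored as (dependent, label, head) triples; that storage format and the name `conjunct_verbs` are illustrative choices, not the treebank's file format.

```python
def conjunct_verbs(arcs):
    """Join each pof dependent with its verb to recover the conjunct verb."""
    return [f"{dep} {head}" for dep, label, head in arcs if label == "pof"]


# maine usase ek prashna kiyaa
arcs = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),  # ek modifies the noun only, not the noun-verb unit
]
print(conjunct_verbs(arcs))  # ['prashna kiyaa']
```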

Conjunct Verbs (Contd.)

kiyaa
    k1: maine
    k2: usase
    pof: prashna
        nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
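The filtering step can be sketched in a few lines, assuming the two candidate sentences have already been labeled with agent and theme as on the slide. The dictionaries and the function name `answer_who_created` are hypothetical; a real system would obtain the labels from a semantic role labeller.

```python
# Role-labeled analyses of the two candidate sentences (taken from the slide).
candidates = [
    {"agent": "Becton Dickinson", "theme": "the first disposable syringe"},
    {"agent": "Jonas Salk", "theme": "the first effective polio vaccine"},
]


def answer_who_created(theme, analyses):
    """Keep only the agents of 'create' events whose theme matches the question."""
    return [a["agent"] for a in analyses if a["theme"].lower() == theme.lower()]


print(answer_who_created("the first effective polio vaccine", candidates))
# ['Jonas Salk']
```

Both sentences contain every content word of the question, so word matching alone cannot separate them; the theme label does.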

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)

ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for

ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
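The two rolesets of ring can be sketched as a small lookup table that turns numbered arguments into their verb-specific descriptions. The `FRAMES` dictionary and the `describe` helper are illustrative names only; actual PropBank frame files are stored in their own file format, which is not reproduced here.

```python
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"Arg0": "causer of ringing", "Arg1": "thing rung", "Arg2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"Arg1": "surrounding entity", "Arg2": "surrounded entity"},
    },
}


def describe(roleset, labelled):
    """Map each labeled span to the role description of the given roleset."""
    roles = FRAMES[roleset]["roles"]
    return {span: roles[arg] for span, arg in labelled.items()}


print(describe("ring.01", {"John": "Arg0", "the bell": "Arg1"}))
print(describe("ring.02", {"Tall aspen trees": "Arg1", "the lake": "Arg2"}))
```

The same numbered label (Arg1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is why the roleset has to be chosen before the arguments are interpreted.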

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

ARGA   Causer
ARG0   Agent, experiencer
ARG1   Theme, patient
ARG2   Recipient
ARG3   Instrument

Numbered Arguments with function tags

ARGA-MNS   Indirect causer
ARG0-MNS   Induced causer
ARG0-GOL   Causee with a 'recipient' role
ARG2-ATR   Attribute
ARG2-GOL   Goal
ARG2-SOU   Source
ARG2-LOC   Location
ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif.erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif.dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl.erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl.dat open.inf compel.fut
         'The door will have to open'

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif.erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif.dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night, the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night, I saw the moon behind the clouds'

[Figure: PropBank annotations for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan.dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram.erg Mohan.dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so, sulaa): sleep, make someone sleep
• Add -vaa
  - (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid.erg Atif.acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother.erg maid.by Atif.acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

[Figure: PropBank annotations for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi.inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by:
    - Language-universal principles
    - Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: phrase-structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with node labels VP-Pred, VP, NP, and V]

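Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: phrase-structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii), with node labels VP-Pred, VP, NP-P, NP, and V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: phrase-structure tree for आतिफ़ सोएगा (Atif soyegaa), with node labels VP-Pred, VP, NP, and V]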

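Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: phrase-structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa), with node labels VP-Pred, VP, NP, and V; the NP darvaazaa carries index 1 and is coindexed with a CASE element in the lower position]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Figure: phrase-structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain), with node labels VP-Pred, VP, NP-P, NP, and V; the NP cuuhe carries index 1 and is coindexed with a CASE element]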

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: phrase-structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii), with node labels VP-Pred, VP, NP-P, NP, and V]

Putting it All Together: Dative Subjects

[Figure: phrase-structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa), with node labels VP-Pred, VP, NP-P, NP, and V, a CASE element with index 1, and an SCR element with index 2]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: phrase-structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi), with node labels VP-Pred, VP, NP-P, NP, V, V-Aux, CP, and C (ki), and with EXTR and CASE elements]

Relative Clause

[Figure: phrase-structure tree for the relative clause example below, with node labels VP-Pred, VP, NP-P, NP, V, V-Aux, CP, and C (COMP), and with PRO and SCR elements]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Figure: phrase-structure tree for the example below, with node labels VP-Pred, VP, NP-P, NP, V', V, V-Aux, and N (yaad), and with a CASE element]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• The structure represents the accusative case marking of "her" (as object of the matrix verb) and "her" as semantic subject, using a node label

considers
    k1: Atif
    k2: her
    k2s: stupid

समझा
    k1: आतिफ़ ने
    k2: सीमा को
    k2s: बेवकूफ़

Neo-Paninian analysis from IIIT Hyderabad; used for DS in the Hindi-Urdu Treebank

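Analysis 2a for Small Clauses: General Monoclausal Analysis

• The structure represents the accusative case marking of "her" (as object of the matrix verb), but not "her" as semantic subject

considers
    Subj: Atif
    Obj: her
    Obj2: stupid

[Figure: corresponding phrase-structure tree for "Atif considers her stupid", with node labels S, NP, VP, and AdjP]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• The structure represents the accusative case marking of "her" (as object of the matrix verb) and "her" as semantic subject, using a node label

considers
    k1: Atif
    k2: her
    k2s: stupid

[Figure: corresponding phrase-structure tree for "Atif considers her stupid", with node labels S, NP, VP, AdjP, and SC]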

Analysis 3 for Small Clauses: Raising to Object

• The structure represents the accusative case marking of "her" and "her" as semantic subject, but requires an empty category

considers
    Subj: Atif
    Obj: her (index 1)
    Obj-Pred: stupid
        Subj: e (index 1)

[Figure: corresponding phrase-structure tree for "Atif considers her stupid", with node labels S, NP, VP, and AdjP, containing her (index 1) and the coindexed empty category e (index 1)]

Analysis used for PS in the Hindi-Urdu Treebank

Comparison of Representations

[Figure: five dependency trees for "Atif considers her stupid" (Tree 1a, Tree 2a, Tree 1b, Tree 2b, Tree 3), grouped under "Less information" and "Same information"; the arc labels used are Subj, Obj, Obj2, ObjPred, Obj-Pred, and Obj-ECM, and Tree 3 contains the empty category e coindexed with her]

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language
  - Can be very similar across languages
• There can be several possible analyses
  - Some have less information
  - But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design

101212

13

Aren't DS and PS Representations Complementary? NO

•  Syntactic dependency can be encoded in PS, and typically is
•  Usual convention: attachment in projection shows type of dependency
•  Syntactic constituency is represented in DS
•  Usual convention: each node is the word and the head of the phrase containing it and all descendants

What Does This Mean for NLP?

•  Treebanks are not naturally occurring data
•  The guidelines are painstakingly produced by linguists and represent a formal description of the language
•  Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
•  Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
•  These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
•  There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

•  DS: dependency, annotated by hand
•  PB: annotated by hand on top of DS, adds information about lexical semantics
   – Does not change trees
   – Adds labels to arcs and features to nodes
•  PS: phrase structure, derived automatically from DS+PB
   – Contains less information than DS+PB
   – DS and PS contain different information

Comparison of DS, PB, PS (Sample)

Columns: DS, PB, PS
How:   Dependency; Phrase Structure
What:  Distinguish unergative/unaccusative; distinguish temporal/locative adjuncts; distinguish unaccusative/transitive with empty agent


Introduction to Morphology, Syntax, and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT, Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

•  Introduction
   – Some facts about Hindi and Urdu
•  Linguistic properties
   – Morphology
   – Some basic syntax
   – Lexical semantics

Hindi: Some facts

•  A major language of the Indo-Aryan family
•  Official language of 11 Indian states, including Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan, and Madhya Pradesh
•  Also spoken outside India in Fiji, Mauritius, Guyana, etc.
•  Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
•  A large population in India speaks Hindi as a second language
•  Script: Devanagari, a syllabic script

Urdu: Some facts

•  An Indo-Aryan language
•  Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
•  Significant borrowings from Arabic and Persian
•  It was also known as rekhta (mixed language)
•  Official language of Pakistan
•  Official language of states of India
•  Also spoken in Fiji, Bangladesh, etc.
•  Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
•  Script: Perso-Arabic

Hindi-Urdu (Hindustani)

•  Hindi and Urdu are mutually intelligible
•  Linguists consider them two registers of the same language
•  Similar in grammatical structures
•  Differ in vocabulary, particularly in the formal written varieties
•  A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

•  Hindi/Urdu have relatively free word order
•  The unmarked word order in both languages is subject-object-verb (SOV)
•  Auxiliary verbs follow the main verb
•  Nouns are followed by postpositions
•  Adjectives precede the nouns they modify
•  In Urdu, adjectives sometimes follow the noun (ezafe constructions)
•  Large use of participles, complex predicates, and causatives
•  Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties:

•  Grammatical gender: masculine and feminine
•  Number: singular and plural
•  Person: first, second, and third
•  Case: direct, oblique, and vocative
•  Adjectives inflect for number, gender, and case
   – Some adjectives do not decline

Nouns

•  Nouns in Hindi/Urdu are inflected for number and case
•  Gender: all nouns have inherent gender: pankhaa (fan.masc), lataa (creeper.fem), ghar (house.masc)
•  Number:
   – Singular: pankhaa (fan), lataa (creeper), ghar (house)
   – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
•  Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique

Case

•  Direct: nouns are in the nominative and are not followed by a postposition
   – Occur denoting subject and/or object

   laRkaa aayaa 'the boy came'
   <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

   laRkii aayii 'the girl came'
   <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

•  Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc), and kaa (gen)

   laRke ne roTii khaayii 'the boy ate bread'
   <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

   laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
   <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

•  Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who.interro), kyaa (what), and kuch (some)
•  Sg, oblique: isa (this), usa (that), jisa (which), kisa (which.interro), koii (someone), and kisii (someone)
•  Pl, direct: ye (these), ve (those), jo (who.pl)
•  Pl, oblique:
   (a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who.interro), kinhiiM (who.indef)
   (b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who), and kinhiiM (some people.indef)

Pronouns (Contd.)

•  Before 'ne', maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
•  Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
•  Before all postpositions, ham (we), tum (you.sg/pl), and aap (you-hon) don't change form:
   hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
•  maiM (I), tuu (you), ham (we), and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our), and tumhaaraa (your)
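The pronoun rules above can be sketched as a small lookup in Python. This is only an illustration over the romanized forms listed on the slide; the function name `attach_postposition` and the restriction to five pronouns are assumptions made for the example, not part of any treebank tool.

```python
# Oblique stems used before postpositions other than "ne" (from the slide).
OBLIQUE_STEMS = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitive forms that replace pronoun + "kaa" (from the slide).
GENITIVE_FORMS = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}

# Only the pronouns discussed on the slide are handled here.
PRONOUNS = ("maiM", "tuu", "ham", "tum", "aap")


def attach_postposition(pronoun: str, postposition: str) -> str:
    """Combine a pronoun with a postposition following the slide's rules."""
    if pronoun not in PRONOUNS:
        raise ValueError(f"pronoun not covered by this sketch: {pronoun}")
    if postposition == "kaa" and pronoun in GENITIVE_FORMS:
        return GENITIVE_FORMS[pronoun]          # irregular genitive
    if postposition == "ne":
        return pronoun + postposition           # no change before "ne"
    # Other postpositions: maiM/tuu switch to the oblique stem.
    return OBLIQUE_STEMS.get(pronoun, pronoun) + postposition


for pronoun, postposition in [("maiM", "ne"), ("maiM", "ko"), ("ham", "ko"), ("tum", "kaa")]:
    print(pronoun, "+", postposition, "=", attach_postposition(pronoun, postposition))
```

The table-driven design keeps the two irregular patterns (oblique stems and genitive forms) separate, which mirrors how the slide states them.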


Adjectives

•  Morphologically, an adjective is inflected for gender, number, and case, as it agrees with the following noun
•  Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
•  The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender, and case are given below

Case →       Direct            Oblique
Number →     Sg       Pl       Sg       Pl
Gender ↓
Masc         acchaa   acche    acche    acche
Fem          acchii   acchii   acchii   acchii
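The declension table for acchaa can be written as a three-way rule. The sketch below assumes an adjective of the acchaa type whose stem is obtained by dropping the final -aa; the function name `inflect_adjective` is hypothetical, and adjectives that do not decline are outside its scope.

```python
def inflect_adjective(stem: str, gender: str, number: str, case: str) -> str:
    """Inflect an -aa adjective stem such as 'acch' (acchaa 'good')."""
    if gender == "fem":
        return stem + "ii"      # feminine is -ii in every cell of the table
    if number == "sg" and case == "direct":
        return stem + "aa"      # masculine singular direct
    return stem + "e"           # all other masculine cells


for gender in ("masc", "fem"):
    forms = [
        inflect_adjective("acch", gender, number, case)
        for case in ("direct", "oblique")
        for number in ("sg", "pl")
    ]
    print(gender, forms)
```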

Verbs

•  Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number, and person
•  Tense, aspect, and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects, and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root         ro       'cry'
Infinitive   ronaa    'to cry'
Habitual     rotaa    'cry.hab'
Perfective   royaa    'cried'
Causative    rulaa    'cause someone to cry'
             rulvaa   'make someone cause someone to cry'

Auxiliaries

•  Auxiliaries mark tense, aspect, and modality information on verbs

   (a) bacce khaanaa khaa rahe haiM
       children.d meal eat prog be.pl.pres
       'The children are having a meal'

   (b) bacce khaanaa khaate rah sakte haiM
       children.d meal eat.nf prog ability be.pl.pres
       'The children can continue to have their meal'

•  Auxiliaries also carry the gender, number, and person information

Postpositions

•  Postpositions largely mark the case relations

   baccoM ne raat meM mez se khaanaa le liyaa
   children.obl erg night in table abl food take refl.pst

•  Hindi also has compound postpositions

Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii, and se with other words, as follows:

ke anusaar     'according to'
ke alaavaa     'in addition to'
ke kaaran      'because of'
ke dvaaraa     'through'
ke saamne      'in front of'
ke liye        'for'
kii or/taraf   'towards'
kii tarah      'like'
kii jagah      'in place of'
se baahar      'out of'
se pahle       'before'

Urdu-Specific Features

•  Prepositions in Urdu
•  ezafe in Urdu

Urdu has Prepositions

•  Unlike Hindi, Urdu has prepositions as well
•  Some Urdu examples with prepositions are:

   (a) qabl 'before'
       qabl az_ayn (qabl azeen): before from_this, 'before this'
       qabl az_waqt: before from_time, 'before time'
   (b) dar 'in/inside/amidst'
       dar_ayn asnah (dariin asnah): in_this moment
   (c) az 'from/since/for'
       az raahe hamdardi: from way empathy, 'for the sake of courtesy'
       az sare nau: from beginning new
       khaarij az bahes: beyond from discussion
   (d) ta 'to/until/till'
       ta waqt: until time

ezafe in Urdu

•  Urdu has what is referred to as ezafe
•  Normally marks a genitive, but is not restricted to the genitive alone

   EZ N+N:      daur-e-hukumat 'period-of-rule'; hukumat-e-hind 'government-of-India'
   EZ N+Adj:    nasl-e-insani 'race-of-humanity'; lamha-e-aakhar 'the last moment'
   EZ Adj+N:    qabil-e-rahem 'qualified for sympathy'
   EZ Adj+Adj:  qabil-e-qubul 'qualified-for-acceptance'

Reduplication: A Morphological Process

•  Words belonging to various categories can be reduplicated
•  These expressions are often hyphenated
•  Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
   – Nouns: it adds the sense of 'every'
   – Verbs: it brings the sense of an adverbial participle
   – Adjectives and adverbs: it adds intensity
•  Hindi has three types of reduplication: full, partial, and redundant
•  Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X

raam-raam       'Ram-Ram' (proper noun)
baccaa-baccaa   'child-child' (common noun)
garam-garam     'hot-hot' (adjective)
dhiire-dhiire   'slowly-slowly' (adverb)
jaa-jaa         'go-go' (verb)
naa-naa         'not-not' (negative particle)
kyaa-kyaa       'what-what' (question word)
jaate-jaate     'going-going' (participle)

Partial Reduplication (Echo Words)

•  In partial reduplication, an expression X is repeated partially
•  Only a part of a given word is repeated, which gives the meaning of 'X etc.'
•  In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
•  vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

jaanaa 'going'    → jaanaa-vaanaa 'going etc.'
aaloo 'potato'    → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

bhaag 'run'  → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see'   → dekhaa-dekhii 'in imitation'
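The regular reduplication patterns can be sketched over romanized strings. This is a rough illustration, assuming the romanization used on the slides, where an initial consonant may be written with more than one letter (kh) and vowel-initial words simply take v- as a prefix (aaloo-vaaloo). The function names are hypothetical, and the irregular forms (bhaagambhaag, jhuuTh-muuTh, dekhaa-dekhii) are not covered.

```python
VOWELS = set("aeiou")


def full_reduplication(word: str) -> str:
    """X -> X-X."""
    return f"{word}-{word}"


def echo_word(word: str) -> str:
    """X -> X-vY, where Y is X without its initial consonant letters."""
    index = 0
    # Skip the letters that spell the initial consonant (e.g. 'kh', 'j').
    while index < len(word) and word[index].lower() not in VOWELS:
        index += 1
    return f"{word}-v{word[index:]}"


for word in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_word(word))
print(full_reduplication("garam"))
```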

Some Basic Syntax

•  Hindi and Urdu are both relatively free word order SOV languages
•  For case marking, Hindi primarily uses postpositions
•  The verb agrees either with the subject or with the object
•  The adjective agrees with the noun it modifies

Simple Transitive

trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitaab paRhegaa
          Atif book.f read.m.sg.fut
          'Atif will read the book'

trans-2   आतिफ़ ने किताब पढ़ी
          Atif ne kitaab paRhii
          Atif erg book.f read.f.sg.pst
          'Atif read the book'

trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif ko kitaab paRhnii paRii
          Atif dat book.f read.f.inf compel.f.pst
          'Atif had to read the book'

Intransitive: Unergative

unerg-1   आतिफ़ सोएगा
          Atif soyegaa
          Atif sleep.m.sg.fut
          'Atif will sleep'

unerg-2   आतिफ़ ने सोया
          Atif ne soyaa
          Atif erg sleep.m.sg.pst
          'Atif slept'

unerg-3   आतिफ़ को सोना पड़ेगा
          Atif ko sonaa paRegaa
          Atif dat sleep.inf compel.fut
          'Atif will have to sleep'

Intransitive: Unaccusative

unacc-1   दरवाज़ा खुलेगा
          darvaazaa khulegaa
          door.m.sg.d open.m.sg.fut
          'The door will open'

unacc-2   दरवाज़े ने खुला
          darvaaze ne khulaa
          door.m.sg.obl erg open.pst
          'The door opened'

unacc-3   दरवाज़े को खुलना पड़ेगा
          darvaaze ko khulnaa paRegaa
          door.m.sg.obl dat open.inf compel.fut
          'The door will have to open'

Existential

exist-1   उस कमरे में चूहे हैं
          us kamre meM cuuhe haiM
          that room in rats be.pres.pl
          'There are rats in that room'

Dative Subject

unacc-4      कल रात बादलों में चाँद दिखा
             kal raat baadaloM meM caaMd dikhaa
             yesterday night clouds in moon see(unacc).pst
             'Yesterday night the moon was seen behind the clouds'

dat-subj-1   कल रात बादलों में मुझको चाँद दिखा
             kal raat baadaloM meM mujhko caaMd dikhaa
             yesterday night clouds in me.dat moon see(unacc).pst
             'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1    राम मोहन को किताब देगा
             raam mohan ko kitaab degaa
             Ram Mohan dat book.f give.m.sg.fut
             'Ram will give a book to Mohan'

ditrans-2    राम ने मोहन को किताब दी
             raam ne mohan ko kitaab dii
             Ram erg Mohan dat book.f give.f.sg.pst
             'Ram gave a book to Mohan'

Complement Clause

compl-cl-1   राम जानता है कि सीता देर से आएगी
             raam jaantaa hai ki siita der se aayegii
             Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
             'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1     मेरी बहन जो दिल्ली में रहती है कल आ रही है
             merii bahan jo dillii meM rahtii hai kal aa rahii hai
             my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
             'My sister who stays in Delhi is coming tomorrow'

Relative Clause

rel-cl-2   मैंने वह किताब जो तुमने दी थी पढ़ ली
           maiMne vah kitaab jo tumne dii thii paRh lii
           I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
           'I have read the book which you gave me'

rel-cl-3   मैंने वह किताब पढ़ ली जो तुमने दी थी
           maiMne vah kitaab paRh lii jo tumne dii thii
           I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
           'I have read the book which you gave me'

rel-cl-4   जो किताब तुमने दी थी वह मैंने पढ़ ली
           jo kitaab tumne dii thii vah maiMne paRh lii
           which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
           'I have read the book which you gave me'

Complex Predicate

compl-pred-1   राम रवि की प्रतीक्षा कर रहा था
               raam ravi kii pratikshaa kar rahaa thaa
               Ram Ravi gen wait do prog.m.sg be.m.sg.pst
               'Ram was waiting for Ravi'

compl-pred-2   राम रवि को याद कर रहा था
               raam ravi ko yaad kar rahaa thaa
               Ram Ravi acc remember do prog.m.sg be.m.sg.pst
               'Ram was remembering Ravi'

Causatives

unerg-1       आतिफ़ सोएगा
              Atif soyegaa
              Atif sleep.m.sg.fut
              'Atif will sleep'

causative-1   आया ने आतिफ़ को सुलाया
              aayaa ne Atif ko sulaayaa
              maid erg Atif acc sleep.caus.pst
              'The maid caused Atif to sleep'

causative-2   माँ ने आया से आतिफ़ को सुलवाया
              maaN ne aayaa se Atif ko sulvaayaa
              mother erg maid by Atif acc sleep.caus.pst
              'The mother made the maid cause Atif to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

•  Experiencer verbs
   – The experiencer argument takes dative case

     raam ko bukhaar hai      Ram dat fever be.pres
     raam ko caand dikhaa     Ram dat moon see-unacc.pst
     raam ko dukh hai         Ram dat sorrow be.pres

•  Participatory verbs
   – The second argument of the participatory verbs takes the se postposition

     raam ravi se carcaa karegaa      Ram Ravi to discussion do.m.sg.fut
     siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
     ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS, and Chunks in the Hindi/Urdu Treebanks

Outline

•  Tokenization
•  Morphological representation
•  POS tagging
•  Chunking
•  Inter-chunk dependency annotation
•  Intra-chunk dependencies

Tokenization: Automatic. Issues

•  Compounds
•  Punctuations

For example:

usa    ladake   ne    kelaa    khaayaa    thaa
that   boy      erg   banana   eat-perf   past

Tokenization

Represented in SSF:

ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7      .
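The ADDR/TOKEN listing can be produced with a few lines of Python. This is a minimal sketch, assuming whitespace-separated romanized input in which every punctuation mark becomes its own token; the function name `to_ssf_rows` and the regular expression are illustrative choices, not the treebank's tokenizer.

```python
import re


def to_ssf_rows(sentence: str):
    """Return (address, token) pairs, with punctuation split off as tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return list(enumerate(tokens, start=1))


for address, token in to_ssf_rows("usa laDake ne kelA khAyA thA ."):
    print(f"{address}\t{token}")
```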

Tokenization Issues

•  Punctuations: all punctuations are to be tokenized
•  Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
   – Compounds internally contain a punctuation
   – Are productive
   – Morphological analysis of the members of the compounds
   – The issue: whether to create a single token
   – Decision:
     · Create three tokens
     · Mark the hyphen as JOIN
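The compound decision (three tokens, with the hyphen marked as JOIN) can be illustrated as follows. The slides do not show how the JOIN mark is stored, so pairing each token with an optional tag is an assumption made for this sketch, as is the function name `split_compound`.

```python
def split_compound(token: str):
    """Split a hyphenated compound into members and JOIN-marked hyphens."""
    rows = []
    for index, member in enumerate(token.split("-")):
        if index > 0:
            rows.append(("-", "JOIN"))   # the hyphen becomes its own token
        rows.append((member, None))      # compound member, no special mark
    return rows


print(split_compound("BAI-bahana"))
print(split_compound("bAlikA-vixyAlaya"))
```

Keeping the members as separate tokens lets each one receive its own morphological analysis, which is the motivation the slide gives for not creating a single token.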


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix

ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7       .         <fs af='&STOP,punc'>
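A composite af value can be unpacked into named fields. The sketch below assumes the fields are comma-separated and appear in the order listed on the slide (the separators are lost in this transcript, so the comma is an assumption); the field names and the function name `parse_af` are hypothetical.

```python
# Field order as listed on the slide; the names themselves are illustrative.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")


def parse_af(af_value: str) -> dict:
    """Split a comma-separated af string into a field dictionary."""
    values = af_value.split(",")
    if len(values) > len(AF_FIELDS):
        raise ValueError(f"too many fields in af value: {af_value!r}")
    # Pad missing trailing fields with empty strings.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


analysis = parse_af("laDakaa,n,m,sg,3,o")
print(analysis["root"], analysis["cat"], analysis["case"])
```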

POS Tagging

•  ILMT POS tagset adopted
•  Total: 26 tags

ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7      .        SYM    <fs af='&STOP,punc'>

Chunking

•  Chunking is introduced to save effort in manual tagging
•  Dependency relations are marked between the chunk heads
•  Chunking restructures the tree (i.e., the value of ADDR_ will change)

ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1     .        SYM    <fs af='&STOP,punc'>
        ))
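The renumbering that chunking causes (token 2 becomes 1.2, and so on) can be sketched like this. The dotted address format is an assumption based on the listing above, where the dots appear to have been lost in extraction; the function name `number_chunks` and the tuple layout are illustrative only.

```python
def number_chunks(chunks):
    """Return SSF-style rows (address, token, category) for a chunked sentence."""
    rows = []
    for chunk_no, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(chunk_no), "((", label))        # chunk-opening row
        for token_no, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{chunk_no}.{token_no}", token, pos))
        rows.append(("", "))", ""))                       # chunk-closing row
    return rows


sentence = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
    ("BLK", [(".", "SYM")]),
]

for address, token, category in number_chunks(sentence):
    print(f"{address}\t{token}\t{category}")
```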


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<dipti@iiit.ac.in>

COLING-2012

Outline

•  Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
   – Some basic concepts
•  Some Hindi constructions
   – Causatives
   – Co-ordination
   – Unaccusatives
   – Relative clauses
•  Conclusions

Introduction

•  Treebank: one of the most important linguistic resources
•  Utility in various NLP tasks such as parsing, natural language understanding, etc.
•  Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

•  News articles: 350k
•  Tourism articles: 25-30k
•  Conversational data: 25-20k
•  Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
•  Rich morphology
•  Relatively flexible word order. For example:
   a) baccaa phala khaataa hai
      'child' 'fruit' 'eat_hab' 'pres'
   b) phala baccaa khaataa hai
   c) phala khaataa hai baccaa
   d) baccaa khaataa hai phala

Panini's Grammar

•  Dated around 500 BC
•  Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
•  Based on the spoken form (Kiparsky, 1993)
•  Focuses on language as a means of communication

Panini's Grammar (contd.)

•  Treats a sentence as a series of modifier-modified relations
•  Every sentence has a primary modified (generally a verb)
•  Relations between verbs and their participants are called 'karaka'
•  Other relations: reason, purpose, genitive, etc.
•  The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]

k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]

lock becomes the karta

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

•  Morph analysis
•  POS tagging
•  Identify minimal constituents (chunks/bags) and their heads
•  Mark the relations across chunks (head-to-head relation)
•  Chunk-internal dependencies are left unspecified
•  The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Tree (t1): khaa → bhaaii, phala]
[Tree (t2): khaa → bhaaii, phala; bhaaii → meraa, baDZaa; phala → bahuta]

Karaka Relations

•  Direct participants in an action/event
•  Syntactico-semantic
•  The karta and karma of a verb are determined by the verb's semantics
•  Verb denotes an action/event
•  Any action is a bundle of sub-actions
   Sabina opened the lock with the key
   The key opened the lock
   The lock opened

Semantics of the Verb

•  A verbal root denotes:
   – The activity
   – The result
•  Locus of activity: karta
•  Locus of result: karma

[Diagram: Verbal Root → activity, result]

karta-karma

•  The boy opened the lock
   – k1: karta
   – k2: karma
•  karta and karma sometimes correspond to agent and theme
   – Not always: The door opened
   – The door is karta
   – The sentence has no explicit karma

[Dependency tree: open → boy (k1), lock (k2)]

Sub-actions: Opening of lock

[Diagram: Opening of lock, decomposed into
   action 1: inserting and turning a key
   action 2: key pressing and turning the lever
   action 3: latch moving and lock opening]

Sub-actions: Opening of lock

[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus

•  The action of opening normally requires an agentive participant. So: Sabina opened the lock
•  However, the speaker may decide not to express the role of the agent. Hence:
   – The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
   – The lock opened. The karma is raised to the role of karta (doer: karma-kartri)
•  Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
•  Which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

•  Every sentence reflects the speaker's intention
   – Participants are assigned various relations accordingly
     (a) I opened the lock with this key
     (b) I am sure this key will open the lock
   – 'key' gets assigned karta (in b) or karana (in a) based on what the speaker wants to express
•  Syntax reflects vivaksha

The Scheme

•  Morph analysis
•  POS tagging
•  Chunking
•  Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

•  Objective
•  The Scheme
   – Morph analysis
   – POS tagging
   – Chunking
   – Dependency relations
•  Dependency Scheme
•  Relations in Dependency Scheme
•  Some Hindi Constructions

Objective

•  To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
•  We are developing treebanks for Hindi/Urdu
•  Following the Paninian framework as the annotation scheme
•  We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

•  Example
   – meraa badZaa bhaaii bahuta phala khaataa hai
     'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
     'My elder brother eats lots of fruits'

An Example (Contd.)

•  Morph analysis
   – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
   – badZaa   <fs af= root=badZaa cat=adj gend=m>
   – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
   – bahuta   <fs af= root=bahuta cat=adj gend=any>
   – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
   – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
   – hai      <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

•  POS tagging
   – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
•  Chunking
   – ((meraa_PRP))_NP
     ((baDzaa_JJ bhaaii_NN))_NP
     ((bahuta_QF phala_NN))_NP
     ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

•  Dependency relation

Dependency Scheme

•  The Paninian approach treats a sentence as a series of modifier-modified relations
•  Hence it provides a framework for dependency analysis
•  In our dependency tree:
   – each node is a chunk, and
   – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
•  A chunk represents a set of adjacent words which are in dependency relations with each other
•  All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

•  Here modifier-modified relations are marked between the heads of the chunks:
   – meraa 'my'
   – bhaaii 'brother'
   – phala 'fruit', and
   – khaataa 'eats'
•  badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
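The chunk-head dependency tree for this sentence can be held as a list of labeled edges. The representation below (dependent, head, relation triples) and the `render` helper are assumptions made for illustration; they are not the SSF encoding used in the treebank.

```python
# (dependent, head, relation) triples for the chunk heads of
# "meraa badZaa bhaaii bahuta phala khaataa hai".
EDGES = [
    ("bhaaii", "khaataa", "k1"),   # karta
    ("phala", "khaataa", "k2"),    # karma
    ("meraa", "bhaaii", "r6"),     # genitive
]


def render(head, edges, relation=None, depth=0):
    """Return the subtree under `head` as indented lines."""
    label = f" ({relation})" if relation else ""
    lines = ["  " * depth + head + label]
    for dependent, parent, rel in edges:
        if parent == head:
            lines.extend(render(dependent, edges, rel, depth + 1))
    return lines


print("\n".join(render("khaataa", EDGES)))
```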

Relations in Dependency Scheme

•  There are 3 types of relations in the Dependency Scheme:
   – Karaka relations
   – Relations other than karakas, and
   – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
•  Karaka relations are participants directly involved in the action denoted by the verb
•  Relations other than karakas denote purpose, reason
•  Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

•  Only six:
   – karta: subject/agent/doer
   – karma: object/patient
   – karana: instrument
   – sampradaan: beneficiary
   – apaadaan: source
   – adhikarana: location in place/time/other

Relations other than karakas

•  r6: Genitive
•  rt: Purpose
•  rh: Reason
•  nmod_relc: Relative clause
•  rad: Address

Relations which do not fall under dependency relation

•  ccof: Conjunction
•  pof: Complex predicates
•  fragof: Fragment of
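The three relation types can be kept in a small lookup table. This is a sketch only: the grouping follows the slides, the dictionary layout and the function name `relation_type` are hypothetical, and the tag `k5` for apaadaan does not appear in this transcript and is included as an assumption.

```python
RELATIONS = {
    "karaka": {
        "k1": "karta",
        "k2": "karma",
        "k3": "karana",
        "k4": "sampradaan",
        "k5": "apaadaan",            # tag assumed; not shown on these slides
        "k7t": "adhikarana (time)",
        "k7p": "adhikarana (place)",
    },
    "other": {
        "r6": "genitive",
        "rt": "purpose",
        "rh": "reason",
        "nmod_relc": "relative clause",
        "rad": "address",
    },
    "non-dependency": {
        "ccof": "conjunction",
        "pof": "complex predicate",
        "fragof": "fragment of",
    },
}


def relation_type(tag: str) -> str:
    """Return which of the three relation types a tag belongs to."""
    for type_name, tags in RELATIONS.items():
        if tag in tags:
            return type_name
    raise KeyError(f"unknown relation tag: {tag}")


for tag in ("k1", "r6", "ccof"):
    print(tag, "->", relation_type(tag))
```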

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

•  maaz ne aayaa se bacce ko khaanaa khilvaayaa
   'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
   'Mother caused the maid to feed the child'
•  Issue
   – Possibility-I: go by syntactic analysis
     · khilvaa 'cause to eat' is the verb root
     · maaz ne has karta vibhakti, so mark as k1
     · aayaa se has karana vibhakti, so mark as k3
     · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

Causative Constructions (Contd.)

•  Possibility-II
   – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
   – The Paninian framework provides the relations:
     · prayojaka karta 'causer' (pk1): the causer in a causative construction
     · prayojya karta 'causee' (jk1): the causee in a causative construction
     · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
   – Do we mark the above dependency roles?
   – If we mark these relations, then the root will be khaa 'eat'

•  Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
   'Mother fed the child with the spoon'
•  Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
   'Mother made the maid feed the child'
•  As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
•  For causatives, our current decision: follow Possibility-II
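The difference between the two possibilities for causatives amounts to a relabeling of three arcs. The sketch below applies the k1/k3/k4 to pk1/mk1/jk1 mapping stated above; the function name `relabel_causative` is hypothetical, and the function does not decide whether a se-phrase is a mediator-causer or a true instrument (such as cammaca se 'with the spoon', which stays k3), since that judgment is left to the annotator.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map Possibility-I labels of a causative clause to Possibility-II labels."""
    return [
        (phrase, CAUSATIVE_RELABEL.get(label, label))
        for phrase, label in arguments
    ]


possibility_one = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_one))
```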


(2) Relative Clauses (nmod__relc)

•  Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
   'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
   'The boy who is standing there is my brother'
•  Issue
   – Possibility-I
     · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
     · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
     · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

•  Possibility-II
   – The verb khadZaa hai 'is standing' is the root of the relative clause
   – The modifier of vaha 'he' in the main clause is the entire relative clause
   – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Alternative-II

Relative Clauses (Contd.)

•  For relative clauses, our current decision: follow Possibility-II
•  In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
•  The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
•  The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'


anubhava karta – k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]


(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…): Ex-1


Relation samanadhikaran – rs (Contd.): Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'

• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause


[Figures: Possibility-I, with an abstract agar-to node and paired-ccof arcs; Possibility-II, with agar and to as nodes and ccof arcs; both drawn over naa hotii and aatii]


Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically


Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'


Participles (vmod) (Contd.)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)


(7) Ellipsis

• How to show dependencies when the head is missing
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi


Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of


(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'

• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'


Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun-verb sequence with the POF relation captures the fact that it is a conjunct verb
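
The chunking decision above can be sketched in a few lines of Python. The dictionary layout, the chunk identifiers (`NP1`, `VG`, ...), and the "last word is the chunk head" shortcut are assumptions made for this illustration; the relation labels follow the slides.

```python
# Chunks for: maine usase ek prashna kiyaa ('I asked him a question').
# The chunk identifiers are hypothetical names for this sketch.
chunks = {
    "NP1": ["maine"],
    "NP2": ["usase"],
    "NP3": ["ek", "prashna"],  # the noun of the conjunct verb keeps its modifier
    "VG": ["kiyaa"],
}

# Inter-chunk relations as (dependent chunk, label, head chunk).
inter_chunk = [
    ("NP1", "k1", "VG"),
    ("NP2", "k2", "VG"),
    ("NP3", "pof", "VG"),
]

# Intra-chunk relation inside NP3: ek modifies prashna only.
intra_chunk = [("ek", "nmod", "prashna")]


def conjunct_verbs(relations, chunk_table):
    """Return (noun, verb) pairs linked by pof.

    Assumes each chunk is head-final, so its last word is its head.
    """
    return [
        (chunk_table[dep][-1], chunk_table[head][-1])
        for dep, label, head in relations
        if label == "pof"
    ]
```

With these inputs `conjunct_verbs(inter_chunk, chunks)` recovers the pair prashna + kiyaa, while the ek → prashna link stays available as an ordinary noun modifier.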


Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder


Contents

1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh


Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh


Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
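
The filtering step can be sketched as follows. The dictionaries below hand-code the role labels shown on the slide; the `answer_who` helper and the data layout are assumptions for illustration, not part of any SRL system.

```python
# Role-labelled analyses of the two candidate sentences (labels from the slide).
candidates = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, analyses):
    """Return the agents of analyses whose predicate and theme both match."""
    return [
        a["agent"]
        for a in analyses
        if a["predicate"] == predicate and a["theme"] == theme
    ]
```

Calling `answer_who("create", "the first effective polio vaccine", candidates)` keeps only Jonas Salk: the Becton Dickinson sentence mentions the vaccine, but not as the theme of create.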


We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation


Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information


PropBank Framefiles

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of framefiles
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its framefile

An example

• John rings the bell

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for


An example

• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
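
A framefile can be pictured as a lookup table from rolesets to role descriptions. The sketch below hand-codes the two rolesets from the slide; the `FRAMEFILE` layout and the `describe` helper are assumptions for illustration and do not reflect the actual PropBank file format.

```python
# Hypothetical in-memory view of the framefile for "ring".
FRAMEFILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "Arg0": "Causer of ringing",
            "Arg1": "Thing rung",
            "Arg2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "Arg1": "Surrounding entity",
            "Arg2": "Surrounded entity",
        },
    },
}


def describe(roleset, labelled_args):
    """Map each labelled argument to its roleset-specific description."""
    roles = FRAMEFILE[roleset]["roles"]
    return {text: roles[label] for text, label in labelled_args.items()}
```

The same label means different things under different rolesets: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why the roleset has to be chosen before the arguments are interpreted.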

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of framefiles
  – Empty argument insertion
  – Semantic role labelling


Framefiles for Hindi

• Two types of framefiles
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually


PropBank Tagset: Numbered Arguments

Numbered arguments          Numbered arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means


Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'


Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
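
The labelling rule for intransitives can be written as a one-line lookup. The verb classes below are the three examples from the slide; in practice the class would come from the diagnostic tests and the framefile, so the `VERB_CLASS` table and the function name are assumptions for illustration.

```python
# Verb classes for the example verbs named on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # 'open'
    "Puta": "unaccusative",   # 'explode'
    "haMsa": "unergative",    # 'laugh'
}


def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"
```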

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'


Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs


Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank analyses of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation


Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers


Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Causatives

[Figures: PropBank analyses of causative-1 and causative-2]


Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa


Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'


Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'


Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition


Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]


Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]


Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, CASE1, V]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

[Tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, CASE1, V, NP-P]


Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, CASE1, SCR2, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere


Complement Clauses with ki

[Tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, EXTR1, CASE1, V, V-Aux]

Relative Clause

[Tree for the sentence below; node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'


Complex Predicate

[Tree for the sentence below; node labels: VP-Pred, VP, VP1, V', NP, NP-P, CASE1, N, V, V-Aux]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative


Analysis 2a for Small Clauses: General Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb) but not her as semantic subject

[DS tree: considers heads Atif (Subj), her (Obj) and stupid (Obj2)]
[PS tree: [S [NP Atif] [VP considers her [AdjP stupid]]]]

Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using node label

[DS tree: considers heads Atif (k1), her (k2) and stupid (k2s)]
[PS tree over Atif considers her stupid; node labels: S, NP, VP, SC, AdjP]


Analysis 3 for Small Clauses: Raising to Object

• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category

[DS tree: considers heads Atif (Subj), her1 (Obj) and stupid (Obj-Pred); stupid heads the empty category e1 (Subj)]
[PS tree over Atif considers her1 e1 stupid; node labels: S, NP, VP, S, VP, AdjP]

Analysis used for PS in the Hindi-Urdu Treebank


Comparison of Representations

• Less information
• Same information

[Figure: five dependency analyses of "Atif considers her stupid", labelled Tree 1a, Tree 2a, Tree 1b, Tree 2b and Tree 3; arc labels used across the trees: Subj, Obj, Obj2, Obj-Pred, Obj-ECM; one tree contains the co-indexed pair her1 / e1]

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design


Aren't DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency

Aren't DS and PS Representations Complementary? NO

• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants
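
The convention that a DS node stands for the phrase made up of itself and all its descendants can be sketched as follows. The head-index encoding (`None` for the root) and the `phrase` helper are assumptions for illustration; the example tree is the conjunct-verb sentence analysed earlier in the DS part of the tutorial.

```python
def phrase(head_index, words, heads):
    """Words of the phrase headed by words[head_index].

    heads[i] is the index of the head of word i, or None for the root.
    The phrase is the head word plus everything that depends on it,
    directly or indirectly.
    """
    members = {head_index}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h in members and i not in members:
                members.add(i)
                changed = True
    return [words[i] for i in sorted(members)]


# maine usase ek prashna kiyaa: ek -> prashna, everything else -> kiyaa.
words = ["maine", "usase", "ek", "prashna", "kiyaa"]
heads = [4, 4, 3, 4, None]
```

Here `phrase(3, words, heads)` yields the noun phrase ek prashna, and `phrase(4, words, heads)` yields the whole clause, so constituents are recoverable from the dependency tree without storing them separately.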


What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information


Comparison of DS, PB, PS (Sample)

Columns: DS, PB, PS

How: Dependency; Phrase Structure

What: Distinguish unergative/unaccusative; Distinguish temporal/locative adjuncts; Distinguish unaccusative/transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT, Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic Syntax
  – Lexical semantics


Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• An official language in some states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic


Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact in almost all the Indian languages)


Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataaeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique


Case

• Direct: nouns are in the nominative and are not followed by a postposition
• They occur as subject and/or object

  laRkaa aayaa 'the boy came'
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii 'the girl came'
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii 'the boy ate bread'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread up from the floor'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>


Pronouns

Morphologically, like nouns, pronouns also inflect for number and case:

• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd…)

• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) don't change form:
  hamne (we, erg), tumne (you, erg), aapne (you-hon, erg)
  hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
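
The rules for first- and second-person pronouns before postpositions can be sketched as a small function. This is only an illustration of the rules stated on the slide, using the slide's romanization; the table and function names are assumptions, and third-person and plural oblique forms are not covered.

```python
# Oblique stems used before postpositions other than ne.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitive forms that replace pronoun + kaa.
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}


def attach(pronoun, postposition):
    """Combine a first/second-person pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne":
        # maiM and tuu switch to their oblique stems; others are unchanged.
        pronoun = OBLIQUE.get(pronoun, pronoun)
    return pronoun + postposition
```

For example, `attach("maiM", "ne")` gives maiMne, `attach("maiM", "ko")` gives mujhko, and `attach("tum", "kaa")` gives tumhaaraa.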


Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

Case →      Direct            Oblique
Number →    Sg      Pl        Sg      Pl
Gender ↓
Masc        acchaa  acche     acche   acche
Fem         acchii  acchii    acchii  acchii
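
The table above reduces to a short rule, sketched here in Python. The function name and argument values are assumptions for the example, and treating every adjective that does not end in -aa as non-declining is a simplification made for this sketch.

```python
def inflect_adjective(base, gender, number, case):
    """Inflect a declinable adjective given in its masc. sg. direct form.

    gender: "m" or "f"; number: "sg" or "pl"; case: "direct" or "oblique".
    """
    if not base.endswith("aa"):
        # Simplifying assumption: adjectives without final -aa do not decline.
        return base
    stem = base[:-2]
    if gender == "f":
        return stem + "ii"       # feminine: one form throughout
    if number == "sg" and case == "direct":
        return stem + "aa"       # masculine singular direct
    return stem + "e"            # all other masculine cells
```

So `inflect_adjective("acchaa", "m", "sg", "oblique")` gives acche, the form found before a postposition-marked noun such as laRke ne.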

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu:

Root        ro      'cry'
Infinitive  ronaa   'to cry'
Habitual    rotaa   'cry.hab'
Perfective  royaa   'cried'
Causative   rulaa   'cause someone to cry'
            rulvaa  'make someone cause someone to cry'


Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs

  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      'The children are having a meal'

  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog abil be.pl.pres
      'The children can continue to have their meal'

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions


Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar     'according to'
ke alaavaa     'in addition to'
ke kaaran      'because of'
ke dvaaraa     'through'
ke saamne      'in front of'
ke liye        'for'
kii or/taraf   'towards'
kii tarah      'like'
kii jagah      'in place of'
se baahar      'out of'
se pahle       'before'

Urdu-Specific Features

• Prepositions in Urdu
• ezafe in Urdu


Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

  (a) qabl 'before'
      qabl az_ayn (qabl azeen)  'before from_this'  'before this'
      qabl az_waqt              'before from_time'  'before time'
  (b) dar 'in/inside/amidst'
      dar_ayn asnah (dariin asnah)  'in_this moment'
  (c) az 'from/since/for'
      az raahe hamdardi  'from way empathy'  'for the sake of courtesy'
      az sare nau        'from beginning new'
      khaarij az bahes   'beyond from discussion'
  (d) ta 'to/until/till'
      ta waqt  'until time'

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive, but is not restricted to the genitive alone

  EZ N+N      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'


Reduplication: A Morphological Process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category that is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X:

raam-raam       'Ram-Ram' (proper noun)
baccaa-baccaa   'child-child' (common noun)
garam-garam     'hot-hot' (adjective)
dhiire-dhiire   'slowly-slowly' (adverb)
jaa-jaa         'go-go' (verb)
naa-naa         'not-not' (negative particle)
kyaa-kyaa       'what-what' (question word)
jaate-jaate     'going-going' (participle)


Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi:

jaanaa 'going'    → jaanaa-vaanaa 'going etc.'
aaloo 'potato'    → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

bhaag 'run'  → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
dekh 'see'   → dekhaa-dekhii 'in imitation'
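
Both regular patterns can be sketched on romanized input. The functions below are illustrative assumptions: the regular expression treats an initial consonant letter plus an optional h (as in kh) as the "first consonant", and vowel-initial words simply receive the v- prefix, matching aaloo-vaaloo. The irregular forms listed above are not handled.

```python
import re


def full_reduplication(x):
    """X -> X-X."""
    return f"{x}-{x}"


def echo_word(x):
    """X -> X-vX', where the first consonant of X is replaced by v-.

    Works on the romanization used in these slides; vowel-initial
    words just get the v- prefix.
    """
    echo = "v" + re.sub(r"^[^aeiou]h?", "", x)
    return f"{x}-{echo}"
```

For instance, `echo_word("khaanaa")` produces khaanaa-vaanaa and `full_reduplication("garam")` produces garam-garam.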

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies


Simple Transitive

trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitaab paRhegaa
          Atif book.f read.m.sg.fut
          ‘Atif will read the book’

trans-2   आतिफ़ ने किताब पढ़ी
          Atif ne kitaab paRhii
          Atif erg book.f read.f.sg.pst
          ‘Atif read the book’

trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif ko kitaab paRhnii paRii
          Atif dat book.f read.f.inf compel.f.pst
          ‘Atif had to read the book’

Intransitive Unergative

unerg-1   आतिफ़ सोएगा
          Atif soyegaa
          Atif sleep.m.sg.fut
          ‘Atif will sleep’

unerg-2   आतिफ़ ने सोया
          Atif ne soyaa
          Atif erg sleep.m.sg.pst
          ‘Atif slept’

unerg-3   आतिफ़ को सोना पड़ेगा
          Atif ko sonaa paRegaa
          Atif dat sleep.inf compel.fut
          ‘Atif will have to sleep’


Intransitive Unaccusative

unacc-1   दरवाज़ा खुलेगा
          darvaazaa khulegaa
          door.m.sg.d open.m.sg.fut
          ‘The door will open’

unacc-2   दरवाज़े ने खुला
          darvaaze ne khulaa
          door.m.sg.obl erg open.pst
          ‘The door opened’

unacc-3   दरवाज़े को खुलना पड़ेगा
          darvaaze ko khulnaa paRegaa
          door.m.sg.obl dat open.inf compel.fut
          ‘The door will have to open’

Existential

exist-1   उस कमरे में चूहे हैं
          us kamre meM cuuhe haiM
          that room in rats be.pres.pl
          ‘There are rats in that room’


Dative Subject

unacc-4      कल रात बादलों में चाँद दिखा
             kal raat baadaloM meM caaMd dikhaa
             yesterday night clouds in moon see(unacc).pst
             ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1   कल रात बादलों में मुझको चाँद दिखा
             kal raat baadaloM meM mujhko caaMd dikhaa
             yesterday night clouds in me.dat moon see(unacc).pst
             ‘Yesterday night I saw the moon behind the clouds’

Ditransitive

ditrans-1    राम मोहन को किताब देगा
             raam mohan ko kitaab degaa
             Ram Mohan dat book.f give.m.sg.fut
             ‘Ram will give a book to Mohan’

ditrans-2    राम ने मोहन को किताब दी
             raam ne mohan ko kitaab dii
             Ram erg Mohan dat book.f give.f.sg.pst
             ‘Ram gave a book to Mohan’


Complement Clause

compl-cl-1   राम जानता है कि सीता देर से आएगी
             raam jaantaa hai ki siita der se aayegii
             Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
             ‘Ram knows that Sita will arrive late’

Relative Clause

rel-cl-1     मेरी बहन जो दिल्ली में रहती है कल आ रही है
             merii bahan jo dillii meM rahtii hai kal aa rahii hai
             my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
             ‘My sister who stays in Delhi is coming tomorrow’


Relative Clause

rel-cl-2   मैंने वह किताब जो तुमने दी थी पढ़ ली
           maiMne vah kitaab jo tumne dii thii paRh lii
           I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
           ‘I have read the book which you gave me’

rel-cl-3   मैंने वह किताब पढ़ ली जो तुमने दी थी
           maiMne vah kitaab paRh lii jo tumne dii thii
           I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
           ‘I have read the book which you gave me’

rel-cl-4   जो किताब तुमने दी थी वह मैंने पढ़ ली
           jo kitaab tumne dii thii vah maiMne paRh lii
           which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
           ‘I have read the book which you gave me’

Complex Predicate

compl-pred-1   राम रवि की प्रतीक्षा कर रहा था
               raam ravi kii pratikshaa kar rahaa thaa
               Ram Ravi gen wait do prog.m.sg be.m.sg.pst
               ‘Ram was waiting for Ravi’

compl-pred-2   राम रवि को याद कर रहा था
               raam ravi ko yaad kar rahaa thaa
               Ram Ravi acc remember do prog.m.sg be.m.sg.pst
               ‘Ram was remembering Ravi’


Causatives

unerg-1       आतिफ़ सोएगा
              Atif soyegaa
              Atif sleep.m.sg.fut
              ‘Atif will sleep’

causative-1   आया ने आतिफ़ को सुलाया
              aayaa ne Atif ko sulaayaa
              maid erg Atif acc sleep.caus.pst
              ‘The maid caused the child to sleep’

causative-2   माँ ने आया से आतिफ़ को सुलवाया
              maaN ne aayaa se Atif ko sulvaayaa
              mother erg maid by Atif acc sleep.caus.pst
              ‘The mother made the maid cause the child to sleep’

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
   – The experiencer argument takes dative case

     raam ko bukhaar hai      Ram dat fever be.pres
     raam ko caand dikhaa     Ram dat moon see-unacc.pst
     raam ko dukh hai         Ram dat sorrow be.pres

• Participatory verbs
   – The second argument of participatory verbs takes the se postposition

     raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
     siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
     ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut




Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
   – Compounds
   – Punctuation

For example:

usa    laDake   ne    kelaa    khaayaa    thaa
that   boy      erg   banana   eat-perf   past


Tokenization

Represented in SSF:

ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7      (sentence-final punctuation)

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
   – Compounds internally contain a punctuation mark
   – They are productive
   – Morphological analysis of the members of the compound is needed
   – The issue: whether to create a single token
   – Decision:
      – Create three tokens
      – Mark the hyphen as JOIN
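The tokenization decision for hyphenated compounds can be illustrated with a small Python sketch. It is not the treebank's tokenizer: the function name and the tuple layout are hypothetical, and it assumes romanized (WX-style) input, because the `\w` class would split Devanagari words at their vowel signs. Every punctuation mark becomes its own token, and a hyphen glued between two words is flagged JOIN.

```python
import re

TOKEN_RE = re.compile(r"\w+|[^\w\s]")   # a word, or a single punctuation mark
WORD_RE = re.compile(r"\w+")

def tokenize(text):
    """Return (address, token, flag) rows; flag is 'JOIN' for compound hyphens."""
    matches = list(TOKEN_RE.finditer(text))
    rows = []
    for i, m in enumerate(matches):
        flag = ""
        if m.group() == "-" and 0 < i < len(matches) - 1:
            prev, nxt = matches[i - 1], matches[i + 1]
            # The hyphen must touch a word on both sides, with no spaces.
            glued = prev.end() == m.start() and nxt.start() == m.end()
            if glued and WORD_RE.fullmatch(prev.group()) and WORD_RE.fullmatch(nxt.group()):
                flag = "JOIN"
        rows.append((i + 1, m.group(), flag))
    return rows

for row in tokenize("BAI-bahana Aye."):
    print(row)
```

A compound such as BAI-bahana therefore yields three rows (BAI, the hyphen marked JOIN, bahana), which keeps both members available for separate morphological analysis.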


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), and suffix.

ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7                 <fs as=&STOP,punc>
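To make the composite af attribute concrete, the following Python sketch splits an af value into named fields. The field names and their order are assumptions based on the description above (root, category, gender, number, person, case, tam/vibhakti, suffix); the function name is hypothetical, and short values are padded with empty strings.

```python
# Assumed field order of the composite af attribute.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")

def parse_af(value: str) -> dict:
    """Split a comma-separated af value into a field dictionary."""
    parts = [p.strip() for p in value.split(",")]
    parts += [""] * (len(AF_FIELDS) - len(parts))   # pad missing trailing fields
    return dict(zip(AF_FIELDS, parts))

analysis = parse_af("laDakaa,n,m,sg,3,o")
print(analysis["root"], analysis["cat"], analysis["case"])
```

Keeping the analysis as a dictionary makes it easy to compare single features, for example the case value of a noun against the postposition that follows it.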

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7               SYM    <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1              SYM    <fs as=&STOP,punc>
        ))
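The change in ADDR_ values under chunking can be sketched as follows. This Python fragment is only an illustration with a hypothetical function name and input layout; it takes chunks as (label, tokens) pairs and produces SSF-style rows in which each chunk gets an integer address and its tokens get dotted sub-addresses.

```python
def to_ssf(chunks):
    """chunks: list of (chunk_label, [(token, pos), ...]) -> list of (addr, token, cat) rows."""
    rows = []
    for ci, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(ci), "((", label))            # chunk opening row
        for ti, (tok, pos) in enumerate(tokens, start=1):
            rows.append((f"{ci}.{ti}", tok, pos))      # token row inside the chunk
        rows.append(("", "))", ""))                    # chunk closing row
    return rows

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
for addr, tok, cat in to_ssf(chunks):
    print(f"{addr:<6}{tok:<8}{cat}")
```

The token laDake, which had address 2 in the flat token list, ends up at 1.2 once it is placed inside the first NP chunk.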


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
   – Some basic concepts
• Some Hindi constructions
   – Causatives
   – Co-ordination
   – Unaccusatives
   – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:

• Rich morphology
• Relatively flexible word order

For example:

a) baccaa phala khaataa hai    ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

opened
   k1: Sabina
   k2: lock
          the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
   k1: Sabina
   k2: lock
          the
   k3: key
          this

K3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

opened
   k7t: Yesterday
   k1: Sabina
   k2: lock
          the
   k3: key
          this
   k7p: home
          my

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
   k7t: yesterday
   k1: lock
          the
   k3: key
          this

Here lock becomes the karta.


Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example Contd

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) khaa
        bhaaii
        phala

(t2) khaa
        bhaaii
           meraa
           baDZaa
        phala
           bahuta


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
   – Sabina opened the lock with the key
   – The key opened the lock
   – The lock opened

Semantics of the verb

• A verbal root denotes:
   – The activity
   – The result
• Locus of activity: karta
• Locus of result: karma

Verbal Root
   activity      result


karta-karma

• The boy opened the lock
   – k1 – karta
   – k2 – karma
• karta and karma sometimes correspond to agent and theme
   – Not always: The door opened
   – The door is karta
   – The sentence has no explicit karma

open
   k1: boy
   k2: lock

Sub-actions - Opening of lock

Opening of lock
   – Inserting and turning a key (action 1)
   – Key pressing and turning the lever (action 2)
   – Latch moving and lock opening (action 3)


Sub-actions - Opening of lock

open
   k1: boy
   k2: lock
   k3: key

open
   k1: key
   k2: lock

open
   k1: lock

k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
   – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
   – The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
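The role shift described here can be sketched as a tiny Python function. This is an illustrative simplification, not the annotation procedure: the function name, the participant keys ("agent", "patient", "instrument"), and the rule that the participant chosen by the speaker becomes k1 while the others keep their default labels are all assumptions made for the example.

```python
# Default karaka labels when an agent is expressed (assumed for illustration).
DEFAULT_LABEL = {"agent": "k1", "patient": "k2", "instrument": "k3"}

def assign_karakas(participants: dict, karta: str) -> dict:
    """Label each expressed participant; the one chosen as `karta` gets k1."""
    if karta not in participants:
        raise ValueError("the chosen karta must be an expressed participant")
    if karta != "agent" and "agent" in participants:
        raise ValueError("an expressed agent is always the karta in this sketch")
    return {
        word: "k1" if role == karta else DEFAULT_LABEL[role]
        for role, word in participants.items()
    }

# Sabina opened the lock with the key
print(assign_karakas({"agent": "Sabina", "patient": "lock", "instrument": "key"}, "agent"))
# The key opened the lock
print(assign_karakas({"instrument": "key", "patient": "lock"}, "instrument"))
# The lock opened with this key
print(assign_karakas({"patient": "lock", "instrument": "key"}, "patient"))
```

The `karta` argument stands in for vivaksha: the same two participants (lock, key) receive different labels depending on which one the speaker presents as the doer.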


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
   – Participants are assigned various relations accordingly
     (a) I opened the lock with this key
     (b) I am sure this key will open the lock
   – ‘key’ gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)


Overview

• Objective
• The Scheme
   – Morph Analysis
   – POS Tagging
   – Chunking
   – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi


An Example

• Example
   – meraa badZaa bhaaii bahuta phala khaataa hai
     ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
     ‘My elder brother eats lots of fruit’

An Example (Contd)

• Morph Analysis
   – meraa     <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
   – badZaa    <fs af= root=badZaa cat=adj gend=m>
   – bhaaii    <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
   – bahuta    <fs af= root=bahuta cat=adj gend=any>
   – phala     <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
   – khaataa   <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
   – hai       <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd)

• POS Tagging
   – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
   – ((meraa_PRP))_NP
     ((baDzaa_JJ bhaaii_NN))_NP
     ((bahuta_QF phala_NN))_NP
     ((khaataa_VM hai_VAUX))_VG
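The bracketed chunk notation used in this example is simple enough to read mechanically. The Python sketch below is a hypothetical helper, not part of any treebank tool; it assumes each chunk is written as ((word_TAG ...))_LABEL with no nesting.

```python
import re

# One chunk: "((" tokens "))" "_" label, e.g. ((baDzaa_JJ bhaaii_NN))_NP
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text: str):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        # Split each item at its last underscore into word and tag.
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, tokens))
    return chunks

line = "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
for label, tokens in parse_chunks(line):
    print(label, tokens)
```

The resulting list of labelled chunks is the input that the inter-chunk dependency step works on, since relations are marked between chunk heads.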

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
   – each node is a chunk, and
   – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
   – meraa ‘my’
   – bhaaii ‘brother’
   – phala ‘fruit’ and
   – khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks


Dependency Scheme (Contd)

khaataa
   k1: bhaaii
      r6: meraa
   k2: phala
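As a rough illustration of how such a tree over chunk heads might be stored and displayed, the Python sketch below keeps the tree as (dependent, head, relation) triples and prints it as an indented outline. The triple layout and the function name are assumptions for this example only.

```python
# (dependent chunk head, governing chunk head, relation label)
EDGES = [
    ("bhaaii", "khaataa", "k1"),
    ("phala", "khaataa", "k2"),
    ("meraa", "bhaaii", "r6"),
]

def render(edges, node, label="root", indent=0):
    """Return the subtree under `node` as a list of indented lines."""
    lines = [" " * indent + f"{node} ({label})"]
    for dependent, head, relation in edges:
        if head == node:
            lines.extend(render(edges, dependent, relation, indent + 2))
    return lines

print("\n".join(render(EDGES, "khaataa")))
```

Each edge carries exactly one label, so karaka relations (k1, k2) and non-karaka relations (r6) can live in the same structure.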

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
   – Karaka relations
   – Relations other than karakas, and
   – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six
   – karta – subject/agent/doer
   – karma – object/patient
   – karana – instrument
   – sampradaan – beneficiary
   – apaadaan – source
   – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address


Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
   – Possibility-I: Go by syntactic analysis
      – khilvaa ‘cause to eat’ is the verb root
      – maaz ne has karta vibhakti, so mark as k1
      – aayaa se has karana vibhakti, so mark as k3
      – bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
   – The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
   – The Paninian framework provides the relations
      – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
      – prayojya karta ‘causee’ (jk1): the causee in a causative construction
      – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
   – Do we mark the above dependency roles?
   – If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
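The correspondence between the two analyses of causatives (k1, k3, k4 under Possibility-I; pk1, mk1, jk1 under Possibility-II) can be written down as a small Python mapping. The function name and the pair layout are hypothetical. Note that the mapping is only meaningful for the arguments of a causative verb: a genuine instrument such as cammaca se ‘with the spoon’ keeps k3, so a real annotator cannot apply it blindly.

```python
# Possibility-I label -> Possibility-II label, for arguments of a causative verb.
CAUSATIVE_MAP = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(arguments):
    """arguments: list of (phrase, label); labels not in the map are kept as they are."""
    return [(phrase, CAUSATIVE_MAP.get(label, label)) for phrase, label in arguments]

possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
for phrase, label in relabel_causative(possibility_1):
    print(f"{phrase} ({label})")
```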

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
   – Possibility-I
      – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
      – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
      – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
   – The verb khadZaa hai ‘is standing’ is the root of the relative clause
   – The modifier of vaha ‘he’ in the main clause is the entire relative clause
   – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause: Possibility-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I-Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram-Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta – k4a (Contd…)

[Dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)

[Dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’
• Issue
   – Possibility-I: Abstract node
   – Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
   paired-ccof: agar
      ccof: naa hotii
   paired-ccof: to
      ccof: aatii

Possibility - II

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day he writes a letter and then tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd)

PaadZataa hai
   k1: vaha
   k7t: rojZa
   k2: patra
   vmod: likhakar
      k1: vaha
      k2: patra

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which is the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
   k1: mai
   k2: NULL__NP (vo)
      nmod__relc: kahoge
         k1: tum
         k2: jo bhi


Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don't listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
   ccof: badZe_ho_gaye
   ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of


(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
   – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
     ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
     ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
   – Ex: raam ne kahaa ki vo kal aayegaa
     ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
     ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
   k1: maine
   k2: usase
   pof: prashna
      nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder


Contents

1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
   – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
   – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh


Word Matches

• Who created the first effective polio vaccine?
   – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
   – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
   – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
   – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh


Semantic Role Labelling

• Who created the first effective polio vaccine?
   – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
   – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer


We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation


Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information


PropBank Framefiles

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01   Make sound of bell
Arg0      Causer of ringing
Arg1      Thing rung
Arg2      Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01   Make sound of bell
Arg0      Causer of ringing
Arg1      Thing rung
Arg2      Ring for

ring.02   To surround
Arg1      Surrounding entity
Arg2      Surrounded entity

With the arguments bracketed:

• [John] rings [the bell]                        (ring.01)
• [Tall aspen trees] ring [the lake]             (ring.02)

With the arguments labelled:

• [John ARG0] rings [the bell ARG1]              (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)
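A frame file with several rolesets can be pictured as a small lookup structure. The Python sketch below is illustrative only: the class, the dictionary, and the function are hypothetical and do not reflect the actual PropBank file format, which is not shown in these slides. The roleset contents are taken from the ring example.

```python
from dataclasses import dataclass

@dataclass
class Roleset:
    roleset_id: str
    gloss: str
    roles: dict   # argument label -> role description

# A toy "frame file" for the polysemous predicate ring.
RING_FRAMES = {
    "ring.01": Roleset("ring.01", "Make sound of bell",
                       {"ARG0": "Causer of ringing", "ARG1": "Thing rung", "ARG2": "Ring for"}),
    "ring.02": Roleset("ring.02", "To surround",
                       {"ARG1": "Surrounding entity", "ARG2": "Surrounded entity"}),
}

def describe(roleset_id, labelled_spans):
    """Attach the verb-specific role description to each (span, label) pair."""
    roleset = RING_FRAMES[roleset_id]
    return [(span, label, roleset.roles[label]) for span, label in labelled_spans]

print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label (ARG1) means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is the sense in which roles are defined verb by verb.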

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
   – Creation of frame files
   – Empty argument insertion
   – Semantic role labelling


Framefiles for Hindi

• Two types of frame files
   – Frames for simple verbs [385 frames, 703 predicates]
   – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
   – pro: dropped null arguments recoverable from discourse context
   – PRO: empty subjects of non-finite complement and adjunct clauses
   – RELPRO: gaps in participial relative clauses
   – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually


PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA   Causer                 ARGA-MNS   Indirect causer
ARG0   Agent, experiencer     ARG0-MNS   Induced causer
ARG1   Theme, patient         ARG0-GOL   Causee with a ‘recipient’ role
ARG2   Recipient              ARG2-ATR   Attribute
ARG3   Instrument             ARG2-GOL   Goal
                              ARG2-SOU   Source
                              ARG2-LOC   Location
                              ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means

101212

12

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitaab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – unaccusatives, e.g. Kula (open), Puta (explode)
  – unergatives, e.g. haMsa (laugh)

• Single argument of unaccusatives takes Arg1; unergatives take Arg0

• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
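The labelling rule for intransitives can be sketched as a lookup. The table `VERB_CLASS` and the function name are assumptions for illustration; in the treebank the class of a verb is decided by the diagnostic tests, not by a hard-coded table.

```python
# Hypothetical verb-class table using the three verbs named on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    # Unaccusatives take Arg1, unergatives take Arg0.
    return "ARG1" if verb_class == "unaccusative" else "ARG0"
```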

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep

• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees

• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

[Figures: PropBank annotation of causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'

• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवी से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1    राम जानता है कि सीता देर से आएगी
              raam jaantaa hai ki siitaa der se aayegii
              Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
              'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow

• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters

• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS

• PS represents relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)

• These two levels are related by derivations
  – Words and constituents move and leave traces

• Transformational grammar

• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)

• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels VP-Pred, VP, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative

• In unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels VP-Pred, VP, NP1, NP, V, with trace CASE1]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct

उस कमरे में चूहे हैं

[Figure: PS tree for us kamre mein cuuhe hain; node labels VP-Pred, VP, NP1, NP, V, with trace CASE1]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels VP-Pred, VP, NP, V]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Figure: PS tree for kal raat baadaloM mein mujhko caaMd dikhaa; node labels VP-Pred, VP, NP, NP1, NP2, V, with traces CASE1 and SCR2]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone

• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)

• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Figure: PS tree for Ram jaantaa hai ki Sita der se aayegi; node labels VP-Pred, VP, NP, NP1, V, V-Aux, CP1, C, with traces EXTR1 and CASE1]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Figure: PS tree for the sentence above; node labels VP-Pred, VP, NP, NP1, V, V-Aux, CP, C, COMP, PRO, with traces SCR1 and SCR3]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Figure: PS tree for the sentence above; node labels VP-Pred, VP, VP1, NP, N, V, V', V-Aux, with trace CASE1]

Causative

Analysis 3 for Small Clauses: Raising to Object

• Structure represents accusative case marking of her, and her as semantic subject, but requires empty category

[Figure: dependency tree for "Atif considers her stupid": considers → Atif (Subj), her1 (Obj), stupid (Obj-Pred); stupid → e1 (Subj)]

[Figure: phrase structure tree for the same sentence; node labels S, NP, VP, AdjP, with her1 co-indexed with the empty category e1]

Analysis used for PS in Hindi-Urdu Treebank

Comparison of Representations

• Less information
• Same information

[Figures: five dependency analyses of "Atif considers her stupid", labeled Tree 1a, Tree 2a, Tree 1b, Tree 2b and Tree 3. Atif is Subj of considers in all of them. The remaining arc labels recovered from the slide are:
  – stupid: Obj; her: Subj
  – stupid: Obj2; her: Obj
  – stupid: Obj-Pred; her1: Obj; e1: Subj
  – stupid: ObjPred; her: Obj
  – stupid: Obj-ECM; her: Subj]

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language
  – Can be very similar across languages

• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently

• The analyses can be similar in DS and PS
• Lots of choices in treebank design

Aren't DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is

• Usual convention: attachment in projection shows type of dependency

• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendents

What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language

• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)

• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present

• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher

• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes

• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information

Comparison of DS, PB, PS (Sample)

Columns: DS | PB | PS

How: Dependency (DS); Phrase Structure (PS)

What:
  – Distinguish unergative / unaccusative
  – Distinguish temporal / locative adjuncts
  – Distinguish unaccusative / transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT, Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu

• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics

Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic

Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them as two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  Singular: pankhaa (fan), lataa (creeper), ghar (house)
  Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case
  The case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique

Case

• Direct: nouns are in nominative and are not followed by a postposition
• Occur denoting subject and/or object

  laRkaa aayaa 'the boy came'
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii 'the girl came'
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii 'the boy ate bread'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case

Sg – Dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)

Sg – Obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)

Pl – Dir: ye (these), ve (those), jo (who, pl)

Pl – Obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd…)

• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I, erg) and tuune (you, erg)

• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)

• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form:
  hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)

• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun

• Postpositions are attached only to the nouns. Adjectives preceding these nouns also have oblique form: acche laRke ne 'good boy erg'

• The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below

          Direct           Oblique
Gender    Sg      Pl       Sg      Pl
Masc      acchaa  acche    acche   acche
Fem       acchii  acchii   acchii  acchii
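The table can be restated as a small function. This is a sketch under two assumptions that are not in the slides: the citation form is the masculine singular direct form in -aa, and anything not ending in -aa is treated as a non-declining adjective.

```python
def inflect_adjective(citation, gender, number, case):
    """Inflect a declinable -aa adjective following the acchaa table.

    gender: "masc" or "fem"; number: "sg" or "pl"; case: "direct" or "oblique".
    """
    if not citation.endswith("aa"):
        return citation            # assumption: treated as non-declining
    base = citation[:-2]
    if gender == "fem":
        return base + "ii"         # feminine is -ii in every cell
    if number == "sg" and case == "direct":
        return base + "aa"         # masculine singular direct
    return base + "e"              # all other masculine cells
```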

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root        ro      'cry'
Infinitive  ronaa   'to cry'
Habitual    rotaa   'cry.hab'
Perfective  royaa   'cried'
Causative   rulaa   'cause someone to cry'
            rulvaa  'make someone cause someone to cry'

Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs

  (a) bacce khaanaa khaa rahe haiM
      Children_d meal eat prog be.pl.pres
      'The children are having a meal'

  (b) bacce khaanaa khaate rah sakte haiM
      Children_d meal eat_nf prog abilit be.pl.pres
      'The children can continue to have their meal'

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

      baccoM ne raat meM mez se khaanaa le liyaa
      children_obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions

Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words as follows

ke anusaar    'according to'
ke alaavaa    'in addition to'
ke kaaran     'because of'
ke dvaaraa    'through'
ke saamne     'in front of'
ke liye       'for'
kii or/taraf  'towards'
kii tarah     'like'
kii jagah     'in place of'
se baahar     'out of'
se pahle      'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are

  (a) qabl 'before'
      qabl az_ayn (qabl azeen)    before from_this    'before this'
      qabl az_waqt                before from_time    'before time'

  (b) dar 'in/inside/amidst'
      dar_ayn asnah (dariin asnah)    in_this moment

  (c) az 'from/since/for'
      az raahe hamdardi    from way empathy           'for the sake of courtesy'
      az sare nau          from beginning new
      khaarij az bahes     beyond from discussion

  (d) ta 'to/until/till'
      ta waqt              until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• Normally marks a genitive, but is not restricted to genitive alone

  EZ N+N      daur-e-hukumat    'period-of-rule'
              hukumat-e-hind    'government-of-India'
  EZ N+Adj    nasl-e-insani     'race-of-humanity'
              lamha-e-aakhar    'the last moment'
  EZ Adj+N    qabil-e-rahem     'qualified for sympathy'
  EZ Adj+Adj  qabil-e-qubul     'qualified-for-acceptance'

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X

raam-raam      'Ram-Ram'        (proper noun)
baccaa-baccaa  'child-child'    (common noun)
garam-garam    'hot-hot'        (adjective)
dhiire-dhiire  'slowly-slowly'  (adverb)
jaa-jaa        'go-go'          (verb)
naa-naa        'not-not'        (negative particle)
kyaa-kyaa      'what-what'      (question word)
jaate-jaate    'going-going'    (participle)

Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are
  jaanaa 'going'     → jaanaa-vaanaa 'going etc.'
  aaloo 'potato'     → aaloo-vaaloo 'potato etc.'
  aisaa 'like this'  → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.'
  bhaag 'run'   → bhaagambhaag 'rush'
  jhuuTh 'lie'  → jhuuth-muuTh 'just like that (without meaning it)'
  dekh 'see'    → dekhaa-dekhii 'in imitation'
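The regular v- pattern can be sketched as follows. The code works on the romanized forms used in these slides and makes one simplifying assumption: the whole run of letters before the first vowel counts as "the first consonant" (so the digraph kh is removed as a unit). Vowel-initial words simply get v- prefixed, as in aaloo-vaaloo. The irregular items (bhaagambhaag, jhuuth-muuTh, dekhaa-dekhii) are outside its scope.

```python
VOWELS = set("aeiouAEIOU")

def echo_form(word):
    """Replace the initial consonant (if any) of a romanized word with v-."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1                     # skip the leading consonant letters
    return "v" + word[i:]

def echo_compound(word):
    """Build the 'X etc.' echo compound X-vX'."""
    return f"{word}-{echo_form(word)}"
```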

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with subject or with object
• Adjective agrees with the noun it modifies

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitaab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1   राम मोहन को किताब देगा
            raam mohan ko kitaab degaa
            Ram Mohan dat book.f give.m.sg.fut
            'Ram will give a book to Mohan'

ditrans-2   राम ने मोहन को किताब दी
            raam ne mohan ko kitaab dii
            Ram erg Mohan dat book.f give.f.sg.pst
            'Ram gave a book to Mohan'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siitaa der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1    मेरी बहन जो दिल्ली में रहती है कल आ रही है
            merii bahan jo dillii meM rahtii hai kal aa rahii hai
            My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
            'My sister who stays in Delhi is coming tomorrow'

Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me'

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me'

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me'

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example

• Experiencer verbs
  – The experiencer argument takes dative case

    raam ko bukhaar hai      Ram dat fever be.pres
    raam ko caand dikhaa     Ram dat moon see-unacc.pst
    raam ko dukh hai         Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of the participatory verbs takes the se postposition

    raam ravi se carcaa karega      Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii   Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa           Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization: Automatic

Issues
• Compounds
• Punctuations

For example

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision
    • Create three tokens
    • Mark the hyphen as JOIN
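The decision for hyphenated compounds can be sketched as below. The slides say only that three tokens are created and the hyphen is marked as JOIN; representing the mark as the second element of a `(token, mark)` pair, and splitting on whitespace first, are assumptions made for this illustration.

```python
def tokenize(text):
    """Split text on whitespace, then split hyphenated compounds.

    Each hyphen becomes its own token marked "JOIN"; other tokens
    carry no mark (None).
    """
    tokens = []
    for word in text.split():
        parts = word.split("-")
        for i, part in enumerate(parts):
            if i > 0:
                tokens.append(("-", "JOIN"))   # the hyphen itself is a token
            if part:
                tokens.append((part, None))
    return tokens
```

With this sketch, `tokenize("BAI-bahana")` gives the three tokens `BAI`, `-` (marked JOIN) and `bahana`.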

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix

ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaa,v,e'>
7              <fs af='&STOP,punc'>

POS Tagging

• ILMT POS tagsets adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs af='&STOP,punc'>

Chunking

• Chunking is introduced to save the effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)

ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs af='&STOP,punc'>
      ))


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT, Hyderabad

<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k

• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages
• Rich morphology
• Relatively flexible word order

For example
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

lock becomes the karta
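The labeled trees above can be held in a very small data structure. The class `Node`, its methods, and the label `mod` for the unlabeled determiner arcs are assumptions made here for illustration; they are not the treebank's storage format.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)   # (label, Node) pairs

    def add(self, label, word):
        """Attach a dependent under this node and return it."""
        child = Node(word)
        self.children.append((label, child))
        return child

    def relations(self):
        """List (head, label, dependent) triples for the whole subtree."""
        triples = []
        for label, child in self.children:
            triples.append((self.word, label, child.word))
            triples.extend(child.relations())
        return triples

def build_example():
    """Tree for 'Sabina opened the lock with this key'."""
    root = Node("opened")
    root.add("k1", "Sabina")
    root.add("k2", "lock").add("mod", "the")    # "mod" is a placeholder label
    root.add("k3", "key").add("mod", "this")
    return root
```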

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai

=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

(t1) Relations across chunk heads:

khaa
  bhaaii
  phala

(t2) Fully expanded tree:

khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
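The bracketed chunk notation in this example can be read back into data with a short parser. The rule for picking a chunk head (last NN in an NP, the VM in a VG, otherwise the last token) is an assumption for illustration, chosen so that the heads match tree (t1).

```python
import re

# Matches "((tok_TAG tok_TAG ...))_LABEL" as written in the example.
CHUNK_RE = re.compile(r"\(\(\s*(.*?)\s*\)\)_(\w+)")

def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks

def chunk_head(label, tokens):
    """Pick the head word of a chunk (heuristic, see lead-in)."""
    wanted = {"NP": "NN", "VG": "VM"}.get(label)
    for word, pos in reversed(tokens):
        if pos == wanted:
            return word
    return tokens[-1][0]

EXAMPLE = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP "
           "((khaataa_VM hai_VAUX))_VG")
```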

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
    Sabina opened the lock with the key
    The key opened the lock
    The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal root branching into activity and result]

karta-karma

• The boy opened the lock
  - k1 – karta
  - k2 – karma
• karta and karma sometimes correspond to agent and theme
  - Not always: "The door opened"
  - The door is karta
  - The sentence has no explicit karma

[Dependency tree: open → boy (k1), lock (k2)]

Sub-actions: Opening of lock

[Diagram: the opening of the lock decomposed into three sub-actions (action 1 to action 3): inserting and turning a key; the key pressing and turning the lever; the latch moving and the lock opening]


Sub-actions: Opening of lock

[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer, karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer, karma-kartri)
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' is assigned karta in (b) and karana in (a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)


Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi


An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    my elder brother lots fruits eat+HAB PRES
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>
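A minimal sketch of reading such a morph analysis into a dictionary, assuming each feature structure is written as space-separated attribute=value pairs inside `<fs ...>` as on the slide. The function name `parse_fs` is hypothetical, and an attribute with no value (such as the bare `af=`) is simply skipped.

```python
import re

def parse_fs(fs: str) -> dict:
    """Collect attribute=value pairs from a feature-structure string."""
    return dict(re.findall(r"(\w+)=(\w+)", fs))

analysis = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(analysis["root"], analysis["cat"], analysis["TAM"])
```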


An Example (Contd)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
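The bracketed chunk notation above is regular enough to read with a small parser. The sketch below assumes the `((word_TAG ...))_LABEL` shape shown on the slide; `parse_chunks` is a hypothetical helper, not part of any treebank toolkit.

```python
import re

# One chunk looks like ((word_TAG word_TAG))_LABEL
CHUNK = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for body, label in CHUNK.findall(text):
        tokens = [tuple(token.rsplit("_", 1)) for token in body.split()]
        chunks.append((label, tokens))
    return chunks

text = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
for label, tokens in parse_chunks(text):
    print(label, tokens)
```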

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks


Dependency Scheme (Contd)

[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six:
  - karta – subject/agent/doer
  - karma – object/patient
  - karana – instrument
  - sampradaan – beneficiary
  - apaadaan – source
  - adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address


Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
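As a rough illustration of the three-way split of relation types, the sketch below groups only the labels that appear in these slides. The sets are not the complete tagset of the annotation scheme, and `relation_type` is a hypothetical helper.

```python
# Labels mentioned in the slides, grouped by the three relation types.
KARAKA = {"k1", "k2", "k3", "k4", "k4a", "k7t", "k7p", "pk1", "jk1", "mk1"}
NON_KARAKA = {"r6", "rt", "rh", "nmod__relc", "rad"}
NON_DEPENDENCY = {"ccof", "pof", "fragof"}

def relation_type(label: str) -> str:
    """Classify a dependency label into one of the three relation types."""
    if label in KARAKA:
        return "karaka"
    if label in NON_KARAKA:
        return "non-karaka"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    return "unknown"

for label in ["k1", "r6", "ccof", "xyz"]:
    print(label, "->", relation_type(label))
```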


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  mother Erg maid by child Acc food eat-Caus
  'Mother caused the maid to feed the child'
• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa 'cause to eat' is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4


Causative Constructions (Contd...)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations:
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator-causer' (mk1): the mediator-causer in a causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
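The difference between the two possibilities for the maid example can be sketched as a relabeling of the three participants. The mapping table and the function `relabel_causative` are hypothetical names used only for illustration.

```python
# Possibility-I label -> Possibility-II label for causer, mediator and causee.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(arguments):
    """Map (phrase, label) pairs from Possibility-I to Possibility-II labels."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]

possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_1))
```

This table is not a general rule: in the spoon example, cammaca se is a true instrument and keeps k3, so an annotator (or a real system) has to decide whether a se-marked phrase is a mediator-causer before relabeling it.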

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  who boy there stand is he my brother is
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd...)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  I-Dat unhappy is
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan k4 (beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ram Erg moon saw
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ram-Dat moon appeared
  'The moon was visible to Ram'


anubhava karta – k4a (Contd...)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd...)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  if she sick not happened then party in definitely come
  'Had she not been sick, she would have definitely come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause


Possibility - I

[Dependency tree: abstract node agar-to → agar (paired-ccof), to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]

Possibility - II


Conditionals (Contd)

• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically


Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  he daily letter having-written tear is
  'Having written a letter, he tears it up every day'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakara (vmod); likhakara → vaha (k1), patra (k2) as shared arguments]


(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  you whatever will-say that I will-believe
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]


Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  children big happen is no-one Gen matter not listen
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of


(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd...)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ram Erg food ate and sita Erg apple ate
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    ram Erg said that he tomorrow come-Fut
    'Ram said that he will come tomorrow'

[Figures: dependency trees for the coordinate conjunction (ccof) and subordinate conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  I-erg him-inst one question did
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF ('Part OF' relation)
• That is, the noun or adjective in a conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
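A small sketch of how the pof link lets a user of the treebank recover conjunct verbs even though the noun and the verb sit in separate chunks. The edge list follows the tree shown on the next slide; the tuple layout and the helper `complex_predicates` are assumptions made for illustration.

```python
# Edges as (head, dependent, label) for "maine usase ek prashna kiyaa".
edges = [
    ("kiyaa", "maine", "k1"),
    ("kiyaa", "usase", "k2"),
    ("kiyaa", "prashna", "pof"),
    ("prashna", "ek", "nmod"),
]

def complex_predicates(edges):
    """Rejoin each pof dependent with its verb to recover the conjunct verb."""
    return [f"{dependent} {head}"
            for head, dependent, label in edges if label == "pof"]

print(complex_predicates(edges))  # ['prashna kiyaa']
```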


Conjunct Verbs (Contd)

[Dependency tree: kiyaa → maine (k1), usase (k2), prashna (pof); prashna → ek (nmod)]


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder


Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh


Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh


Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
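The filtering step can be sketched as follows, assuming the candidate sentences have already been labelled with agent and theme. The dictionaries and the function `answer` are hypothetical; a real question answering system would need a semantic role labeller and softer matching than string equality.

```python
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer(question, candidates):
    """Keep the agents of candidates whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"] == question["theme"]]

print(answer(question, candidates))  # ['Jonas Salk']
```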

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information


PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
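The two rolesets for ring can be sketched as a small lookup table. This dictionary is only a stand-in for illustration and does not reproduce the actual PropBank frame file format; `FRAMES` and `describe` are hypothetical names.

```python
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
    },
}

def describe(roleset, labeled_spans):
    """Map each labeled span to the role description of the chosen roleset."""
    roles = FRAMES[roleset]["roles"]
    return {span: roles[arg] for span, arg in labeled_spans.items()}

print(describe("ring.01", {"John": "Arg0", "the bell": "Arg1"}))
print(describe("ring.02", {"Tall aspen trees": "Arg1", "the lake": "Arg2"}))
```

The same label Arg1 means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why roles are defined per roleset rather than globally.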

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually


PropBank Tagset: Numbered Arguments

Numbered arguments:
• ARGA: Causer
• ARG0: Agent/experiencer
• ARG1: Theme/patient
• ARG2: Recipient
• ARG3: Instrument

Numbered arguments with function tags:
• ARGA-MNS: Indirect causer
• ARG0-MNS: Induced causer
• ARG0-GOL: Causee with a 'recipient' role
• ARG2-ATR: Attribute
• ARG2-GOL: Goal
• ARG2-SOU: Source
• ARG2-LOC: Location
• ARG2-DIR: Direction

PropBank Tagset: Modifier Arguments

• ARGM-TMP: Temporal
• ARGM-MNR: Manner
• ARGM-LOC: Location
• ARGM-PRP: Purpose
• ARGM-CAU: Cause
• ARGM-DIS: Discourse
• ARGM-ADV: Adverb
• ARGM-MNS: Means


Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'


Unaccusative & Unergative

• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
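The labelling rule for intransitives can be sketched as a lookup on verb class. The tiny lexicon below only contains the three verbs named on the slide, and both `VERB_CLASS` and `sole_argument_label` are hypothetical names; in practice the class of a verb comes from its frame file, decided with the diagnostic tests mentioned above.

```python
# Verb classes for the three example verbs on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb: str) -> str:
    """Return the PropBank label of the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"

for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```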

Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'


Intransitive: Unergative

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs


Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

• The ARG0 analysis of dative subjects may change in future PB annotation


Ditransitive

ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative:
  - Add -aa: (so → sulaa) sleep → make someone sleep
  - Add -vaa: (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers


Causatives

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: annotated trees for causative-1 and causative-2]

Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust(n) do(v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'


Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki sita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'


Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition


Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram: VP-Pred structure for Atif kitab paRhegaa]
आतिफ़ किताब पढ़ेगा


Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram: VP-Pred structure for Atif ne kitab paRhii]
आतिफ़ ने किताब पढ़ी

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree diagram: VP-Pred structure for Atif soyegaa]
आतिफ़ सोएगा


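Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram: VP-Pred structure for darvaazaa khulegaa, with the NP darvaazaa coindexed with a CASE trace]
दरवाज़ा खुलेगा

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Tree diagram: VP-Pred structure for us kamre mein cuuhe hain, with the NP cuuhe coindexed with a CASE trace and us kamre mein adjoined]
उस कमरे में चूहे हैं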


Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram: VP-Pred structure for Ram ne Mohan ko kitaab dii]
राम ने मोहन को किताब दी

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram: VP-Pred structure for kal raat baadaloM mein mujhko caaMd dikhaa, with CASE and SCR traces]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere


Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram: VP-Pred structure for Ram jaantaa hai ki Sita der se aayegi, with the CP headed by ki and an EXTR trace]

Relative Clause

[Tree diagram: VP-Pred structure for the sentence below, with the relative-clause CP, PRO and SCR traces]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'


Complex Predicate

[Tree diagram: VP-Pred structure for the sentence below, with the noun yaad inside V' and the auxiliaries rahaa and thaa as V-Aux]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Comparison of Representations

[Tree diagrams for "Atif considers her stupid": Tree 1a, Tree 1b, Tree 2a, Tree 2b, Tree 3, grouped under "Less information" and "Same information". Arc labels used across the trees include Subj, Obj, Obj2, Obj-Pred, ObjPred and Obj-ECM]

Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax, as part of the science of language
  - Can be very similar across languages
• There can be several possible analyses
  - Some have less information
  - But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design


Aren't DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows the type of dependency

Aren't DS and PS Representations Complementary? NO

• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants


What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
  - Does not change trees
  - Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  - Contains less information than DS+PB
  - DS and PS contain different information


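Comparison of DS, PB, PS (Sample)

[Table with columns DS, PB, PS; the cell values did not survive extraction]

How: Dependency; Phrase Structure
What:
• Distinguish unergative / unaccusative
• Distinguish temporal / locative adjuncts
• Distinguish unaccusative / transitive with empty agent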


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT, Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
• Some facts about Hindi and Urdu
• Linguistic properties
• Morphology
• Some basic Syntax
• Lexical semantics


Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic


Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates, and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)


Morphology

Hindi and Urdu have the following morphological properties:

• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second, and third
• Case: direct, oblique, and vocative
• Adjectives inflect for number, gender, and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender, e.g. pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique


Case

• Direct: nouns are in the nominative and are not followed by a postposition
• They occur denoting subject and/or object

LaRkaa aayaa ‘the boy came’
<ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

laRkii aayii ‘the girl came’
<ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc), and kaa (gen)

laRke ne roTii khaayii ‘the boy ate bread’
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread up from the floor’
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>


Pronouns

Morphologically, like nouns, pronouns also inflect for number and case.

Sg – Dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what), and kuch (some)

Sg – Obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone), and kisii (someone)

Pl – Dir: ye (these), ve (those), jo (who, pl)

Pl – Obl:
(a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
(b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who), and kinhiiM (some people, indef)

Pronouns (Contd…)

• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl), and aap (you-hon) don’t change form:
  hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we), and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our), and tumhaaraa (your)
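The pronoun rules above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name `attach_postposition` and the table layout are hypothetical, and only the personal pronouns and forms listed on the slide are covered.

```python
# Forms are copied from the slide; the table-driven design is an assumption.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}          # used before postpositions other than ne
GENITIVE = {"maiM": "meraa", "tuu": "teraa",       # irregular forms replacing pronoun + kaa
            "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach_postposition(pronoun: str, postposition: str) -> str:
    """Combine a personal pronoun with a postposition (transliterated)."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition == "ne":
        # maiM and tuu keep their direct form before ne.
        return pronoun + "ne"
    # ham, tum and aap are not in OBLIQUE, so they stay unchanged.
    return OBLIQUE.get(pronoun, pronoun) + postposition


print(attach_postposition("maiM", "ne"))   # maiMne
print(attach_postposition("maiM", "ko"))   # mujhko
print(attach_postposition("tum", "kaa"))   # tumhaaraa
```

A lookup table fits here because the irregular forms are a small closed set; everything else falls through to plain concatenation.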


Adjectives

• Morphologically, an adjective is inflected for gender, number, and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy (erg)’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender, and case are given below

Case →       Direct            Oblique
Number →     Sg       Pl       Sg       Pl
Gender ↓
Masc         acchaa   acche    acche    acche
Fem          acchii   acchii   acchii   acchii
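The table above reduces to three endings, which the following sketch makes explicit. It is a minimal illustration, not a morphological generator: the function name `decline_adjective` is hypothetical, the rule "ends in -aa means declinable" is an assumption (some aa-final adjectives do not decline), and `sundar` is an example not taken from the slides.

```python
def decline_adjective(adj: str, gender: str, number: str, case: str) -> str:
    """Return the agreeing form of a transliterated adjective.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'dir' or 'obl'.
    """
    if not adj.endswith("aa"):
        # Assumed to belong to the class that does not decline.
        return adj
    stem = adj[:-2]
    if gender == "f":
        return stem + "ii"            # feminine: same form in every cell
    if number == "sg" and case == "dir":
        return stem + "aa"            # masculine singular direct
    return stem + "e"                 # all other masculine cells


for g in ("m", "f"):
    for c in ("dir", "obl"):
        for n in ("sg", "pl"):
            print(g, c, n, decline_adjective("acchaa", g, n, c))
```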

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, and mood (TAM) and for the agreement features of gender, number, and person
• Tense, aspect, and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects, and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root         ro       ‘cry’
Infinitive   ronaa    ‘to cry’
Habitual     rotaa    ‘cry, hab’
Perfective   royaa    ‘cried’
Causative    rulaa    ‘cause someone to cry’
             rulvaa   ‘make someone cause someone to cry’


Auxiliaries

• Auxiliaries mark tense, aspect, and modality information on verbs

(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    ‘The children are having a meal’

(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog abil be.pl.pres
    ‘The children can continue to have their meal’

• Auxiliaries also carry the gender, number, and person information

Postpositions

• Postpositions largely mark the case relations

    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions


Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii, and se with other words, as follows:

ke anusaar     ‘according to’
ke alaavaa     ‘in addition to’
ke kaaran      ‘because of’
ke dvaaraa     ‘through’
ke saamne      ‘in front of’
ke liye        ‘for’
kii or/taraf   ‘towards’
kii tarah      ‘like’
kii jagah      ‘in place of’
se baahar      ‘out of’
se pahle       ‘before’

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu


Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl ‘before’
    qabl az_ayn (qabl azeen)       before from_this          ‘before this’
    qabl az_waqt                   before from_time          ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah)   in_this moment
(c) az ‘from/since/for’
    az raahe hamdardi              from way empathy          ‘for the sake of courtesy’
    az sare nau                    from beginning new
    khaarij az bahes               beyond from discussion
(d) ta ‘to/until/till’
    ta waqt                        until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive, but is not restricted to the genitive alone

EZ N+N       daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
EZ N+Adj     nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
EZ Adj+N     qabil-e-rahem ‘qualified for sympathy’
EZ Adj+Adj   qabil-e-qubul ‘qualified-for-acceptance’


Reduplication: A Morphological Process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial, and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X.

raam-raam       ‘Ram-Ram’         (proper noun)
baccaa-baccaa   ‘child-child’     (common noun)
garam-garam     ‘hot-hot’         (adjective)
dhiire-dhiire   ‘slowly-slowly’   (adverb)
jaa-jaa         ‘go-go’           (verb)
naa-naa         ‘not-not’         (negative particle)
kyaa-kyaa       ‘what-what’       (question word)
jaate-jaate     ‘going-going’     (participle)


Partial Reduplication (Echo Words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of ‘X etc.’
• In Hindi/Urdu the first consonant of X is replaced by v-. For example: khaanaa-vaanaa ‘food etc.’
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

jaanaa ‘going’      → jaanaa-vaanaa ‘going etc.’
aaloo ‘potato’      → aaloo-vaaloo ‘potato etc.’
aisaa ‘like this’   → aisaa-vaisaa ‘like this etc.’

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:

bhaag ‘run’    → bhaagambhaag ‘rush’
jhuuTh ‘lie’   → jhuuth-muuTh ‘just like that (without meaning it)’
dekh ‘see’     → dekhaa-dekhii ‘in imitation’
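Both regular patterns can be sketched in a few lines. This is an illustration only: it works on the romanized forms used in these slides, treats any run of non-vowel letters at the start of the word as "the first consonant" (so that kh counts as one unit), and does not handle the irregular cases such as bhaagambhaag. The function names are hypothetical.

```python
VOWELS = set("aeiouAEIOU")


def full_reduplication(word: str) -> str:
    """X -> X-X."""
    return f"{word}-{word}"


def echo_word(word: str) -> str:
    """X -> X-vY, where Y is X without its initial consonant(s).

    Vowel-initial words simply receive a v- prefix in the echo part.
    """
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"


for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_word(w))
```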

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies


Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
    Atif kitaab paRhegaa
    Atif book.f read.m.sg.fut
    ‘Atif will read the book’

trans-2: आतिफ़ ने किताब पढ़ी
    Atif ne kitaab paRhii
    Atif erg book.f read.f.sg.pst
    ‘Atif read the book’

trans-3: आतिफ़ को किताब पढ़नी पड़ी
    Atif ko kitaab paRhnii paRii
    Atif dat book.f read.f.inf compel.f.pst
    ‘Atif had to read the book’

Intransitive: Unergative

unerg-1: आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    ‘Atif will sleep’

unerg-2: आतिफ़ ने सोया
    Atif ne soyaa
    Atif erg sleep.m.sg.pst
    ‘Atif slept’

unerg-3: आतिफ़ को सोना पड़ेगा
    Atif ko sonaa paRegaa
    Atif dat sleep.inf compel.fut
    ‘Atif will have to sleep’


Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
    darvaazaa khulegaa
    door.m.sg.d open.m.sg.fut
    ‘The door will open’

unacc-2: दरवाज़े ने खुला
    darvaaze ne khulaa
    door.m.sg.obl erg open.pst
    ‘The door opened’

unacc-3: दरवाज़े को खुलना पड़ेगा
    darvaaze ko khulnaa paRegaa
    door.m.sg.obl dat open.inf compel.fut
    ‘The door will have to open’

Existential

exist-1: उस कमरे में चूहे हैं
    us kamre meM cuuhe haiM
    that room in rats be.pres.pl
    ‘There are rats in that room’


Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
    kal raat baadaloM meM caaMd dikhaa
    yesterday night clouds in moon see(unacc).pst
    ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
    kal raat baadaloM meM mujhko caaMd dikhaa
    yesterday night clouds in me.dat moon see(unacc).pst
    ‘Yesterday night I saw the moon behind the clouds’

Ditransitive

ditrans-1: राम मोहन को किताब देगा
    raam mohan ko kitaab degaa
    Ram Mohan dat book.f give.m.sg.fut
    ‘Ram will give a book to Mohan’

ditrans-2: राम ने मोहन को किताब दी
    raam ne mohan ko kitaab dii
    Ram erg Mohan dat book.f give.f.sg.pst
    ‘Ram gave a book to Mohan’


Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
    raam jaantaa hai ki siita der se aayegii
    Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
    ‘Ram knows that Sita will arrive late’

Relative Clause

rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
    merii bahan jo dillii meM rahtii hai kal aa rahii hai
    my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
    ‘My sister who stays in Delhi is coming tomorrow’


Relative Clause

rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
    maiMne vah kitaab jo tumne dii thii paRh lii
    I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
    ‘I have read the book which you gave me’

rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
    maiMne vah kitaab paRh lii jo tumne dii thii
    I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
    ‘I have read the book which you gave me’

rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
    jo kitaab tumne dii thii vah maiMne paRh lii
    which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
    ‘I have read the book which you gave me’

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
    raam ravi kii pratikshaa kar rahaa thaa
    Ram Ravi gen wait do prog.m.sg be.m.sg.pst
    ‘Ram was waiting for Ravi’

compl-pred-2: राम रवि को याद कर रहा था
    raam ravi ko yaad kar rahaa thaa
    Ram Ravi acc remember do prog.m.sg be.m.sg.pst
    ‘Ram was remembering Ravi’


Causatives

unerg-1: आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    ‘Atif will sleep’

causative-1: आया ने आतिफ़ को सुलाया
    aayaa ne Atif ko sulaayaa
    maid erg Atif acc sleep.caus.pst
    ‘The maid caused the child to sleep’

causative-2: माँ ने आया से आतिफ़ को सुलवाया
    maaN ne aayaa se Atif ko sulvaayaa
    mother erg maid by Atif acc sleep.caus.pst
    ‘The mother made the maid cause the child to sleep’

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case

    raam ko bukhaar hai
    Ram dat fever be.pres

    raam ko caand dikhaa
    Ram dat moon see-unacc.pst

    raam ko dukh hai
    Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of participatory verbs takes the se postposition

    raam ravi se carcaa karega
    Ram Ravi to discussion do.m.sg.fut

    siitaa raam se shaadi karegii
    Sita Ram with marriage do.f.sg.fut

    ravi mohan se milegaa
    Ravi Mohan with meet.m.sg.fut




Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

usa    ladake   ne    kelaa    khaayaa    thaa
that   boy      erg   banana   eat-perf   past


Tokenization

Represented in SSF:

ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7

Tokenization Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7                 <fs as=&STOP,punc>
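A composite af value can be unpacked into named fields as sketched below. Two things are assumptions here: that the values inside af are comma-separated (the separators are not visible in this transcript), and the field names in `AF_FIELDS`, which simply follow the order given in the description above. Missing trailing fields are filled with empty strings.

```python
# Field order follows the slide's description of af; the names are illustrative.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix")


def parse_af(value: str) -> dict[str, str]:
    """Turn a comma-separated af value into a field dictionary."""
    parts = [p.strip() for p in value.split(",")]
    parts += [""] * (len(AF_FIELDS) - len(parts))   # pad short values
    return dict(zip(AF_FIELDS, parts))


features = parse_af("laDakaa,n,m,sg,3,o")
print(features["root"], features["cat"], features["case"])
```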

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7               SYM    <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1              SYM    <fs as=&STOP,punc>
        ))
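How the addresses change once chunks are introduced can be sketched as follows. This is an illustration only: the function name `render_chunks`, the tab-separated layout, and the input structure (a list of `(chunk_label, [(token, pos), ...])` pairs) are assumptions, and the feature-structure column is left out.

```python
def render_chunks(chunks):
    """Produce SSF-like lines with chunk-relative addresses (1, 1.1, 1.2, ...)."""
    lines = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{label}")            # chunk opening line
        for j, (token, pos) in enumerate(tokens, start=1):
            lines.append(f"{i}.{j}\t{token}\t{pos}") # token inside chunk i
        lines.append("\t))")                          # chunk closing line
    return lines


sentence = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
print("\n".join(render_chunks(sentence)))
```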

Rambow


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar

Indian languages have:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified element (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

opened
    k1   Sabina
    k2   lock
             the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): the locus of the result

Sabina opened the lock with this key

opened
    k1   Sabina
    k2   lock
             the
    k3   key
             this

K3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

opened
    k7t   Yesterday
    k1    Sabina
    k2    lock
              the
    k3    key
              this
    k7p   home
              my

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
    k7t   yesterday
    k1    lock
              the
    k3    key
              this

Here the lock becomes the karta.


Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai

=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) khaa
         bhaaii
         phala

(t2) khaa
         bhaaii
             meraa
             baDZaa
         phala
             bahuta


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
    Sabina opened the lock with the key
    The key opened the lock
    The lock opened

Semantics of the Verb

• A verbal root denotes
  – the activity
  – the result
• Locus of activity: karta
• Locus of result: karma

Verbal root
    activity
    result

10

kartaLkarma

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: ‘The door opened’
  – The door is karta
  – The sentence has no explicit karma

open
    k1   boy
    k2   lock

Sub-actions: Opening of a lock

Opening of lock
    (action 1) Inserting and turning a key
    (action 2) Key pressing and turning the lever
    (action 3) Latch moving and lock opening


Sub-actions: Opening of a lock

open
    k1   boy
    k2   lock
    k3   key

open
    k1   key
    k2   lock

open
    k1   lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
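The shift of karta described above can be made concrete with a small function. This is a sketch of the idea, not part of the annotation scheme: the function name `assign_karakas`, the participant keys `agent`, `instrument`, `patient`, and the explicit `karta` argument (standing in for the speaker's choice, vivaksha) are all assumptions made for illustration.

```python
def assign_karakas(expressed: dict[str, str], karta: str) -> dict[str, str]:
    """Assign k1/k2/k3 labels given which participant the speaker foregrounds.

    expressed: participants actually present in the sentence, keyed by
               'agent', 'instrument' or 'patient'.
    karta:     the key of the participant presented as the doer.
    """
    roles = {"k1": expressed[karta]}
    if karta != "patient" and "patient" in expressed:
        roles["k2"] = expressed["patient"]        # karma, unless raised to karta
    if karta != "instrument" and "instrument" in expressed:
        roles["k3"] = expressed["instrument"]     # karana, unless raised to karta
    return roles


# Sabina opened the lock with the key
print(assign_karakas({"agent": "Sabina", "patient": "lock", "instrument": "key"}, "agent"))
# The key opened the lock
print(assign_karakas({"instrument": "key", "patient": "lock"}, "instrument"))
# The lock opened
print(assign_karakas({"patient": "lock"}, "patient"))
```

The karta is passed in rather than computed because, as the slides stress, it depends on what the speaker wants to express: with the same participants, "The lock opened with this key" keeps the key as k3 and makes the lock k1.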


Speaker’s Intention (vivakshaa)

• Every sentence reflects the speaker’s intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)


Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.


An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd.)

• Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m >
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any >
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3 >


An Example (Contd )

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’, and
  – khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks


Dependency Scheme (Contd)

khaataa
    k1   bhaaii
             r6   meraa
    k2   phala
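The tree for this example can be held as a list of labeled head-to-head edges, as in the sketch below. The triple layout `(dependent, relation, head)` and the helper names are assumptions chosen for illustration; the nodes are chunk heads and the labels are the ones shown on the slide.

```python
# (dependent chunk head, relation label, governing chunk head)
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def children(head, edges):
    """Return (dependent, relation) pairs attached to `head`."""
    return [(dep, rel) for dep, rel, h in edges if h == head]


def find_root(edges):
    """The root is the only head that is never a dependent."""
    heads = {h for _, _, h in edges}
    dependents = {d for d, _, _ in edges}
    (root,) = heads - dependents
    return root


def show(node, edges, depth=0, label=""):
    """Print the tree with indentation, one chunk head per line."""
    print("    " * depth + (label + " " if label else "") + node)
    for dep, rel in children(node, edges):
        show(dep, edges, depth + 1, rel)


show(find_root(EDGES), EDGES)
```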

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six:
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address


Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

    maaz ne aayaa se bacce ko khaanaa khilvaayaa
    ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
    ‘Mother caused the maid to feed the child’

• Issue
  – Possibility-I: go by syntactic analysis
    · khilvaa ‘cause to eat’ is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd…)


Causative Constructions (Contd…)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations:
    · prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    · prayojya karta ‘causee’ (jk1): the causee in a causative construction
    · madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in a causative construction

Causative Constructions (Contd…)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’


• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
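The correspondence between the two possibilities can be written as a simple relabelling. This is only an illustrative sketch: the names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical, and the function is meant to be applied only when the se-marked phrase is a mediator-causer (as with aayaa se), not when it is a true instrument (as with cammaca se), a distinction the code itself does not make.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map syntactic labels to causative labels; other labels are kept."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
for phrase, label in relabel_causative(possibility_1):
    print(f"{phrase} ({label})")
```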

Causative Constructions (Contd hellip)

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’

• Issue
  – Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadzaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’


anubhava karta – k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]


(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)

• Ex-1


Relation samanadhikaran – rs (Contd)

• Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’

• Issue
  – Possibility-I: abstract node
  – Possibility-II: one clause depends on the other clause


Possibility - I

agar-to
    paired-ccof   agar
        ccof   naa hotii
    paired-ccof   to
        ccof   aatii

Possibility - II


Conditionals (Contd)

• Possibility-I is ruled out because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb this argument is realized semantically but not syntactically


Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day, having written a letter, he tears it up’


Participles (vmod) (Contd.)

• The arguments vaha ‘he’ and pawra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (contd)

Paadzataa hai
    k1     vaha
    k7t    rojZa
    k2     pawra
    vmod   likhakar
               k1   vaha
               k2   pawra


(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
    k1   mai
    k2   NULL__NP (vo)
             nmod__relc   kahoge
                 k1   tum
                 k2   jo bhi
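The null-element insertion can be sketched on an edge list. This is a hypothetical illustration, not the treebank's tooling: the function name `attach_headless_relative`, the edge triple layout `(dependent, relation, head)`, and the default k2 label for the inserted node (taken from the tree above) are assumptions.

```python
NULL_NP = "NULL__NP"


def attach_headless_relative(edges, main_verb, relclause_verb, relation="k2"):
    """Insert a NULL__NP node so a headless relative clause has a parent.

    The null node attaches to the main verb with `relation`, and the
    relative-clause verb attaches to the null node with nmod__relc.
    """
    return edges + [
        (NULL_NP, relation, main_verb),
        (relclause_verb, "nmod__relc", NULL_NP),
    ]


edges = [
    ("mai", "k1", "maan luungii"),
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]
for dep, rel, head in attach_headless_relative(edges, "maan luungii", "kahoge"):
    print(f"{dep} --{rel}--> {head}")
```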


Ellipsis (Contd)

  Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate

lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo   No explicit conjunct   Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees will show the conjuncts as heads.
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children.
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child.

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
  ‘Ram ate food and Sita ate an apple’

• Subordinate Conjunction
  Ex: raam ne kahaa ki vo kal aayegaa
  ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
  ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)
[Figure: dependency tree, not recovered]

Subordinate Conjunction
[Figure: dependency tree, not recovered]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation).
• That is, the noun or adjective in a conjunct verb sequence has a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, since both are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.

Conjunct Verbs (Contd.)

kiyaa
    k1:  maine
    k2:  usase
    pof: prashna
        nmod: ek
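A minimal sketch of how a user of the treebank might read this analysis back out, assuming the tree is available as (head, relation, dependent) triples. The triple format and both helper names are illustrative assumptions, not the treebank's own API.

```python
# (head, relation, dependent) triples for "maine usase ek prashna kiyaa"
relations = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
    ("prashna", "nmod", "ek"),
]

def complex_predicates(triples):
    """Return 'noun verb' strings for every pof link in the tree."""
    return [f"{dep} {head}" for head, rel, dep in triples if rel == "pof"]

def modifiers_of(word, triples):
    """Return the nmod dependents of `word`."""
    return [dep for head, rel, dep in triples if head == word and rel == "nmod"]
```

Here `complex_predicates(relations)` recovers the conjunct verb prashna kiyaa, while `modifiers_of("prashna", relations)` shows that ek attaches to the noun alone, which is the distinction the separate chunking is meant to preserve.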


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
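The filtering step can be sketched as follows. The role labels are written by hand here and the data layout and `answer_who` function are hypothetical; a real system would obtain the labels from a semantic role labeller.

```python
QUESTION_THEME = "the first effective polio vaccine"

# Hand-labelled candidates, mirroring the two sentences on the slide.
candidates = [
    {"sentence": "Becton Dickinson created the first disposable syringe ...",
     "predicate": "create",
     "roles": {"agent": "Becton Dickinson",
               "theme": "the first disposable syringe"}},
    {"sentence": "The first effective polio vaccine was created in 1952 by Jonas Salk ...",
     "predicate": "create",
     "roles": {"agent": "Jonas Salk",
               "theme": "the first effective polio vaccine"}},
]

def answer_who(predicate, theme, labelled):
    """Return agents of `predicate` whose theme matches the question's theme."""
    return [c["roles"]["agent"] for c in labelled
            if c["predicate"] == predicate
            and c["roles"].get("theme", "").lower() == theme.lower()]
```

Both sentences contain the words of the question, but only the second has the matching theme, so `answer_who("create", QUESTION_THEME, candidates)` keeps Jonas Salk and drops Becton Dickinson, regardless of active or passive voice.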

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Framefiles

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of framefiles
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its framefile

An example

• [John: ARG0] rings [the bell: ARG1]                 (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2]      (ring.02)

ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for

ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
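As a rough illustration, the two rolesets can be held in a small lookup table. The dictionary layout and the `describe` helper are assumptions made for this sketch; actual PropBank framefiles are distributed in their own file format.

```python
# Rolesets copied from the slide; the nesting is an illustrative choice.
FRAMES = {
    "ring": {
        "ring.01": {"gloss": "Make sound of bell",
                    "roles": {"Arg0": "Causer of ringing",
                              "Arg1": "Thing rung",
                              "Arg2": "Ring for"}},
        "ring.02": {"gloss": "To surround",
                    "roles": {"Arg1": "Surrounding entity",
                              "Arg2": "Surrounded entity"}},
    }
}

def describe(lemma, roleset_id, labelled_args):
    """Map each labelled argument span to its verb-specific role description."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}
```

The same label means different things under different rolesets: `describe("ring", "ring.01", [("Arg1", "the bell")])` yields "Thing rung", while Arg1 under ring.02 is the surrounding entity.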

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
  – Creation of framefiles
  – Empty argument insertion
  – Semantic role labelling

Framefiles for Hindi

• Two types of framefiles:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments              Numbered arguments with function tags
ARGA   Causer                   ARGA-MNS   Indirect causer
ARG0   Agent, experiencer       ARG0-MNS   Induced causer
ARG1   Theme, patient           ARG0-GOL   Causee with a ‘recipient’ role
ARG2   Recipient                ARG2-ATR   Attribute
ARG3   Instrument               ARG2-GOL   Goal
                                ARG2-SOU   Source
                                ARG2-LOC   Location
                                ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs:
  – unaccusatives, e.g. Kula (open), Puta (explode)
  – unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

[Figures: annotated trees for unacc-4 and dat-subj-1, not recovered]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative:
• Add -aa
  – (so, sulaa): ‘sleep’, ‘make someone sleep’
• Add -vaa
  – (sulaa, sulvaa): ‘make someone sleep’, ‘cause someone to fall asleep’
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

Causatives
[Figures: annotated trees for causative-1 and causative-2, not recovered]

Causatives: classes
[Figure not recovered]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा
[Tree figure; node labels: VP-Pred, VP, NP, V; leaves: Atif, kitab, paRhegaa]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[Tree figure; node labels: VP-Pred, VP, NP-P, NP, V; leaves: Atif-ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ़ सोएगा
[Tree figure; node labels: VP-Pred, VP, NP, V; leaves: Atif, soyegaa]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[Tree figure; node labels: VP-Pred, VP, NP1, NP, CASE1, V; leaves: darvaazaa, khulegaa]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं
[Tree figure; node labels: VP-Pred, VP, NP1, NP, NP-P, CASE1, V; leaves: us kamre mein, cuuhe, hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी
[Tree figure; node labels: VP-Pred, VP, NP-P, NP, V; leaves: Ram-ne, Mohan-ko, kitaab, dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
[Tree figure; node labels: VP-Pred, VP, NP, NP1, NP2, NP-P, CASE1, SCR2, V; leaves: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
[Tree figure; node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, V, V-Aux, EXTR1, CASE1; leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
[Tree figure; node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, V, V-Aux, PRO, SCR1, SCR3; leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
[Tree figure; node labels: VP-Pred, VP, VP1, NP, NP-P, V’, V, V-Aux, N, CASE1; leaves: Raam, Ravi-ko, yaad, kar, rahaa, thaa]

Causative
[Tree figure not recovered]

Aren't DS and PS Representations Complementary? NO

• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in the projection shows the type of dependency

Aren't DS and PS Representations Complementary? NO

• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all its descendants
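The second convention can be made concrete with a short sketch: given a dependency tree, the constituent headed by a word is that word plus everything it dominates. The head indices below follow the DS analysis of the conjunct-verb example given earlier in the tutorial; the functions themselves are illustrative and not part of any treebank tool.

```python
# "maine usase ek prashna kiyaa": token index -> head index (0 marks the root)
tokens = ["maine", "usase", "ek", "prashna", "kiyaa"]
heads = {1: 5, 2: 5, 3: 4, 4: 5, 5: 0}

def constituent(node, heads):
    """Indices of `node` plus all of its descendants, in surface order."""
    members = {node}
    changed = True
    while changed:                      # keep adding dependents until stable
        changed = False
        for dep, head in heads.items():
            if head in members and dep not in members:
                members.add(dep)
                changed = True
    return sorted(members)

def phrase(node):
    """Surface string of the constituent headed by token `node`."""
    return " ".join(tokens[i - 1] for i in constituent(node, heads))
```

For instance, `phrase(4)` gives the noun phrase "ek prashna" and `phrase(5)` gives the whole clause, so the constituents are recoverable from the dependency tree without any extra annotation.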

What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information

Comparison of DS, PB, PS (Sample)

DS PB PS

How: Dependency

Phrase Structure

What: Distinguish unergative/unaccusative

Distinguish temporal/locative adjuncts

Distinguish unaccusative/transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT, Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic Syntax
  – Lexical semantics

Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic

Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structure
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact, in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  Singular: pankhaa (fan), lataa (creeper), ghar (house)
  Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique.

Case

• Direct: nouns are in the nominative and are not followed by a postposition
• They occur denoting the subject and/or the object

  LaRkaa aayaa ‘the boy came’
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii ‘the girl came’
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii ‘the boy ate bread’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread up from the floor’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, pronouns also inflect for number and case.

Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrog.), kyaa (what) and kuch (some)

Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrog.), koii (someone) and kisii (someone)

Pl, direct: ye (these), ve (those), jo (who, pl.)

Pl, oblique:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrog.), kinhiiM (who, indef.)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef.)

Pronouns (Contd.)

• Before ‘ne’, maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don't change form:
  hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
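The four rules above are regular enough to write down as a small function. This is a sketch that covers only the pronouns and forms listed on the slide; the `attach` function and the two tables are illustrative names, and gender/number agreement of the genitive forms is ignored.

```python
# Oblique stems used before postpositions other than ne
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitives that replace pronoun + kaa
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}

def attach(pronoun, postposition):
    """Combine a personal pronoun with a postposition, following the slide's rules."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]                 # irregular genitive form
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition   # oblique stem + postposition
    return pronoun + postposition                # unchanged stem
```

So `attach("maiM", "ne")` keeps the direct stem (maiMne), while `attach("maiM", "ko")` switches to the oblique stem (mujhko), and ham, tum and aap pass through unchanged.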

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy.obl erg’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below

Case      Direct            Oblique
Number    Sg       Pl       Sg       Pl
Masc      acchaa   acche    acche    acche
Fem       acchii   acchii   acchii   acchii
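The table can be restated as a small function. This sketch assumes that an adjective declines exactly when its citation form ends in -aa; that is a simplification introduced here (the slides only say that some adjectives do not decline), and `inflect` is a hypothetical helper.

```python
def inflect(adjective, gender, number, case):
    """Inflect an adjective given in its masculine singular direct form.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if not adjective.endswith("aa"):
        return adjective               # assumed to be a non-declining adjective
    stem = adjective[:-2]
    if gender == "f":
        return stem + "ii"             # one feminine form in every cell
    if number == "sg" and case == "direct":
        return stem + "aa"             # masculine singular direct
    return stem + "e"                  # all other masculine cells
```

For example, `inflect("acchaa", "m", "sg", "oblique")` returns acche, matching the form in acche laRke ne.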

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu:

Root         ro       ‘cry’
Infinitive   ronaa    ‘to cry’
Habitual     rotaa    ‘cry.hab’
Perfective   royaa    ‘cried’
Causative    rulaa    ‘cause someone to cry’
             rulvaa   ‘make someone cause someone to cry’

Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs

  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      ‘The children are having a meal’

  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog ability be.pl.pres
      ‘The children can continue to have their meal’

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations
  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions

Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar     ‘according to’
ke alaavaa     ‘in addition to’
ke kaaran      ‘because of’
ke dvaaraa     ‘through’
ke saamne      ‘in front of’
ke liye        ‘for’
kii or/taraf   ‘towards’
kii tarah      ‘like’
kii jagah      ‘in place of’
se baahar      ‘out of’
se pahle       ‘before’

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl ‘before’
    qabl az_ayn (qabl azeen): before from_this, ‘before this’
    qabl az_waqt: before from_time, ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah): in_this moment
(c) az ‘from/since/for’
    az raahe hamdardi: from way empathy, ‘for the sake of courtesy’
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta ‘to/until/till’
    ta waqt: until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone

EZ N+N       daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
EZ N+Adj     nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
EZ Adj+N     qabil-e-rahem ‘qualified for sympathy’
EZ Adj+Adj   qabil-e-qubul ‘qualified-for-acceptance’

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X then its reduplicated form is X-X

raam-raam       ‘Ram-Ram’ (proper noun)
baccaa-baccaa   ‘child-child’ (common noun)
garam-garam     ‘hot-hot’ (adjective)
dhiire-dhiire   ‘slowly-slowly’ (adverb)
jaa-jaa         ‘go-go’ (verb)
naa-naa         ‘not-not’ (negative particle)
kyaa-kyaa       ‘what-what’ (question word)
jaate-jaate     ‘going-going’ (participle)

Partial Reduplication (Echo words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of ‘X etc.’
• In Hindi/Urdu the first consonant of X is replaced by v-. For example: khaanaa-vaanaa ‘food etc.’
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
  jaanaa ‘going’: jaanaa-vaanaa ‘going etc.’
  aaloo ‘potato’: aaloo-vaaloo ‘potato etc.’
  aisaa ‘like this’: aisaa-vaisaa ‘like this etc.’

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
  bhaag ‘run’: bhaagambhaag ‘rush’
  jhuuTh ‘lie’: jhuuth-muuTh ‘just like that (without meaning it)’
  dekh ‘see’: dekhaa-dekhii ‘in imitation’
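The regular patterns (X-X and the v- echo word) can be sketched as below. The sketch assumes lower-case romanized input in which a, e, i, o, u are the only vowel letters, and it treats a vowel-initial word such as aaloo as simply gaining v-, which is what the examples above show. The irregular cases (bhaagambhaag and so on) are not handled, and both function names are hypothetical.

```python
import re

def full_reduplication(word):
    """X -> X-X"""
    return f"{word}-{word}"

def echo_word(word):
    """X -> X-vX', where X' is X without its initial consonant (cluster)."""
    # Replace everything before the first vowel with "v"; for vowel-initial
    # words the match is empty, so "v" is just prefixed.
    echo = re.sub(r"^[^aeiou]*", "v", word, count=1)
    return f"{word}-{echo}"
```

Treating the whole run of letters before the first vowel as one unit is what lets the digraph kh in khaanaa count as a single consonant.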

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Relative Clause

rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          ‘My sister who stays in Delhi is coming tomorrow’

Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          ‘I have read the book which you gave me’

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          ‘I have read the book which you gave me’

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          ‘I have read the book which you gave me’

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              ‘Ram was remembering Ravi’

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
    raam ko bukhaar hai     Ram dat fever be.pres
    raam ko caand dikhaa    Ram dat moon see-unacc.pst
    raam ko dukh hai        Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega      Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii   Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa           Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  - Compounds
  - Punctuation

For example:

  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

  ADDR  TOKEN
  1     usa
  2     laDake
  3     ne
  4     kelA
  5     khAyA
  6     thA
  7

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - They are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision: create three tokens and mark the hyphen as JOIN
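The tokenization decision for compounds can be illustrated with a small sketch. This is not the treebank's tokenizer: the function name `tokenize`, the regular expression, and the pair-based output are assumptions made here for illustration only. It splits off every punctuation mark and marks every hyphen as JOIN, so it treats all hyphens as compound-internal.

```python
import re

def tokenize(sentence):
    """Split a sentence into (token, mark) pairs.

    Every punctuation character becomes its own token. A hyphen is
    marked "JOIN" (hypothetical encoding of the slide's decision);
    all other tokens carry the mark None.
    """
    tokens = []
    for word in sentence.split():
        # \w+ picks up letter/digit runs, [^\w\s] picks up single punctuation marks.
        for piece in re.findall(r"\w+|[^\w\s]", word):
            tokens.append((piece, "JOIN" if piece == "-" else None))
    return tokens

print(tokenize("BAI-bahana Aye ."))
```

A compound such as BAI-bahana therefore yields three tokens, with the middle one carrying the JOIN mark.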

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

  ADDR_  TKN_     OTHR
  1      usa      <fs af='vaha,pr'>
  2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
  3      ne       <fs af='ne,psp'>
  4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
  5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
  6      thaa     <fs af='kelaa,v,e'>
  7               <fs as=&STOP,punc>
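To make the composite af attribute concrete, here is a minimal sketch that maps an af value onto named fields. It assumes the value is a comma-separated string whose fields follow the order listed on the slide; the field names in `AF_FIELDS` and the function `parse_af` are hypothetical, chosen only for this illustration.

```python
# Field order as listed on the slide; the names themselves are illustrative.
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix"]

def parse_af(af):
    """Return a dict for a comma-separated af value.

    Missing trailing fields are filled with empty strings, since the
    examples on the slide do not always spell out all eight fields.
    """
    values = af.split(",")
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))

print(parse_af("laDakaa,n,m,sg,3,o"))
```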

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

  ADDR  TKN     CAT   OTHR
  1     usa     PRP   <fs af='vaha,pron,...'>
  2     laDake  NN    <fs af='laDakA,noun,...'>
  3     ne      PSP   <fs af='ne,psp,...'>
  4     kelA    NN    <fs af='kelA,noun,...'>
  5     khAyA   VM    <fs af='KA,verb,...'>
  6     thA     VAUX  <fs af='kelA,verb,...'>
  7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

  ADDR_  TKN_    CAT_  OTHR
  1      ((      NP
  1.1    usa     PRP   <fs af='vaha,pron,...'>
  1.2    laDake  NN    <fs af='laDakA,noun,...'>
  1.3    ne      PSP   <fs af='ne,psp,...'>
         ))
  2      ((      NP
  2.1    kelA    NN    <fs af='kelA,noun,...'>
         ))
  3      ((      VG
  3.1    khAyA   VM    <fs af='KA,verb,...'>
  3.2    thA     VAUX  <fs af='kelA,verb,...'>
         ))
  4      ((      BLK
  4.1            SYM   <fs as=&STOP,punc>
         ))
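The change in ADDR_ values after chunking can be sketched as follows. The sketch assumes that the tokens of chunk i are addressed i.1, i.2, and so on, and that each chunk opens with `((` and closes with `))`. The function `chunk_addresses` and the tuple layout are illustrative, not part of any treebank tool.

```python
def chunk_addresses(chunks):
    """Build (address, token, tag) rows from (chunk_tag, [(token, pos), ...]) pairs."""
    rows = []
    for i, (chunk_tag, words) in enumerate(chunks, start=1):
        rows.append((str(i), "((", chunk_tag))        # chunk opening row
        for j, (token, pos) in enumerate(words, start=1):
            rows.append((f"{i}.{j}", token, pos))     # token inside chunk i
        rows.append(("", "))", ""))                   # chunk closing row
    return rows

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
for row in chunk_addresses(chunks):
    print("\t".join(row))
```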


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<[email protected]>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:

• Rich morphology
• Relatively flexible word order

For example:

  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky, 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

  opened
    k1: Sabina
    k2: lock (the)

  k1 (karta): the doer of the action (the locus of activity)
  k2 (karma): locus of result

Sabina opened the lock with this key

  opened
    k1: Sabina
    k2: lock (the)
    k3: key (this)

  k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

  opened
    k7t: yesterday
    k1: Sabina
    k2: lock (the)
    k3: key (this)
    k7p: home (my)

  k7t (kaladhikaraNa): time
  k7p (deshadhikaraNa): place

Yesterday the lock opened with this key

  opened
    k7t: yesterday
    k1: lock (the)
    k3: key (this)

  lock becomes the karta

Levels of Analysis

• L1 - Semantic relations: karakas, e.g. raama: karta
• L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

  meraa baDzaa bhaaii bahuta phala khaataa hai

  => meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

  => ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
     ((bahuta_QF phala_NN))_NP
     ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

  ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG

  (t1) khaa
         bhaaii
         phala

  (t2) khaa
         bhaaii
           meraa
           baDZaa
         phala
           bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb

• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

  Verbal root
    activity
    result

karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: 'The door opened'
  - The door is karta
  - The sentence has no explicit karma

  open
    k1: boy
    k2: lock

Sub-actions: Opening of lock

  Opening of lock
    action 1: inserting and turning a key
    action 2: key pressing and turning the lever
    action 3: latch moving and lock opening

Sub-actions: Opening of lock

  open
    k1: boy
    k2: lock
    k3: key

  open
    k1: key
    k2: lock

  open
    k1: lock

  k1 - karta (doer)
  k2 - karma (affected)
  k3 - karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), that is, which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  - meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa   <fs af= root=badZaa cat=adj gend=m >
  - bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta   <fs af= root=bahuta cat=adj gend=any >
  - phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai      <fs af= root=hai cat=v gend=any num=any pers=3 >

An Example (Contd.)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit', and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

  khaataa
    k1: bhaii
      r6: meraa
    k2: phala
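The chunk-level dependency tree for this example can be written down as labelled edges between chunk heads. The following is a minimal sketch; the triple layout and the helper names `children` and `render` are assumptions made here, not a treebank format.

```python
# (head, relation, dependent) triples for the example sentence.
edges = [
    ("khaataa", "k1", "bhaaii"),
    ("khaataa", "k2", "phala"),
    ("bhaaii", "r6", "meraa"),
]

def children(head, edges):
    """Return (relation, dependent) pairs attached to the given head."""
    return [(rel, dep) for h, rel, dep in edges if h == head]

def render(head, edges, depth=0, rel=None):
    """Return the subtree under head as indented lines, one node per line."""
    label = f"{rel}: {head}" if rel else head
    lines = ["  " * depth + label]
    for r, dep in children(head, edges):
        lines.extend(render(dep, edges, depth + 1, r))
    return lines

print("\n".join(render("khaataa", edges)))
```

Storing edges rather than nested objects keeps the relation label next to each head-dependent pair, which mirrors how the scheme marks relations between chunk heads.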

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod_relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  - Possibility-I: go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator-causer' (mk1): the mediator-causer in a causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
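The relabelling under Possibility-II can be sketched as a small mapping. This is an illustration only: the function `relabel_causative` and its `mediator` parameter are hypothetical. The caller must say which se-marked phrase is the mediator, because a true instrument (such as cammaca se 'with the spoon') keeps k3.

```python
def relabel_causative(args, mediator):
    """Relabel Possibility-I labels of a causative clause as Possibility-II labels.

    args:     list of (phrase, label) pairs from the syntactic analysis
    mediator: the phrase acting as mediator-causer (assumed to be known)
    """
    relabelled = []
    for phrase, label in args:
        if label == "k1":
            label = "pk1"                      # causer
        elif label == "k4":
            label = "jk1"                      # causee
        elif label == "k3" and phrase == mediator:
            label = "mk1"                      # mediator-causer; instruments stay k3
        relabelled.append((phrase, label))
    return relabelled

clause = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(clause, mediator="aayaa se"))
```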


(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause Alternative-II

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd.)

• Ex-2: raam ne (agent) caaMd dekhaa   [base verb]
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3: raam ko (experiencer) caaMd dikhaa   [derived intransitive verb]
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta - k4a (Contd.): dependency trees for Ex-2 and Ex-3

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd.): dependency trees for Ex-1 and Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue
  - Possibility-I: abstract node
  - Possibility-II: one clause depends on the other clause

Possibility - I

  agar-to
    paired-ccof: agar
      ccof: naa hotii
    paired-ccof: to
      ccof: aatii

Possibility - II

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written letters, he tears them up'
• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)

  Paadzataa hai
    k1: vaha
    k7t: rojZa
    k2: pawra
    vmod: likhakar
      k1: vaha
      k2: pawra

(7) Ellipsis

• How to show dependencies when the head is missing
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

  maan luungii
    k1: mai
    k2: NULL__NP (vo)
      nmod__relc: kahoge
        k1: tum
        k2: jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

  NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two parts
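As a sketch of how the POF link lets the conjunct verb be recovered as a unit, the example can be encoded as labelled edges. The edge lists and the helper `complex_predicate` below are illustrative assumptions, with the labels read off the example tree for this sentence.

```python
# Inter-chunk relations between chunk heads, and the intra-chunk relation
# that becomes available once [ek prashna] is its own NP chunk.
inter_chunk = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
]
intra_chunk = [("prashna", "nmod", "ek")]

def complex_predicate(verb, edges):
    """Join the verb with its pof-linked nominal(s) to recover the conjunct verb."""
    nominals = [dep for head, rel, dep in edges if head == verb and rel == "pof"]
    return " ".join(nominals + [verb])

print(complex_predicate("kiyaa", inter_chunk))
```

Because ek attaches to prashna and not to kiyaa, the modifier relation stays local to the noun while the pof edge still ties the noun to the verb.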

Conjunct Verbs (Contd.)

  kiyaa
    k1: maine
    k2: usase
    pof: prashna
      nmod: ek

Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
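The filtering step can be sketched in a few lines. The role labels below are written by hand from the slide's bracketing. They are not the output of an SRL system, and the record layout and the function `answer_who` are assumptions made for this illustration.

```python
# Hand-labelled candidate sentences (agent/theme as bracketed on the slide).
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Return agents of the predicate whose theme matches the one in the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]

print(answer_who("create", "the first effective polio vaccine", candidates))
```

Both sentences match the question's words, but only one has the questioned theme as the theme of create, so the other candidate is filtered out.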

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Framefiles

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of framefiles
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its framefile

An example

• John rings the bell

  ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for


An example

• [John ARG0] rings [the bell ARG1]   (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)

  ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for

  ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
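The two rolesets of ring can be held in a small lookup structure. This sketch does not reproduce the actual framefile format: the dictionary layout and the function `describe` are assumptions, and only the roleset names, glosses, and role descriptions come from the slides.

```python
# Rolesets for the polysemous predicate "ring", as given on the slides.
FRAMES = {
    "ring.01": {"gloss": "Make sound of bell",
                "roles": {"ARG0": "Causer of ringing",
                          "ARG1": "Thing rung",
                          "ARG2": "Ring for"}},
    "ring.02": {"gloss": "To surround",
                "roles": {"ARG1": "Surrounding entity",
                          "ARG2": "Surrounded entity"}},
}

def describe(roleset, labelled_args):
    """Map each labelled argument to its verb-specific role description.

    Raises KeyError if a label is not defined for the chosen roleset.
    """
    roles = FRAMES[roleset]["roles"]
    return {text: roles[label] for label, text in labelled_args.items()}

print(describe("ring.01", {"ARG0": "John", "ARG1": "the bell"}))
print(describe("ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"}))
```

The same numbered label (ARG1) means something different in each roleset, which is why the roleset ID has to be chosen before the arguments are interpreted.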

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  - Creation of framefiles
  - Empty argument insertion
  - Semantic role labelling

Framefiles for Hindi

• Two types of framefiles
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

  Numbered arguments            Numbered arguments with function tags
  ARGA  Causer                  ARGA-MNS  Indirect causer
  ARG0  Agent, experiencer      ARG0-MNS  Induced causer
  ARG1  Theme, patient          ARG0-GOL  Causee with a 'recipient' role
  ARG2  Recipient               ARG2-ATR  Attribute
  ARG3  Instrument              ARG2-GOL  Goal
                                ARG2-SOU  Source
                                ARG2-LOC  Location
                                ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

  ARGM-TMP  Temporal
  ARGM-MNR  Manner
  ARGM-LOC  Location
  ARGM-PRP  Purpose
  ARGM-CAU  Cause
  ARGM-DIS  Discourse
  ARGM-ADV  Adverb
  ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex predicates

Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
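The labelling rule for the single argument of an intransitive verb can be sketched as a lookup. The table `VERB_CLASS` simply records the classes given for the three example verbs and stands in for the diagnostic tests, which are not implemented here; the function name `sole_argument_label` is hypothetical.

```python
# Verb classes as given in the examples; a real lexicon would be the framefiles.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """Return the numbered argument for the single argument of an intransitive verb."""
    label_by_class = {"unaccusative": "ARG1", "unergative": "ARG0"}
    return label_by_class[VERB_CLASS[verb]]

for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```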

Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Intransitive: Unergative

Unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

[Annotated trees: unacc-4, dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so -> sulaa) sleep -> make someone sleep
• Add -vaa
  - (sulaa -> sulvaa) make someone sleep -> cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

Unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Causatives

[Annotated trees: Causative-1, Causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust(n) do(v)': trust
• Such cases are handled using a noun frame for bharosaa

  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

Complex predicate

compl-pred-2: राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University
[email protected]

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी. Node labels: VP-Pred, VP, NP-P, NP, V; leaves: Atif-ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative there simply is no object

[Figure: PS tree for आतिफ़ सोएगा. Node labels: VP-Pred, VP, NP, V; leaves: Atif, soyegaa]


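Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा. Node labels: VP-Pred, VP, NP1, NP, V; leaves: darvaazaa, khulegaa; empty element: CASE1]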

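Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं

[Figure: PS tree for the sentence above. Node labels: VP-Pred, VP, NP-P, NP1, NP, V; leaves: us kamre mein, cuuhe, hain; empty element: CASE1]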


Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी

[Figure: PS tree for the sentence above. Node labels: VP-Pred, VP, NP-P, NP, V; leaves: Ram-ne, Mohan-ko, kitaab, dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Figure: PS tree for the sentence above. Node labels: VP-Pred, VP, NP-P, NP, V; leaves: kal raat, baadaloM mein, mujhko, caaMd, dikhaa; empty elements: CASE1, SCR2]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (as in the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere


Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Figure: PS tree for the sentence above. Node labels: VP-Pred, VP, NP-P, NP, V, V-Aux, CP, C; leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi; empty elements: EXTR1, CASE1]

Relative Clause

[Figure: PS tree for the sentence below. Node labels: VP-Pred, VP, NP-P, NP, V, V-Aux, CP, C; leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii; empty elements: SCR1, PRO, COMP, SCR3]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'


Complex Predicate

[Figure: PS tree for the sentence below. Node labels: VP-Pred, VP, NP, N, V, V', V-Aux; leaves: Raam, Ravi-ko, yaad, kar, rahaa, thaa; empty element: CASE1]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative


What Does This Mean for NLP?

• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank

• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information


Comparison of DS, PB, PS (Sample)

Columns: DS, PB, PS

How: Dependency; Phrase structure
What: Distinguish unergative/unaccusative; Distinguish temporal/locative adjuncts; Distinguish unaccusative/transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India

Dec 8, 2012
COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics


Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic


Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)


Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  Singular: pankhaa (fan), lataa (creeper), ghar (house)
  Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique


Case

• Direct: nouns are in the nominative and are not followed by a postposition
• Occur denoting subject and/or object

laRkaa aayaa 'the boy came'
<ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

laRkii aayii 'the girl came'
<ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

laRke ne roTii khaayii 'the boy ate bread'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>


Pronouns

Morphologically, like nouns, pronouns also inflect for number and case.

Sg – dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
Sg – obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
Pl – dir: ye (these), ve (those), jo (who, pl)
Pl – obl:
  (a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  (b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd…)

• Before 'ne', maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don't change form:
  hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
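The pronoun rules above can be sketched as a small lookup. This is only an illustration: the function name `attach_postposition` and the two tables are hypothetical, and the sketch covers just the five pronouns and the forms quoted on the slide.

```python
# Forms are taken from the slide; the table layout itself is an assumption.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}   # stems used before most postpositions
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach_postposition(pronoun: str, postposition: str) -> str:
    """Combine one of maiM, tuu, ham, tum, aap with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]            # irregular genitive, kaa is not attached
    if postposition == "ne":
        return pronoun + "ne"               # no stem change before ne
    stem = OBLIQUE.get(pronoun, pronoun)    # ham, tum, aap keep their form
    return stem + postposition


print(attach_postposition("maiM", "ne"))    # maiMne
print(attach_postposition("maiM", "ko"))    # mujhko
print(attach_postposition("ham", "kaa"))    # hamaaraa
```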


Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

Gender   Direct Sg   Direct Pl   Oblique Sg   Oblique Pl
Masc     acchaa      acche       acche        acche
Fem      acchii      acchii      acchii       acchii
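The acchaa paradigm reduces to three endings. A minimal sketch follows, assuming the adjective is passed as a bare stem (`acch`); the function name and the argument values are hypothetical, and adjectives that do not decline are outside its scope.

```python
def inflect_adjective(stem: str, gender: str, number: str, case: str) -> str:
    """Inflect a declinable -aa adjective, given its stem (e.g. 'acch')."""
    if gender == "fem":
        return stem + "ii"     # feminine: -ii in all four cells
    if number == "sg" and case == "direct":
        return stem + "aa"     # masculine singular direct
    return stem + "e"          # every other masculine cell


for gender in ("masc", "fem"):
    row = [inflect_adjective("acch", gender, n, c)
           for c in ("direct", "oblique") for n in ("sg", "pl")]
    print(gender, row)
```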

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of verb forms from Hindi/Urdu

Root        ro       'cry'
Infinitive  ronaa    'to cry'
Habitual    rotaa    'cry.hab'
Perfective  royaa    'cried'
Causative   rulaa    'cause someone to cry'
            rulvaa   'make someone cause someone to cry'


Auxiliaries

• Auxiliaries mark tense, aspect and modality information on verbs

(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog ablit be.pl.pres
    'The children can continue to have their meal'

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations
  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions


Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu


Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl 'before'
    qabl az_ayn (qabl azeen)   before from_this   'before this'
    qabl az_waqt               before from_time   'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah)   in_this moment
(c) az 'from/since/for'
    az raahe hamdardi    from way empathy         'for the sake of courtesy'
    az sare nau          from beginning new
    khaarij az bahes     beyond from discussion
(d) ta 'to/until/till'
    ta waqt   until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone

EZ N+N:     daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj:   nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N:   qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'


Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X

raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)


Partial Reduplication (Echo words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
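The regular full and echo patterns can be sketched on the romanized forms. The helper names are hypothetical, and two simplifications are assumed: every leading non-vowel letter (so the digraph `kh` as a whole) counts as "the first consonant", and vowel-initial words simply receive a `v-` prefix, as in aaloo-vaaloo. The irregular cases (bhaagambhaag, dekhaa-dekhii) are not covered.

```python
VOWELS = set("aeiouAEIOU")


def full_reduplication(word: str) -> str:
    """X -> X-X."""
    return f"{word}-{word}"


def echo_word(word: str) -> str:
    """X -> X-vX', where X' is X without its leading consonant letters."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1                      # skip the initial consonant (possibly a digraph)
    return f"{word}-v{word[i:]}"


for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_word(w))
```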

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies


Simple Transitive

trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Intransitive Unergative

unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'


Intransitive Unaccusative

unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Existential

exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'


Dative Subject

unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'


Complement Clause

compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister who stays in Delhi is coming tomorrow'


Relative Clause

rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'

rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'

rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'


Causatives

unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
    raam ko bukhaar hai            Ram dat fever be.pres
    raam ko caand dikhaa           Ram dat moon see-unacc.pst
    raam ko dukh hai               Ram dat sorrow be.pres
• Participatory verbs
  – The second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega     Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii  Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa          Ravi Mohan with meet.m.sg.fut




Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuation

For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past


Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
    • Create three tokens
    • Mark the hyphen as JOIN
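The tokenization decisions can be sketched as follows. This is not the treebank's tokenizer: the function name, the regular expression and the `NOTE` column are assumptions made for illustration, and every hyphen is marked JOIN whether or not it sits inside a compound.

```python
import re


def tokenize(sentence: str) -> list[tuple[int, str, str]]:
    """Return (ADDR, TOKEN, NOTE) rows; NOTE is 'JOIN' for a hyphen."""
    rows = []
    for piece in re.findall(r"\w+|[^\w\s]", sentence):
        note = "JOIN" if piece == "-" else ""
        rows.append((len(rows) + 1, piece, note))   # ADDR counts from 1
    return rows


for row in tokenize("usa laDake ne kelA khAyA thA ."):
    print(*row)
print(tokenize("BAI-bahana"))   # three tokens, the hyphen marked JOIN
```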


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), suffix

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOP,punc>
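Reading an af value back into named fields might look like the sketch below. It assumes the fields are comma-separated and appear in the order listed above; the field names in `AF_FIELDS` and the function `parse_af` are hypothetical, and short entries are padded with empty strings.

```python
# Assumed field order, following the description of af above.
AF_FIELDS = ["root", "cat", "gender", "number", "person", "case",
             "tam_or_vibhakti", "suffix"]


def parse_af(af: str) -> dict[str, str]:
    """Split a comma-separated af value into a field dictionary."""
    values = af.split(",")
    values += [""] * (len(AF_FIELDS) - len(values))   # pad missing trailing fields
    return dict(zip(AF_FIELDS, values))


print(parse_af("laDakaa,n,m,sg,3,o"))
```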

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e., the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
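How addresses change once tokens are grouped into chunks can be sketched as below. The function `ssf_rows`, the tab-separated layout and the dotted addresses (1.1, 1.2, and so on) are assumptions made for illustration; the chunk boundaries are supplied by hand, since the chunking rules themselves are not described here.

```python
def ssf_rows(chunks):
    """chunks: list of (chunk_tag, [(token, pos), ...]); returns SSF-like rows."""
    rows = []
    for i, (tag, tokens) in enumerate(chunks, start=1):
        rows.append(f"{i}\t((\t{tag}")                 # chunk opens at address i
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append(f"{i}.{j}\t{token}\t{pos}")    # tokens get addresses i.j
        rows.append("\t))")                            # chunk closes
    return rows


sentence = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
print("\n".join(ssf_rows(sentence)))
```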


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar: Indian languages
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai   'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

[Figure: dependency tree. opened → Sabina (k1), lock (k2); lock → the]

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

[Figure: dependency tree. opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]

K3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

[Figure: dependency tree. opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Figure: dependency tree. opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]

lock becomes the karta


Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example Contd

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Figure: two dependency trees. (t1) khaa with dependents bhaaii and phala; (t2) the expanded tree, khaa with dependents bhaaii and phala, plus meraa, baDZaa and bahuta]


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal Root, branching into activity and result]


karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Figure: dependency tree. open → boy (k1), lock (k2)]

Sub-actions: Opening of lock

Opening of lock:
  (action 1) Inserting and turning a key
  (action 2) key pressing and turning the lever
  (action 3) latch moving and lock opening


Sub-actions: Opening of lock

[Figure: three dependency trees. open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  – The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
  – The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  – Which sub-action the speaker wants to focus on
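The shift of karta under the speaker's intention can be sketched as a labelling function. This is only an illustration: the role names (`agent`, `instrument`, `affected`), the `focus` argument standing in for vivaksha, and the function itself are hypothetical. The speaker's choice is an explicit input because the expressed participants alone do not decide it: in "Yesterday the lock opened with this key" the key is expressed, yet the lock is the karta.

```python
DEFAULT_LABEL = {"agent": "k1", "instrument": "k3", "affected": "k2"}


def assign_karakas(expressed: dict[str, str], focus: str) -> dict[str, str]:
    """Label each expressed participant; the focused one becomes karta (k1)."""
    if focus not in expressed:
        raise ValueError("the karta must be an expressed participant")
    return {word: ("k1" if role == focus else DEFAULT_LABEL[role])
            for role, word in expressed.items()}


# Sabina opened the lock with the key
print(assign_karakas({"agent": "Sabina", "affected": "lock", "instrument": "key"}, "agent"))
# The key opened the lock
print(assign_karakas({"instrument": "key", "affected": "lock"}, "instrument"))
# The lock opened with this key
print(assign_karakas({"affected": "lock", "instrument": "key"}, "affected"))
```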


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)


Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi


An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

• Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks


Dependency Scheme (Contd)

[Figure: dependency tree. khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
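The chunk-level tree for this example can be held in a small data structure. The sketch below is illustrative only: the `Chunk` class and the `edges` helper are hypothetical names, and each node stores just the chunk head.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    head: str
    children: list = field(default_factory=list)   # (relation label, Chunk) pairs

    def attach(self, label: str, child: "Chunk") -> "Chunk":
        self.children.append((label, child))
        return child


def edges(node: Chunk) -> list:
    """List (head, label, dependent) triples, depth first."""
    out = []
    for label, child in node.children:
        out.append((node.head, label, child.head))
        out.extend(edges(child))
    return out


root = Chunk("khaataa")
brother = root.attach("k1", Chunk("bhaaii"))   # karta
root.attach("k2", Chunk("phala"))              # karma
brother.attach("r6", Chunk("meraa"))           # genitive
print(edges(root))
```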

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address


Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

• Issue
  – Possibility-I: Go by syntactic analysis
    • khilvaa 'cause to eat' is the verb root
    • maaz ne has karta vibhakti, so mark as k1
    • aayaa se has karana vibhakti, so mark as k3
    • bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)


Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    • prayojaka karta 'causer' (pk1): the causer in a causative construction
    • prayojya karta 'causee' (jk1): the causee in a causative construction
    • madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'


• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
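The move from Possibility-I labels to Possibility-II labels can be sketched as a relabelling step. This is a hypothetical helper, not part of the annotation tooling. It assumes the caller says which se-marked phrase (if any) is the mediator-causer, because vibhakti alone does not separate a mediator (aayaa se) from a true instrument (cammaca se). It also always turns k1 into pk1, so it covers only the causer, mediator, causee pattern of the second example.

```python
POSSIBILITY_II = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments, mediator=None):
    """arguments: (phrase, Possibility-I label) pairs; mediator: the mediator phrase."""
    relabeled = []
    for phrase, label in arguments:
        if label == "k3" and phrase != mediator:
            new_label = "k3"                         # a true instrument stays karana
        else:
            new_label = POSSIBILITY_II.get(label, label)
        relabeled.append((phrase, new_label))
    return relabeled


sentence = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(sentence, mediator="aayaa se"))
```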


(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’

• Issue
  – Possibility-I
    – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    – jo ladZakaa ‘the boy’ depends on vaha ‘he’
    – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd...)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
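A minimal sketch of how the Possibility-II analysis could be held in memory, assuming a simple node record with a head, a relation label and a feature dictionary. The `Node` class and the helper are hypothetical, and the label on jo ladZakaa is left unset because the slide does not state it.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    form: str
    head: Optional[str] = None       # form of the head chunk
    relation: Optional[str] = None   # dependency label, if stated on the slide
    features: dict = field(default_factory=dict)

nodes = {
    "vaha": Node("vaha"),
    # The whole relative clause, rooted in its verb, modifies vaha.
    "khadZaa hai": Node("khadZaa hai", head="vaha", relation="nmod__relc"),
    # jo ladZakaa attaches inside the relative clause; coref links it to vaha.
    "jo ladZakaa": Node("jo ladZakaa", head="khadZaa hai", features={"coref": "vaha"}),
}

def relative_clause_roots(nodes, modified):
    """Forms of the relative-clause roots attached to `modified`."""
    return [n.form for n in nodes.values()
            if n.head == modified and n.relation == "nmod__relc"]

print(relative_clause_roots(nodes, "vaha"))
print(nodes["jo ladZakaa"].features["coref"])
```

Keeping coref as a feature rather than an edge means each chunk still has exactly one head, so the structure stays a tree.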


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I-Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd...)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram-Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta – k4a (Contd...)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd...)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’

• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility – I

agar-to
  paired-ccof: agar
    ccof: na hotii
  paired-ccof: to
    ccof: aatii

Possibility – II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd)

• Possibility-I is ruled out because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically


Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day he writes a letter and then tears it up’

Participles (vmod) (Contd)

• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared, not realized syntactically)
    k2: patra (shared, not realized syntactically)


(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
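The effect of the inserted null node can be checked with a small sketch. It assumes the tree is stored as (dependent, relation, head) triples taken from the diagram; the function name is hypothetical. Without NULL__NP the relative clause has nowhere to attach, so its chunks cannot reach the root.

```python
def all_reach_root(edges, root):
    """True if every dependent reaches `root` by following head links."""
    head_of = {dep: head for dep, _rel, head in edges}
    for start in head_of:
        node, seen = start, set()
        while node != root:
            if node in seen or node not in head_of:
                return False  # cycle, or a chunk with no head
            seen.add(node)
            node = head_of[node]
    return True

with_null = [
    ("mai", "k1", "maan luungii"),
    ("NULL__NP", "k2", "maan luungii"),
    ("kahoge", "nmod__relc", "NULL__NP"),
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]
# Drop every edge that touches the null node to mimic the elided head.
without_null = [e for e in with_null if "NULL__NP" not in (e[0], e[2])]

print(all_reach_root(with_null, "maan luungii"))
print(all_reach_root(without_null, "maan luungii"))
```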


Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don’t listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of


(1) Conjunction (ccof)

• The ccof relation doesn’t reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd...)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’

• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

[Figures: dependency trees for the coordinate conjunction (ccof) and subordinate conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, the noun or adjective in a conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
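The chunking decision can be illustrated with a short sketch. The split into inter-chunk and intra-chunk edge lists and the helper name are assumptions made for the illustration; the labels are those shown in the tree.

```python
# Chunks after splitting the conjunct verb: [ek prashna]NP [kiyaa]VG
chunks = [("NP", ["maine"]), ("NP", ["usase"]), ("NP", ["ek", "prashna"]), ("VG", ["kiyaa"])]

# (dependent, relation, head) triples between chunk heads.
inter_chunk = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
]
# Relations inside a single chunk.
intra_chunk = [("ek", "nmod", "prashna")]

def conjunct_verbs(edges):
    """Recover noun-verb (or adjective-verb) units from pof links."""
    return [f"{dep} {head}" for dep, rel, head in edges if rel == "pof"]

print(conjunct_verbs(inter_chunk))
```

Because ek and prashna share an NP chunk, the modifier relation stays local, while the pof link still records that prashna kiyaa forms one semantic unit.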


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents
1  Motivation
2  Introducing PropBank
3  Frame file definition
4  Hindi PropBank
5  Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
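The filtering step can be sketched as follows, assuming each candidate sentence has already been reduced to a predicate with an agent and a theme. The dictionary layout and the function name are hypothetical; no real SRL system is invoked here.

```python
# Hand-written role labels for the two candidate sentences from the slides.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Return agents of `predicate` whose theme matches the question's theme."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]

print(answer_who("create", "the first effective polio vaccine", candidates))
```

Both sentences contain the question's words, but only one has the vaccine as the theme of create, which is what lets the wrong candidate be filtered out.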

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity

An example

• [John] rings [the bell]  (ring.01)
• [Tall aspen trees] ring [the lake]  (ring.02)

An example

• [John ARG0] rings [the bell ARG1]  (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]  (ring.02)
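A frame file can be pictured as a small lookup table. The sketch below transcribes the two rolesets from the slides; the dictionary layout and the `annotate` helper are assumptions for illustration and do not reflect the actual frame file format.

```python
# Rolesets transcribed from the slides; the layout is hypothetical.
FRAME_FILE = {
    "ring.01": {"gloss": "make sound of bell",
                "roles": {"Arg0": "causer of ringing",
                          "Arg1": "thing rung",
                          "Arg2": "ring for"}},
    "ring.02": {"gloss": "to surround",
                "roles": {"Arg1": "surrounding entity",
                          "Arg2": "surrounded entity"}},
}

def annotate(roleset_id, spans):
    """Bracket labelled spans, rejecting labels the roleset does not define.

    spans: list of (text, label) in surface order; label None means unlabelled.
    """
    roles = FRAME_FILE[roleset_id]["roles"]
    parts = []
    for text, label in spans:
        if label is None:
            parts.append(text)
        elif label in roles:
            parts.append(f"[{text} {label.upper()}]")
        else:
            raise ValueError(f"{label} is not defined for {roleset_id}")
    return " ".join(parts)

print(annotate("ring.01", [("John", "Arg0"), ("rings", None), ("the bell", "Arg1")]))
print(annotate("ring.02", [("Tall aspen trees", "Arg1"), ("ring", None), ("the lake", "Arg2")]))
```

Choosing the roleset first and validating labels against it mirrors the point that roles are defined per verb sense: ring.02 has no Arg0, so an Arg0 label there is rejected.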

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments          Numbered arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  Atif will read the book

trans-2  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  Atif read the book

trans-3  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  Atif had to read the book

Unaccusative & Unergative

• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
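As a rough illustration, the labelling rule for the sole argument of an intransitive can be written as a lookup. The verb-class table below contains only the three verbs named on the slide, and the function name is hypothetical; in practice the class would come from the diagnostic tests.

```python
# Verb classes as given on the slide (transliterated roots).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"

for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```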

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  The door will open

unacc-2  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  The door opened

unacc-3  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  The door will have to open

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep

unerg-2  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  Atif slept

unerg-3  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  Atif will have to sleep

Existential

exist-1  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  Yesterday night the moon was seen behind the clouds

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  Yesterday night I saw the moon behind the clouds

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  Ram will give a book to Mohan

ditrans-2  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  Ram gave a book to Mohan

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep

causative-1  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

Causatives

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

[Figure: causative classes]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  Ram knows that Sita will arrive late

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, [email protected]

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa)]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa), with a CASE trace coindexed with the raised NP]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa), with CASE and SCR traces]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi), with an extraposed CP headed by ki]

Relative Clause

[Figure: PS tree for the relative clause example below, with SCR traces and a CP containing the relative clause]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

Complex Predicate

[Figure: PS tree for the complex predicate example below]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
Ram was remembering Ravi

Causative

Comparison of DS, PB, PS (Sample)

Columns: DS, PB, PS
How: Dependency; Phrase Structure
What:
  Distinguish unergative, unaccusative
  Distinguish temporal, locative adjuncts
  Distinguish unaccusative, transitive with empty agent


Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <[email protected]>

LTRC, IIIT, Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic Syntax
  – Lexical semantics

Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic

Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structure
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique

Case

• Direct: nouns are in the nominative and are not followed by a postposition
• They occur denoting subject and/or object

  laRkaa aayaa ‘the boy came’
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii ‘the girl came’
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii ‘the boy ate bread’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread up from the floor’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, pronouns inflect for number and case.

Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrogative), kyaa (what) and kuch (some)
Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrogative), koii (someone) and kisii (someone)
Pl, direct: ye (these), ve (those), jo (who, pl)
Pl, oblique:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrogative), kinhiiM (who, indefinite)
  b. inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indefinite)

Pronouns (Contd...)

• Before ‘ne’, maiM (I) and tuu (you.sg) don’t change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don’t change form:
  hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
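The pronoun rules above can be condensed into a small sketch over the romanized forms. The function name and the two lookup tables are assumptions for illustration; only the pronouns and postpositions discussed on the slide are covered.

```python
# Oblique stems used before postpositions other than ne.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitive forms that replace pronoun + kaa.
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}

def with_postposition(pronoun, postposition):
    """Combine a pronoun with a postposition following the slide's rules."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne":
        # maiM and tuu switch to their oblique stems; others are unchanged.
        pronoun = OBLIQUE.get(pronoun, pronoun)
    return pronoun + postposition

for pron, post in [("maiM", "ne"), ("maiM", "ko"), ("tum", "ko"), ("ham", "kaa")]:
    print(pron, post, "->", with_postposition(pron, post))
```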

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below

Case →        Direct              Oblique
Number →      Sg       Pl         Sg       Pl
Gender ↓
Masc          acchaa   acche      acche    acche
Fem           acchii   acchii     acchii   acchii
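The table reduces to three endings, which a short sketch makes explicit. The function name and the stem-plus-ending representation are assumptions; the rule applies only to declinable adjectives of the acchaa type, since some adjectives do not decline.

```python
def inflect_adjective(stem, gender, number, case):
    """Inflect a declinable adjective stem (e.g. 'acch' for acchaa).

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if gender == "f":
        return stem + "ii"          # feminine is invariant in the table
    if number == "sg" and case == "direct":
        return stem + "aa"          # masculine singular direct
    return stem + "e"               # all other masculine cells

for gender in ("m", "f"):
    row = [inflect_adjective("acch", gender, n, c)
           for c in ("direct", "oblique") for n in ("sg", "pl")]
    print(gender, row)
```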

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root         ro       cry
Infinitive   ronaa    to cry
Habitual     rotaa    cry.hab
Perfective   royaa    cried
Causative    rulaa    cause someone to cry
             rulvaa   make someone cause someone to cry

Auxiliaries

• Auxiliaries mark tense, aspect and modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      The children are having a meal
  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog ability be.pl.pres
      The children can continue to have their meal
• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations
  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions

Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar ‘according to’
ke alaavaa ‘in addition to’
ke kaaran ‘because of’
ke dvaaraa ‘through’
ke saamne ‘in front of’
ke liye ‘for’
kii or/taraf ‘towards’
kii tarah ‘like’
kii jagah ‘in place of’
se baahar ‘out of’
se pahle ‘before’

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:
  (a) qabl ‘before’
      qabl az_ayn (qabl azeen)  ‘before from_this’  ‘before this’
      qabl az_waqt  ‘before from_time’  ‘before time’
  (b) dar ‘in/inside/amidst’
      dar_ayn asnah (dariin asnah)  ‘in_this moment’
  (c) az ‘from/since/for’
      az raahe hamdardi  ‘from way empathy’  ‘for the sake of courtesy’
      az sare nau  ‘from beginning new’
      khaarij az bahes  ‘beyond from discussion’
  (d) ta ‘to/until/till’
      ta waqt  ‘until time’

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone
  EZ N+N: daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
  EZ N+Adj: nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
  EZ Adj+N: qabil-e-rahem ‘qualified for sympathy’
  EZ Adj+Adj: qabil-e-qubul ‘qualified-for-acceptance’

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X

raam-raam ‘Ram-Ram’ (proper noun)
baccaa-baccaa ‘child-child’ (common noun)
garam-garam ‘hot-hot’ (adjective)
dhiire-dhiire ‘slowly-slowly’ (adverb)
jaa-jaa ‘go-go’ (verb)
naa-naa ‘not-not’ (negative particle)
kyaa-kyaa ‘what-what’ (question word)
jaate-jaate ‘going-going’ (participle)

Partial Reduplication (Echo words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of ‘X etc.’
• In Hindi/Urdu, the first consonant of X is replaced by v-
  For example: khaanaa-vaanaa ‘food etc.’
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
  jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
  aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
  aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’

There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
  bhaag ‘run’ → bhaagambhaag ‘rush’
  jhuuTh ‘lie’ → jhuuth-muuTh ‘just like that (without meaning it)’
  dekh ‘see’ → dekhaa-dekhii ‘in imitation’
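The two regular patterns can be sketched over the romanized forms. The function names are hypothetical, and the sketch assumes that everything before the first vowel letter counts as the initial consonant (so kh is treated as one unit, and a vowel-initial word simply gets v- prefixed, as in aaloo-vaaloo). The irregular forms listed above are outside its scope.

```python
VOWELS = set("aeiouAEIOU")

def full_reduplication(word):
    """X -> X-X"""
    return f"{word}-{word}"

def echo_word(word):
    """X -> X-vY, where Y is X without its initial consonant letters."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"

for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(echo_word(w))
print(full_reduplication("garam"))
```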

Some Basic Syntax

  Hindi and Urdu are both relatively free word order SOV languages

  For case marking Hindi primarily uses postpositions   The verb agrees either with subject or with object   Adjective agrees with the noun it modifies

101212

13

Simple Transitive

trans-1 आतफ़ कताब पढ़गा Atif kitab paRhegaa Atif bookf readmsgfut Atif will read the book

trans-2 आतफ़ कताब पढ़ी

Atif ne kitaab paRhii Atif erg bookf readfsgpst Atif read the book

trans-3 आतफ़ को कताब पढ़नी पड़ी

Atif ko kitaab paRhnii paRii Atif dat bookf readfinf compelfpst Atif had to read the book

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept.'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep.'


Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open.'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened.'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open.'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room.'


Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds.'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds.'

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan.'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan.'


Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late.'

Relative Clause

rel-cl-1
मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister who stays in Delhi is coming tomorrow.'


Relative Clause

rel-cl-2
मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me.'

rel-cl-3
मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me.'

rel-cl-4
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me.'

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi.'

compl-pred-2
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi.'


Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep.'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep.'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case.

raam ko bukhaar hai
Ram dat fever be.pres

raam ko caand dikhaa
Ram dat moon see-unacc.pst

raam ko dukh hai
Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of participatory verbs takes the se postposition.

raam ravi se carcaa karegaa
Ram Ravi to discussion do.m.sg.fut

siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut

ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut




Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuation

For example:

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past


Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     .

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN
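The tokenization decisions above (every punctuation mark becomes a token; a hyphenated compound becomes three tokens with the hyphen marked JOIN) can be sketched as follows. This is an illustrative sketch, not the treebank's tokenizer: the function name `tokenize`, the use of a third column for the JOIN mark, and the regular expression are assumptions made here.

```python
import re

def tokenize(sentence: str):
    """Return SSF-like rows (ADDR, TOKEN, MARK); MARK is 'JOIN' for a compound-internal hyphen."""
    tokens = []
    for piece in sentence.split():
        # A run of word characters is one token; every punctuation mark is its own token.
        tokens.extend(re.findall(r"\w+|[^\w\s]", piece))
    return [(addr, tok, "JOIN" if tok == "-" else "")
            for addr, tok in enumerate(tokens, start=1)]

for row in tokenize("usa laDake ne kelA khAyA thA."):
    print(*row)
for row in tokenize("bAlikA-vixyAlaya"):
    print(*row)
```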


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='thaa,v'>
7      .        <fs af='&STOP,punc'>
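Because af is a positional, comma-separated attribute, it is convenient to unpack it into named fields. The sketch below assumes the eight-slot order given on the slide; the field names, the helper `parse_af`, and the padded example value 'laDakaa,n,m,sg,3,o,,' (with two empty trailing slots) are assumptions for illustration only.

```python
# Slot order as listed on the slide; the names themselves are chosen here for illustration.
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix"]

def parse_af(value: str) -> dict:
    """Split a comma-separated af value into named fields, padding missing slots with ''."""
    parts = value.split(",")
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))

print(parse_af("laDakaa,n,m,sg,3,o,,"))
print(parse_af("ne,psp"))
```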

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='thA,verb,…'>
7     .       SYM   <fs af='&STOP,punc'>


Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e., the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='thA,verb,…'>
       ))
4      ((      BLK
4.1    .       SYM   <fs af='&STOP,punc'>
       ))
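The change in ADDR_ values after chunking can be made concrete with a short sketch: each chunk receives a whole-number address and its tokens receive dotted addresses beneath it. The function `to_ssf_rows` and the tuple layout are hypothetical; only the numbering scheme follows the slide.

```python
def to_ssf_rows(chunks):
    """chunks: list of (chunk_tag, [(token, pos), ...]). Returns (ADDR, TKN, CAT) rows."""
    rows = []
    for i, (tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", tag))            # chunk opening line
        for j, (tok, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", tok, pos))     # token address nested under the chunk
        rows.append(("", "))", ""))                 # chunk closing line
    return rows

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
    ("BLK", [(".", "SYM")]),
]
for addr, tkn, cat in to_ssf_rows(chunks):
    print(f"{addr:<5}{tkn:<8}{cat}")
```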


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

opened
  k1: Sabina
  k2: lock (the)

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): the locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)

K3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)

The lock becomes the karta.
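The labelled trees above can be held in a very small data structure: a verb node whose children are (relation label, dependent) pairs. The sketch below is illustrative only; the class `Node` and the helper `find` are hypothetical names, and determiners are left out because the slides give no label for them.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)  # (relation label, Node) pairs

    def attach(self, label: str, word: str) -> "Node":
        child = Node(word)
        self.children.append((label, child))
        return child

def find(verb: Node, label: str):
    """Return the words attached to `verb` with the given karaka label."""
    return [child.word for lab, child in verb.children if lab == label]

# 'Yesterday Sabina opened the lock with this key at my home'
s1 = Node("opened")
for label, word in [("k7t", "Yesterday"), ("k1", "Sabina"), ("k2", "lock"),
                    ("k3", "key"), ("k7p", "home")]:
    s1.attach(label, word)

# 'Yesterday the lock opened with this key': the lock is now the karta
s2 = Node("opened")
for label, word in [("k7t", "yesterday"), ("k1", "lock"), ("k3", "key")]:
    s2.attach(label, word)

print(find(s1, "k1"), find(s2, "k1"))
```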


Levels of Analysis

L1 – Semantic relations: karakas, e.g., raama: karta
L2 – Morphosyntactic: vibhakti, e.g., raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g., raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

(t1) khaa
       bhaaii
       phala

(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta
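The bracketed chunk notation in the example is easy to read programmatically. The sketch below parses it and picks a head for each chunk. The head rule (last NN in an NP, VM in a VG, otherwise the last token) is an assumption made here for illustration, not a rule stated on the slides; `parse_chunks` and `HEAD_POS` are hypothetical names.

```python
import re

CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")
HEAD_POS = {"NP": "NN", "VG": "VM"}  # assumed head category per chunk type

def parse_chunks(text: str):
    """Parse '((w_TAG ...))_CHUNK' notation into chunk dicts with a guessed head."""
    chunks = []
    for body, tag in CHUNK_RE.findall(text):
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        # Take the last token carrying the assumed head category, else the last token.
        head = next((w for w, pos in reversed(tokens) if pos == HEAD_POS.get(tag)),
                    tokens[-1][0])
        chunks.append({"tag": tag, "tokens": tokens, "head": head})
    return chunks

text = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
for chunk in parse_chunks(text):
    print(chunk["tag"], chunk["head"])
```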


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb

• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

Verbal root
  activity    result


karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta/karma sometimes correspond to agent/theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock:
  (action 1) Inserting and turning a key
  (action 2) key pressing and turning the lever
  (action 3) latch moving and lock opening


Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: lock

open
  k1: key
  k2: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
  – The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e., which sub-action the speaker wants to focus on.


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)


Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi


An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m>
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any>
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks


Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six:
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address


Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
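The three groups of labels listed above lend themselves to a simple lookup table. The sketch below is only an illustration: `relation_type` is a hypothetical helper, and the tags k5 (apaadaan) and k7 (adhikarana) are assumed here by analogy with k1-k4 and k7t/k7p, since the slides do not spell them out next to the role names.

```python
KARAKA = {
    "k1": "karta", "k2": "karma", "k3": "karana", "k4": "sampradaan",
    "k5": "apaadaan",    # tag assumed, not given explicitly on the slides
    "k7": "adhikarana",  # tag assumed; the slides show the subtypes k7t (time) and k7p (place)
}
OTHER = {"r6": "genitive", "rt": "purpose", "rh": "reason",
         "nmod_relc": "relative clause", "rad": "address"}
NON_DEPENDENCY = {"ccof": "conjunction", "pof": "complex predicate",
                  "fragof": "fragment of"}

def relation_type(label: str) -> str:
    """Return which of the three relation groups a label belongs to."""
    for name, table in (("karaka", KARAKA), ("other", OTHER),
                        ("non-dependency", NON_DEPENDENCY)):
        if label in table:
            return name
    raise KeyError(f"unknown relation label: {label}")

print(relation_type("k1"), relation_type("r6"), relation_type("pof"))
```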


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4


Causative Constructions (Contd.)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations:
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator-causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd.)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'


Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
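The difference between the two possibilities for the mediated causative amounts to a relabelling of three arguments. The sketch below applies that relabelling to the second example only (the one with aayaa se as mediator). The names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical, and the function assumes the se-marked phrase is a mediating causer rather than an instrument such as cammaca se.

```python
# Possibility-I label -> Possibility-II label for a mediated causative
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args):
    """args: list of (phrase, Possibility-I label). Returns Possibility-II labels."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]

possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(possibility_1))
```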

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4 – beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta – k4a (Contd.)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e., karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd.)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd.)

• Possibility-I is not possible, because agar-to is the head of the tree, which is an abstract node, i.e., it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically


Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written letters, he tears them up'


Participles (vmod) (Contd.)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha
    k2: patra


(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e., NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
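The null-element treatment can be sketched as a function that builds the arcs of the tree above and falls back to NULL__NP when the correlative (vo) is not overtly present. This is a minimal illustration under that assumption; `build_arcs` and the (head, label, dependent) triple format are hypothetical.

```python
def build_arcs(main_verb, k1, rel_verb, rel_args, correlative=None):
    """Return (head, label, dependent) arcs; use NULL__NP when the correlative is elided."""
    host = correlative if correlative is not None else "NULL__NP"
    arcs = [
        (main_verb, "k1", k1),
        (main_verb, "k2", host),          # the (possibly null) host of the relative clause
        (host, "nmod__relc", rel_verb),   # the relative clause attaches to the host
    ]
    arcs += [(rel_verb, label, dep) for label, dep in rel_args]
    return arcs

arcs = build_arcs("maan luungii", "mai", "kahoge", [("k1", "tum"), ("k2", "jo bhi")])
for arc in arcs:
    print(arc)
```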


Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of


(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

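Coordinate Conjunction (ccof)

[Figure: dependency tree for the coordinate conjunction example]

Subordinate Conjunction

[Figure: dependency tree for the subordinate conjunction example]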

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder


Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh


Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh


Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
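The filtering step can be sketched with the two candidate sentences reduced to their labelled propositions. This is an illustration of the idea only: the dictionary layout and the function `answer_who` are hypothetical, and the role labels are copied from the slide rather than produced by a labeller.

```python
# Each candidate sentence reduced to the predicate and its labelled arguments.
propositions = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Keep only candidates whose theme matches the question; return their agents."""
    return [p["agent"] for p in candidates
            if p["predicate"] == predicate and p["theme"] == theme]

print(answer_who("create", "the first effective polio vaccine", propositions))
```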

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information


PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example

• [John: ARG0] rings [the bell: ARG1] (roleset ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (roleset ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
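A frame file with several rolesets can be pictured as a nested mapping from lemma to roleset to role descriptions. The sketch below encodes only the two rolesets shown for ring; the dictionary layout and the helper `describe` are hypothetical and are not PropBank's actual file format.

```python
FRAMES = {
    "ring": {
        "ring.01": {"gloss": "Make sound of bell",
                    "roles": {"Arg0": "Causer of ringing",
                              "Arg1": "Thing rung",
                              "Arg2": "Ring for"}},
        "ring.02": {"gloss": "To surround",
                    "roles": {"Arg1": "Surrounding entity",
                              "Arg2": "Surrounded entity"}},
    }
}

def describe(lemma, roleset, labelled_spans):
    """Map each labelled span to the role description defined by the chosen roleset."""
    roles = FRAMES[lemma][roleset]["roles"]
    return {span: roles[arg] for span, arg in labelled_spans}

print(describe("ring", "ring.01", [("John", "Arg0"), ("the bell", "Arg1")]))
print(describe("ring", "ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")]))
```

The same label (Arg1) receives a different description under each roleset, which is the point of defining roles verb by verb.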

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling


Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually


PropBank Tagset

Numbered Arguments         Numbered Arguments with function tags
ARGA  Causer               ARGA-MNS  Indirect causer
ARG0  Agent/experiencer    ARG0-MNS  Induced causer
ARG1  Theme/patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient            ARG2-ATR  Attribute
ARG3  Instrument           ARG2-GOL  Goal
                           ARG2-SOU  Source
                           ARG2-LOC  Location
                           ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means


Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book.'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book.'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book.'


Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
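The labelling rule for the single argument of an intransitive verb can be sketched as follows. The verb classes come from the examples on the slide; the dictionary `VERB_CLASS` and the function `sole_argument_label` are illustrative names, and in practice the class of a verb would come from the frame files and the diagnostic tests rather than a hand-written table.

```python
# Verb classes for the three example verbs given on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label of the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    # Unaccusatives take Arg1, unergatives take Arg0.
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```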

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
The door will open

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
The door opened

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
The door will have to open


Intransitive Unergative

Unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
Atif slept

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
Atif will have to sleep

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs


Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
Yesterday night the moon was seen behind the clouds

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
Yesterday night I saw the moon behind the clouds

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation


Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
Ram will give a book to Mohan

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
Ram gave a book to Mohan

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers


Causatives

Unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

[Figures: PropBank annotation of causative-1 and causative-2]


Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust(n) do(v)': trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa


Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

Complex predicate

compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'


Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
Ram knows that Sita will arrive late


Phrase Structure Representation

Owen Rambow
CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition


Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi inspired by the Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
    • Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा. Node labels: VP-Pred, VP, NP, NP, V. Terminals: Atif, kitab, paRhegaa]


Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी. Node labels: VP-Pred, VP, NP-P, NP, V. Terminals: Atif-ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा. Node labels: VP-Pred, VP, NP, V. Terminals: Atif, soyegaa]


Intransitive Clause: Unaccusative

• Argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा. Node labels: VP-Pred, VP, NP1, NP, V. Terminals: darvaazaa, CASE1, khulegaa]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं. Node labels: VP-Pred, VP, NP1, NP, V, NP-P. Terminals: us kamre mein, cuuhe, CASE1, hain]


Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी. Node labels: VP-Pred, VP, NP-P, NP, V. Terminals: Ram-ne, Mohan-ko, kitaab, dii]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा. Node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, V. Terminals: kal raat, baadaloM mein, mujhko, SCR2, caaMd, CASE1, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere


Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी. Node labels: VP-Pred, VP, NP, NP1, NP-P, V, V-Aux, CP1, C. Terminals: Ram, jaantaa, hai, EXTR1, ki, Sita, der se, CASE1, aayegi]

Relative Clause

[Tree diagram for the sentence below. Node labels: VP-Pred, VP, NP, NP1, NP-P, V, V-Aux, CP, C. Terminals: jo kitaab, tumne, SCR1, PRO, dii, thii, COMP, vah, maine, SCR3, paRhii]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'


Complex Predicate

[Tree diagram for the sentence below. Node labels: VP-Pred, VP, VP1, NP, NP-P, V, V', V-Aux, N. Terminals: Raam, Ravi-ko, yaad, CASE1, kar, rahaa, thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
Ram was remembering Ravi

Causative


Introduction to Morphology, Syntax and
Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>

LTRC, IIIT Hyderabad, India

Dec 8, 2012

COLING 2012

Outline

• Introduction
  – Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic Syntax
  – Lexical semantics


Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic


Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them as two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)


Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan.masc), lataa (creeper.fem), ghar (house.masc)
• Number
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique


Case

• Direct: nouns are in the nominative and are not followed by a postposition
• Occur denoting subject and/or object

  laRkaa aayaa 'the boy came'
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii 'the girl came'
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii 'the boy ate bread'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>


Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

Sg - Dir: yaha (this), vaha (that), jo (who/which), kaun (who.interro), kyaa (what) and kuch (some)
Sg - Obl: isa (this), usa (that), jisa (which), kisa (which.interro), koii (someone) and kisii (someone)
Pl - Dir: ye (these), ve (those), jo (who.pl)
Pl - Obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who.interro), kinhiiM (who.indef)
  b. inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people.indef)

Pronouns (Contd…)

• Before 'ne', maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don't change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have irregular forms: meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
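The rules above for combining personal pronouns with postpositions can be sketched in a few lines of Python. This is an illustration only: the function name `attach` and the two lookup tables are assumptions, the romanization follows the slide, and only the masculine singular direct form of the genitive is produced.

```python
# Oblique stems used before postpositions other than 'ne'.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitive forms that replace pronoun + kaa.
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}

PRONOUNS = {"maiM", "tuu", "ham", "tum", "aap"}


def attach(pronoun, postposition):
    """Combine a personal pronoun with a postposition following the slide's rules."""
    if pronoun not in PRONOUNS:
        raise ValueError(f"unknown pronoun: {pronoun}")
    # Irregular genitives: meraa, teraa, hamaaraa, tumhaaraa.
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    # maiM and tuu switch to mujh and tujh, except before 'ne'.
    if postposition != "ne":
        pronoun = OBLIQUE.get(pronoun, pronoun)
    return pronoun + postposition


print(attach("maiM", "ne"), attach("maiM", "ko"), attach("ham", "kaa"))
```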


Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also have the oblique form: acche laRke ne 'good boy erg'
• The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below

Case →       Direct             Oblique
Number →     Sg       Pl        Sg       Pl
Gender ↓
Masc         acchaa   acche     acche    acche
Fem          acchii   acchii    acchii   acchii
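The paradigm in the table can be expressed as a small function. This is a sketch under stated assumptions: the function name `inflect_adjective` is illustrative, and treating every adjective that does not end in -aa as non-declining is an assumption made here (the slide only says that some adjectives do not decline); `laal` is a hypothetical example of such an adjective.

```python
def inflect_adjective(citation, gender, number, case):
    """Inflect a romanized adjective given in its masculine singular direct form."""
    if gender not in ("m", "f") or number not in ("sg", "pl") or case not in ("direct", "oblique"):
        raise ValueError("unexpected gender, number or case value")
    # Assumption: adjectives not ending in -aa are left unchanged.
    if not citation.endswith("aa"):
        return citation
    stem = citation[:-2]
    if gender == "f":
        return stem + "ii"    # acchii in all four feminine cells
    if number == "sg" and case == "direct":
        return stem + "aa"    # acchaa
    return stem + "e"         # acche in the remaining masculine cells


for g in ("m", "f"):
    for c in ("direct", "oblique"):
        for n in ("sg", "pl"):
            print(g, c, n, inflect_adjective("acchaa", g, n, c))
```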

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root         ro       cry
Infinitive   ronaa    to cry
Habitual     rotaa    cry.hab
Perfective   royaa    cried
Causative    rulaa    cause someone to cry
             rulvaa   make someone cause someone to cry


Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs

  (a) bacce khaanaa khaa rahe haiM
      Children_d meal eat prog be.pl.pres
      The children are having a meal

  (b) bacce khaanaa khaate rah sakte haiM
      Children_d meal eat_nf prog ablit be.pl.pres
      The children can continue to have their meal

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

      baccoM ne raat meM mez se khaanaa le liyaa
      children_obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions


Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words as follows:

ke anusaar     'according to'
ke alaavaa     'in addition to'
ke kaaran      'because of'
ke dvaaraa     'through'
ke saamne      'in front of'
ke liye        'for'
kii or/taraf   'towards'
kii tarah      'like'
kii jagah      'in place of'
se baahar      'out of'
se pahle       'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu


Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl 'before'
    qabl az_ayn (qabl azeen): before from_this, 'before this'
    qabl az_waqt: before from_time, 'before time'

(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah): in_this moment

(c) az 'from/since/for'
    az raahe hamdardi: from way empathy, 'for the sake of courtesy'
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion

(d) ta 'to/until/till'
    ta waqt: until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• Normally marks a genitive, but is not restricted to the genitive alone

EZ N+N       daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj     nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N     qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj   qabil-e-qubul 'qualified-for-acceptance'


Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X

raam-raam         'Ram-Ram' (proper noun)
baccaa-baccaa     'child-child' (common noun)
garam-garam       'hot-hot' (adjective)
dhiire-dhiire     'slowly-slowly' (adverb)
jaa-jaa           'go-go' (verb)
naa-naa           'not-not' (negative particle)
kyaa-kyaa         'what-what' (question word)
jaate-jaate       'going-going' (participle)


Partial Reduplication (Echo words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
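The regular patterns of full and partial reduplication can be sketched on romanized input as follows. The function names are illustrative. Two simplifications are assumptions of this sketch: the whole run of consonant letters at the start of the word is treated as "the first consonant" (so that the digraph kh in khaanaa is replaced as a unit), and vowel-initial words simply get v- prefixed, as in aaloo-vaaloo. Irregular forms such as bhaagambhaag are not covered.

```python
import re


def full_reduplication(word):
    """X -> X-X, e.g. garam -> garam-garam."""
    return f"{word}-{word}"


def echo_word(word):
    """X -> X-vX', where the initial consonant letters of X are replaced by v."""
    # Strip the leading run of non-vowel letters (empty for vowel-initial words).
    rest = re.sub(r"^[^aeiou]+", "", word)
    return f"{word}-v{rest}"


for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_word(w))
print(full_reduplication("garam"))
```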

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies


Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
Atif will read the book

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
Atif read the book

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
Atif had to read the book

Intransitive Unergative

Unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
Atif slept

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
Atif will have to sleep


Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
The door will open

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
The door opened

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
The door will have to open

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
There are rats in that room


Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
Yesterday night the moon was seen behind the clouds

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
Yesterday night I saw the moon behind the clouds

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
Ram will give a book to Mohan

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
Ram gave a book to Mohan


Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
Ram knows that Sita will arrive late

Relative Clause

rel-cl-1
मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
My sister who stays in Delhi is coming tomorrow


Relative Clause

rel-cl-2
मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
I have read the book which you gave me

rel-cl-3
मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
I have read the book which you gave me

rel-cl-4
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
I have read the book which you gave me

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
Ram was waiting for Ravi

compl-pred-2
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
Ram was remembering Ravi


Causatives

Unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case

    raam ko bukhaar hai
    Ram dat fever be.pres

    raam ko caand dikhaa
    Ram dat moon see-unacc.pst

    raam ko dukh hai
    Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of the participatory verbs takes the se postposition

    raam ravi se carcaa karega
    Ram Ravi to discussion do.m.sg.fut

    siitaa raam se shaadi karegii
    Sita Ram with marriage do.f.sg.fut

    ravi mohan se milegaa
    Ravi Mohan with meet.m.sg.fut


References

• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.


Representing Tokens, Morph Analysis,
POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

usa    ladake   ne    kelaa    khaayaa    thaa
that   boy      erg   banana   eat-perf   past


Tokenization

Represented in SSF

ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7
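A minimal sketch of how such an ADDR/TOKEN table could be produced from a romanized sentence is shown below. It is not the treebank's tokenizer: the function names are hypothetical, the regular expression is an assumption that works for romanized (WX-style) input only, and the sentence-final full stop in the example is assumed, since the seventh token is not legible in the table above.

```python
import re


def tokenize(sentence):
    """Split on whitespace and make every punctuation mark a token of its own."""
    # \w+ keeps a run of letters or digits together; any other non-space
    # character (full stop, hyphen, ...) becomes a separate token.
    return re.findall(r"\w+|[^\w\s]", sentence)


def to_ssf_rows(sentence):
    """Number the tokens from 1, giving (ADDR, TOKEN) pairs."""
    return list(enumerate(tokenize(sentence), start=1))


def format_ssf(rows):
    """Render the rows as a tab-separated ADDR/TOKEN table."""
    return "\n".join(["ADDR\tTOKEN"] + [f"{addr}\t{tok}" for addr, tok in rows])


rows = to_ssf_rows("usa laDake ne kelA khAyA thA.")
print(format_ssf(rows))
```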

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: bAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision
    – Create three tokens
    – Mark the hyphen as JOIN
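The decision for compounds (three tokens, with the hyphen marked as JOIN) can be illustrated as follows. How the JOIN mark is actually stored in the treebank files is not shown in the slides, so the `(token, mark)` pairs used here are an assumption of this sketch, as is the function name.

```python
def tokenize_compound(word):
    """Split a hyphenated compound into member tokens and JOIN-marked hyphens."""
    tokens = []
    for i, part in enumerate(word.split("-")):
        if i > 0:
            # The hyphen is kept as a token of its own and marked as JOIN.
            tokens.append(("-", "JOIN"))
        tokens.append((part, None))
    return tokens


print(tokenize_compound("bAI-bahana"))
print(tokenize_compound("bAlikA-vixyAlaya"))
```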


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix

ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7                 <fs as=&STOP,punc>
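To make the composite attribute concrete, the sketch below splits an af value into named fields. It assumes that the fields are comma-separated and appear in the order given in the description above; the separators are not legible in the extracted slide text, so both the comma and the field names in `AF_FIELDS` are assumptions, and `parse_af` is a hypothetical helper.

```python
# Field order as described for the af attribute (names chosen for this sketch).
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")


def parse_af(value):
    """Turn 'laDakaa,n,m,sg,3,o' into a dict keyed by field name."""
    parts = value.split(",")
    if len(parts) > len(AF_FIELDS):
        raise ValueError(f"too many fields in af value: {value!r}")
    # Missing trailing fields are filled with empty strings.
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))


analysis = parse_af("laDakaa,n,m,sg,3,o")
print(analysis)
```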

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7               SYM    <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save the effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1              SYM    <fs as=&STOP,punc>
        ))
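The renumbering of ADDR_ values that chunking causes can be illustrated with a short sketch. It assumes dotted addresses of the form chunk.token (e.g. 1.1), which is a reading of the run-together digits in the extracted table, and it uses a full stop as a placeholder for the sentence-final token; `chunk_rows` is a hypothetical helper, not part of the SSF tools.

```python
def chunk_rows(chunks):
    """Build (ADDR, TOKEN, CAT) rows from a list of (chunk_tag, [(token, pos), ...])."""
    rows = []
    for i, (tag, members) in enumerate(chunks, start=1):
        rows.append((str(i), "((", tag))           # chunk opening row
        for j, (token, pos) in enumerate(members, start=1):
            rows.append((f"{i}.{j}", token, pos))  # token row, renumbered
        rows.append(("", "))", ""))                # chunk closing row
    return rows


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
    ("BLK", [(".", "SYM")]),
]
rows = chunk_rows(chunks)
for addr, token, cat in rows:
    print(f"{addr}\t{token}\t{cat}")
```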


Paninian Grammatical Model and
Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k

• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar

Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

opened
  k1: Sabina
  k2: lock
        the

K1 (Karta): the doer of the action (the locus of activity)
K2 (Karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
        the
  k3: key
        this

K3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
        the
  k3: key
        this
  k7p: home
        my

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
        the
  k3: key
        this

lock becomes the karta


Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

(t1) khaa
       bhaaii
       phala

(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root, branching into activity and result]


karta-karma

• The boy opened the lock
  – k1: karta
  – k2: karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock
  (action 1) Inserting and turning a key
  (action 2) key pressing and turning the lever
  (action 3) latch moving and lock opening


Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b), karana (in a) based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)


Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi


An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks


Dependency Scheme (Contd.)

khaataa
  k1: bhaii
        r6: meraa
  k2: phala
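The small dependency tree above can be stored as a list of labelled edges between chunk heads. The sketch below is only one possible encoding; the names `EDGES`, `dependents` and `find_root` are illustrative, and the head of the k1 chunk is spelled bhaaii as in the example sentence.

```python
# (dependent, relation, head) triples for the example sentence.
EDGES = [
    ("bhaaii", "k1", "khaataa"),  # karta of the verb
    ("phala", "k2", "khaataa"),   # karma of the verb
    ("meraa", "r6", "bhaaii"),    # genitive modifier of bhaaii
]


def dependents(head, edges=EDGES):
    """Return (dependent, relation) pairs attached to the given head."""
    return [(dep, rel) for dep, rel, h in edges if h == head]


def find_root(edges=EDGES):
    """The root is the only head that is not itself a dependent."""
    heads = {h for _, _, h in edges}
    deps = {d for d, _, _ in edges}
    roots = heads - deps
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots.pop()


print(find_root(), dependents("khaataa"), dependents("bhaaii"))
```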

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates

101212

18

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa ‘cause to eat’ is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations
    · prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    · prayojya karta ‘causee’ (jk1): the causee in a causative construction
    · madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II
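The decision for causatives amounts to replacing three Possibility-I labels with their Possibility-II counterparts (k1 with pk1, k3 with mk1, k4 with jk1). The sketch below shows only that relabelling step; the function name and data layout are hypothetical. It assumes the caller has already decided that the sentence is causative and that a se-marked chunk is a mediator rather than an instrument (as cammaca se is in the first example), since the label alone cannot tell the two apart.

```python
# Possibility-I label -> Possibility-II label for causer, mediator and causee.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args):
    """args: list of (chunk, label) pairs attached to a causative verb.

    Labels outside the mapping (for example k2) are returned unchanged.
    """
    return [(chunk, CAUSATIVE_RELABEL.get(label, label)) for chunk, label in args]

if __name__ == "__main__":
    possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                     ("bacce ko", "k4"), ("khaanaa", "k2")]
    print(relabel_causative(possibility_1))
```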

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  – Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta – k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof → agar
    ccof → naa hotii
  paired-ccof → to
    ccof → aatii

Possibility - II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day, having written a letter, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’

Participles (vmod) (Contd)

PaadZataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakara
    k1 → vaha (shared)
    k2 → patra (shared)

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which becomes the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
    nmod__relc → kahoge
      k1 → tum
      k2 → jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don’t listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two

Conjunct Verbs (Contd)

kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
    nmod → ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
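The filtering step described above can be sketched in a few lines. The sketch assumes the two candidate sentences have already been labelled with agent and theme, as on the slide; the dictionary layout and the function name `answer_who` are illustrative assumptions, not part of any question answering system mentioned here.

```python
# Candidate sentences, reduced to their labelled predicate-argument structure.
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Return the agents of candidates whose predicate and theme both match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]

if __name__ == "__main__":
    print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Word overlap alone would accept both sentences; requiring the theme to match is what rules out the syringe sentence.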

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
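A frame file with two rolesets can be pictured as a small lookup table. The sketch below is only an illustration of the idea: real PropBank frame files are not stored this way, the roleset identifiers are written `ring.01` and `ring.02` on the assumption that the extraction dropped a dot, and the names `FRAMES` and `describe` are hypothetical.

```python
# Rolesets for the predicate "ring", copied from the example above.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
    },
}

def describe(roleset_id, labelled_args):
    """Map each labelled argument text to the role description of the chosen roleset."""
    roles = FRAMES[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}

if __name__ == "__main__":
    print(describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
    print(describe("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same numbered label (Arg1) means "thing rung" under one roleset and "surrounding entity" under the other, which is why the roleset has to be chosen before the arguments are interpreted.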

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames; 703 predicates]
  – Frames for nominals in complex predicates [1875; 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent/experiencer     ARG0-MNS  Induced causer
ARG1  Theme/patient         ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

[Figures: PropBank annotation for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

[Figures: PropBank annotation for causative-1 and causative-2]

Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa)]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa)]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free), and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred), and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi)]

Relative Clause

[Figure: PS tree for the sentence below]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

Complex Predicate

[Figure: PS tree for the sentence below]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

Causative

Hindi: Some facts

• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari – a syllabic script

Urdu: Some facts

• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic

Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: All nouns have inherent gender
  pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  Singular: pankhaa (fan), lataa (creeper), ghar (house)
  Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: The case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique

Case

• Direct: nouns are in the nominative and are not followed by a postposition
• Occur denoting subject and/or object

  laRkaa aayaa ‘the boy came’
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii ‘the girl came’
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii ‘the boy ate bread’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread from the floor’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case
  Sg – dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
  Sg – obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
  Pl – dir: ye (these), ve (those), jo (who, pl)
  Pl – obl:
    a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
    b. inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd…)

• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example:
  maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don’t change form:
  hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), humko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also have the oblique form: acche laRke ne ‘good boy erg’
• The transformations that an adjective (e.g. acchaa ‘good’) undergoes with regard to number, gender and case are given below

Case →        Direct             Oblique
Number →      Sg       Pl        Sg       Pl
Gender ↓
Masc          acchaa   acche     acche    acche
Fem           acchii   acchii    acchii   acchii
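The table above reduces to a three-way choice of ending. The following minimal sketch reproduces it for a declinable adjective of the acchaa type, given its stem (here `acch`). The function name and the string values used for gender, number and case are assumptions made for illustration, and the sketch does not cover the adjectives that do not decline.

```python
def inflect_aa_adjective(stem, gender, number, case):
    """Return the form of a declinable -aa adjective, following the table for acchaa.

    gender: "masc" or "fem"; number: "sg" or "pl"; case: "direct" or "oblique".
    """
    if gender == "fem":
        return stem + "ii"      # feminine is -ii in every cell of the table
    if number == "sg" and case == "direct":
        return stem + "aa"      # masculine singular direct
    return stem + "e"           # all other masculine cells

if __name__ == "__main__":
    for gender in ("masc", "fem"):
        for case in ("direct", "oblique"):
            for number in ("sg", "pl"):
                print(gender, case, number, inflect_aa_adjective("acch", gender, number, case))
```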

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root         ro       ‘cry’
Infinitive   ronaa    ‘to cry’
Habitual     rotaa    ‘cry.hab’
Perfective   royaa    ‘cried’
Causative    rulaa    ‘cause someone to cry’
             rulvaa   ‘make someone cause someone to cry’

Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs

  (a) bacce khaanaa khaa rahe haiM
      Children_d meal eat prog be.pl.pres
      ‘The children are having a meal’

  (b) bacce khaanaa khaate rah sakte haiM
      Children_d meal eat_nf prog ablit be.pl.pres
      ‘The children can continue to have their meal’

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions

Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows

  ke anusaar ‘according to’
  ke alaavaa ‘in addition to’
  ke kaaran ‘because of’
  ke dvaaraa ‘through’
  ke saamne ‘in front of’
  ke liye ‘for’
  kii or/taraf ‘towards’
  kii tarah ‘like’
  kii jagah ‘in place of’
  se baahar ‘out of’
  se pahle ‘before’

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are

  (a) qabl ‘before’
      qabl az_ayn (qabl azeen)     qabl az_waqt
      before from_this             before from_time
      ‘before this’                ‘before time’

  (b) dar ‘in/inside/amidst’
      dar_ayn asnah (dariin asnah)
      in_this moment

  (c) az ‘from/since/for’
      az raahe hamdardi            az sare nau            khaarij az bahes
      from way empathy             from beginning new     beyond from discussion
      ‘for the sake of courtesy’

  (d) ta ‘to/until/till’
      ta waqt
      until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone

  EZ N+N      daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
  EZ N+Adj    nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
  EZ Adj+N    qabil-e-rahem ‘qualified for sympathy’
  EZ Adj+Adj  qabil-e-qubul ‘qualified-for-acceptance’

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X

  raam-raam ‘Ram-Ram’ (proper noun)
  baccaa-baccaa ‘child-child’ (common noun)
  garam-garam ‘hot-hot’ (adjective)
  dhiire-dhiire ‘slowly-slowly’ (adverb)
  jaa-jaa ‘go-go’ (verb)
  naa-naa ‘not-not’ (negative particle)
  kyaa-kyaa ‘what-what’ (question word)
  jaate-jaate ‘going-going’ (participle)

Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

jaanaa 'going'       →  jaanaa-vaanaa 'going etc.'
aaloo 'potato'       →  aaloo-vaaloo 'potato etc.'
aisaa 'like this'    →  aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

bhaag 'run'     →  bhaagambhaag 'rush'
jhuuTh 'lie'    →  jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see'      →  dekhaa-dekhii 'in imitation'
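The regular echo-word pattern can be sketched as follows. This is a rough illustration on romanized input, under two assumptions that go beyond the slide: a leading consonant digraph such as kh is treated as one consonant and dropped as a whole, and a vowel-initial word simply gets v- prefixed (as in aaloo-vaaloo). The name `echo_word` is hypothetical, and the irregular forms listed above are not covered.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word: str) -> str:
    """Build the echo form X-vX' of a romanized word X.

    X' is X without its leading consonant letters; v- is put in their place.
    """
    i = 0
    # Skip the leading consonant letters (none for vowel-initial words).
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"
```

Applied to the examples on the slide, `echo_word("khaanaa")` yields `khaanaa-vaanaa` and `echo_word("aisaa")` yields `aisaa-vaisaa`.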

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies

Simple Transitive

trans-1    आतिफ़ किताब पढ़ेगा
           Atif kitaab paRhegaa
           Atif book.f read.m.sg.fut
           'Atif will read the book.'

trans-2    आतिफ़ ने किताब पढ़ी
           Atif ne kitaab paRhii
           Atif erg book.f read.f.sg.pst
           'Atif read the book.'

trans-3    आतिफ़ को किताब पढ़नी पड़ी
           Atif ko kitaab paRhnii paRii
           Atif dat book.f read.f.inf compel.f.pst
           'Atif had to read the book.'

Intransitive Unergative

unerg-1    आतिफ़ सोएगा
           Atif soyegaa
           Atif sleep.m.sg.fut
           'Atif will sleep.'

unerg-2    आतिफ़ ने सोया
           Atif ne soyaa
           Atif erg sleep.m.sg.pst
           'Atif slept.'

unerg-3    आतिफ़ को सोना पड़ेगा
           Atif ko sonaa paRegaa
           Atif dat sleep.inf compel.fut
           'Atif will have to sleep.'

Intransitive Unaccusative

unacc-1    दरवाज़ा खुलेगा
           darvaazaa khulegaa
           door.m.sg.d open.m.sg.fut
           'The door will open.'

unacc-2    दरवाज़े ने खुला
           darvaaze ne khulaa
           door.m.sg.obl erg open.pst
           'The door opened.'

unacc-3    दरवाज़े को खुलना पड़ेगा
           darvaaze ko khulnaa paRegaa
           door.m.sg.obl dat open.inf compel.fut
           'The door will have to open.'

Existential

exist-1    उस कमरे में चूहे हैं
           us kamre meM cuuhe haiM
           that room in rats be.pres.pl
           'There are rats in that room.'

Dative Subject

unacc-4       कल रात बादलों में चाँद दिखा
              kal raat baadaloM meM caaMd dikhaa
              yesterday night clouds in moon see(unacc).pst
              'Yesterday night the moon was seen behind the clouds.'

dat-subj-1    कल रात बादलों में मुझको चाँद दिखा
              kal raat baadaloM meM mujhko caaMd dikhaa
              yesterday night clouds in me.dat moon see(unacc).pst
              'Yesterday night I saw the moon behind the clouds.'

Ditransitive

ditrans-1     राम मोहन को किताब देगा
              raam mohan ko kitaab degaa
              Ram Mohan dat book.f give.m.sg.fut
              'Ram will give a book to Mohan.'

ditrans-2     राम ने मोहन को किताब दी
              raam ne mohan ko kitaab dii
              Ram erg Mohan dat book.f give.f.sg.pst
              'Ram gave a book to Mohan.'

Complement Clause

compl-cl-1    राम जानता है कि सीता देर से आएगी
              raam jaantaa hai ki siita der se aayegii
              Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
              'Ram knows that Sita will arrive late.'

Relative Clause

rel-cl-1      मेरी बहन जो दिल्ली में रहती है कल आ रही है
              merii bahan jo dillii meM rahtii hai kal aa rahii hai
              my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
              'My sister who stays in Delhi is coming tomorrow.'

Relative Clause

rel-cl-2    मैंने वह किताब जो तुमने दी थी पढ़ ली
            maiMne vah kitaab jo tumne dii thii paRh lii
            I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
            'I have read the book which you gave me.'

rel-cl-3    मैंने वह किताब पढ़ ली जो तुमने दी थी
            maiMne vah kitaab paRh lii jo tumne dii thii
            I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
            'I have read the book which you gave me.'

rel-cl-4    जो किताब तुमने दी थी वह मैंने पढ़ ली
            jo kitaab tumne dii thii vah maiMne paRh lii
            which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
            'I have read the book which you gave me.'

Complex Predicate

compl-pred-1    राम रवि की प्रतीक्षा कर रहा था
                raam ravi kii pratikshaa kar rahaa thaa
                Ram Ravi gen wait do prog.m.sg be.m.sg.pst
                'Ram was waiting for Ravi.'

compl-pred-2    राम रवि को याद कर रहा था
                raam ravi ko yaad kar rahaa thaa
                Ram Ravi acc remember do prog.m.sg be.m.sg.pst
                'Ram was remembering Ravi.'

Causatives

unerg-1        आतिफ़ सोएगा
               Atif soyegaa
               Atif sleep.m.sg.fut
               'Atif will sleep.'

causative-1    आया ने आतिफ़ को सुलाया
               aayaa ne Atif ko sulaayaa
               maid erg Atif acc sleep.caus.pst
               'The maid caused the child to sleep.'

causative-2    माँ ने आया से आतिफ़ को सुलवाया
               maaN ne aayaa se Atif ko sulvaayaa
               mother erg maid by Atif acc sleep.caus.pst
               'The mother made the maid cause the child to sleep.'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case

    raam ko bukhaar hai     Ram dat fever be.pres
    raam ko caand dikhaa    Ram dat moon see-unacc.pst
    raam ko dukh hai        Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of a participatory verb takes the se postposition

    raam ravi se carcaa karegaa      Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization: Automatic

Issues:
• Compounds
• Punctuation

For example:

usa     ladake    ne     kelaa     khaayaa     thaa
that    boy       erg    banana    eat-perf    past

Tokenization

Represented in SSF:

ADDR    TOKEN
1       usa
2       laDake
3       ne
4       kelA
5       khAyA
6       thA
7       (punct)

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
    – Create three tokens
    – Mark the hyphen as JOIN
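The tokenization decision above (every punctuation mark is its own token; a hyphenated compound becomes three tokens with the hyphen marked JOIN) can be sketched as follows. This is a minimal illustration for romanized (WX-style) text only, not the treebank's actual tokenizer. The function names are hypothetical, and carrying the JOIN mark as a third tuple field is an assumption of this sketch rather than the real SSF encoding.

```python
import re


def tokenize(sentence: str) -> list:
    """Split a romanized sentence into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)


def ssf_rows(tokens: list) -> list:
    """Number the tokens from 1 (the ADDR column) and mark hyphens as JOIN."""
    return [
        (addr, tok, "JOIN" if tok == "-" else "")
        for addr, tok in enumerate(tokens, start=1)
    ]
```

With this sketch, `tokenize("BAI-bahana")` returns the three tokens `BAI`, `-`, `bahana`, and `ssf_rows` flags the middle one as JOIN.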

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), and suffix.

ADDR_    TKN_       OTHR
1        usa        <fs af='vaha,pr'>
2        laDake     <fs af='laDakaa,n,m,sg,3,o'>
3        ne         <fs af='ne,psp'>
4        kelaa      <fs af='kelaa,n,m,sg,3,o'>
5        khaayaa    <fs af='khaa,v,m,sg,any,yaa'>
6        thaa       <fs af='kelaa,v,e'>
7        (punct)    <fs as=&STOP,punc>
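The composite af attribute can be unpacked into named fields with a short helper. The sketch below assumes that af is a comma-separated string with the eight fields in the order listed above and that missing trailing fields are simply empty. The field names and the function `parse_af` are illustrative choices, not an official API.

```python
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)


def parse_af(af: str) -> dict:
    """Map the comma-separated af value onto the eight named fields."""
    values = af.split(",")
    # Pad short values (e.g. 'ne,psp') so every field is present.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))
```

For example, `parse_af("laDakaa,n,m,sg,3,o,,")` gives a dictionary whose `root` is `laDakaa` and whose `case` is `o` (oblique).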

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR    TKN        CAT     OTHR
1       usa        PRP     <fs af='vaha,pron,…'>
2       laDake     NN      <fs af='laDakA,noun,…'>
3       ne         PSP     <fs af='ne,psp,…'>
4       kelA       NN      <fs af='kelA,noun,…'>
5       khAyA      VM      <fs af='KA,verb,…'>
6       thA        VAUX    <fs af='kelA,verb,…'>
7       (punct)    SYM     <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the values of ADDR_ will change)

ADDR_    TKN_       CAT_    OTHR
1        ((         NP
1.1      usa        PRP     <fs af='vaha,pron,…'>
1.2      laDake     NN      <fs af='laDakA,noun,…'>
1.3      ne         PSP     <fs af='ne,psp,…'>
         ))
2        ((         NP
2.1      kelA       NN      <fs af='kelA,noun,…'>
         ))
3        ((         VG
3.1      khAyA      VM      <fs af='KA,verb,…'>
3.2      thA        VAUX    <fs af='kelA,verb,…'>
         ))
4        ((         BLK
4.1      (punct)    SYM     <fs as=&STOP,punc>
         ))

Rambow
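The way chunking changes the ADDR_ column can be sketched as follows: each chunk gets a top-level address and its tokens get addresses of the form chunk.token. The function `chunk_rows` and its input format (a list of chunk tags with their tagged tokens) are assumptions made for this illustration, and the feature-structure column is omitted.

```python
def chunk_rows(chunks: list) -> list:
    """Produce (ADDR, TOKEN, CAT) rows for a list of (chunk_tag, tokens) pairs."""
    rows = []
    for i, (chunk_tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", chunk_tag))       # chunk opening row
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))    # token row inside the chunk
        rows.append(("", "))", ""))                  # chunk closing row
    return rows


example = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
```

Calling `chunk_rows(example)` reproduces the address pattern of the table above: the token laDake moves from address 2 to address 1.2 once it is placed inside the first NP chunk.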


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k

• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock (the)

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)

lock becomes the karta
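The labelled trees above can be held in a very small data structure. The following sketch is not the treebank's own representation; the class `Node` and its methods are hypothetical names, and determiners such as "the" and "this" are left out for brevity. It shows how the same word, lock, carries k2 in one sentence and k1 in another.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)  # (label, Node) pairs

    def add(self, label: str, word: str) -> "Node":
        """Attach a dependent under this node with a karaka (or other) label."""
        child = Node(word)
        self.children.append((label, child))
        return child

    def dependents(self, label: str) -> list:
        """Words attached to this node with the given label."""
        return [child.word for lab, child in self.children if lab == label]


# "Sabina opened the lock with this key"
s1 = Node("opened")
s1.add("k1", "Sabina")
s1.add("k2", "lock")
s1.add("k3", "key")

# "Yesterday the lock opened with this key"
s2 = Node("opened")
s2.add("k7t", "yesterday")
s2.add("k1", "lock")
s2.add("k3", "key")
```

Here `s1.dependents("k1")` is Sabina while `s2.dependents("k1")` is lock, and the second tree has no k2 dependent at all.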

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai

=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1)  khaa
        bhaaii
        phala

(t2)  khaa
        bhaaii
          meraa
          baDZaa
        phala
          bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key.
  The key opened the lock.
  The lock opened.

Semantics of the verb

• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

Verbal root
  activity
  result

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta/karma sometimes correspond to agent/theme
  – Not always: 'The door opened'
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock
  (action 1) Inserting and turning a key
  (action 2) key pressing and turning the lever
  (action 3) latch moving and lock opening

Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: 'Sabina opened the lock'
• However, the speaker may decide not to express the role of the agent. Hence:
  – 'The key opened the lock': the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – 'The lock opened': the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly:
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa      <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa     <fs af= root=badZaa cat=adj gend=m>
  – bhaaii     <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta     <fs af= root=bahuta cat=adj gend=any>
  – phala      <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa    <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai        <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
        r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly, but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
    – khilvaa 'cause to eat' is the verb root
    – maaz ne has karta vibhakti, so mark as k1
    – aayaa se has karana vibhakti, so mark as k3
    – bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations:
    – prayojaka karta 'causer' (pk1): the causer in a causative construction
    – prayojya karta 'causee' (jk1): the causee in a causative construction
    – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
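The relabelling from Possibility-I to Possibility-II can be written as a small lookup. The sketch below is only illustrative: the function name `to_possibility_2` and the `keep` parameter are hypothetical. The `keep` set exists because a se-marked phrase may be a true instrument (cammaca se 'with the spoon') that should stay k3 and not become the mediator-causer mk1; deciding which phrases those are is left to the annotator.

```python
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def to_possibility_2(args: list, keep=frozenset()) -> list:
    """Relabel (phrase, label) pairs of a causative verb as pk1/mk1/jk1.

    Phrases listed in `keep` retain their original label.
    """
    return [
        (phrase, label if phrase in keep else CAUSATIVE_RELABEL.get(label, label))
        for phrase, label in args
    ]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
```

Applied to `possibility_1`, the function returns pk1 for maaz ne, mk1 for aayaa se, jk1 for bacce ko, and leaves khaanaa as k2, matching the second example above.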

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Alternative-II

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4 – beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2: raam ne (agent) caaMd dekhaa            (base verb)
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3: raam ko (experiencer) caaMd dikhaa      (derived intransitive verb)
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta – k4a (Contd.)

• Ex-2
• Ex-3

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd.)

• Ex-1
• Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof: agar
                 ccof: naa hotii
  paired-ccof: to
                 ccof: aatii

Possibility - II

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence, but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written letters every day, he tears them'

Participles (vmod) (Contd.)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with another participle verb, likhakar 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
          k1: vaha
          k2: patra

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example, vo 'that' is missing, which becomes the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
        nmod__relc: kahoge
                      k1: tum
                      k2: jo bhi
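The null-element insertion can be sketched on a flat list of (dependent, label, head) edges. This is a hypothetical helper, not the annotation tool's behaviour: the function name, the edge format, and the default k2 label for the inserted node are assumptions of this sketch, and the node name is written NULL__NP as in the tree above.

```python
def attach_relative_clause(edges, clause_head, governor, head_noun=None, head_label="k2"):
    """Attach a relative clause to its head noun, inserting NULL__NP if the noun is elided.

    edges: list of (dependent, label, head) triples, modified in place and returned.
    """
    if head_noun is None:
        # The head of the relative clause is missing: insert a null element
        # and attach it to the governing verb.
        head_noun = "NULL__NP"
        edges.append((head_noun, head_label, governor))
    edges.append((clause_head, "nmod__relc", head_noun))
    return edges


edges = [("mai", "k1", "maan luungii")]
attach_relative_clause(edges, clause_head="kahoge", governor="maan luungii")
```

After the call, `edges` also contains NULL__NP attached to maan luungii as k2 and kahoge attached to NULL__NP as nmod__relc.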

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• It means a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• It captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
         nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
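The filtering step can be made concrete with a toy example. The sketch below assumes the two candidate sentences have already been labelled with an agent and a theme for the predicate create, as on the slide. The dictionary layout and the function `who` are invented for this illustration and do not correspond to any particular SRL system.

```python
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def who(predicate: str, theme: str, labelled: list) -> list:
    """Return the agents of events whose predicate and theme match the question."""
    return [
        s["agent"]
        for s in labelled
        if s["predicate"] == predicate and s["theme"] == theme
    ]
```

Asking `who("create", "the first effective polio vaccine", candidates)` keeps only Jonas Salk, because the Becton Dickinson sentence has a different theme even though it shares most of the question's words.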

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0    Causer of ringing
  Arg1    Thing rung
  Arg2    Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0    Causer of ringing
  Arg1    Thing rung
  Arg2    Ring for

ring.02: To surround
  Arg1    Surrounding entity
  Arg2    Surrounded entity

An example

• [John] rings [the bell]                   (ring.01)
• [Tall aspen trees] ring [the lake]        (ring.02)

An example

• [John ARG0] rings [the bell ARG1]                   (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]        (ring.02)

ring.01: Make sound of bell
  Arg0    Causer of ringing
  Arg1    Thing rung
  Arg2    Ring for

ring.02: To surround
  Arg1    Surrounding entity
  Arg2    Surrounded entity
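A frame file with several rolesets can be pictured as a nested lookup table. The sketch below uses a plain Python dictionary purely for illustration; real PropBank frame files are stored in their own file format, and the names `FRAMES` and `describe` are assumptions of this example.

```python
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
    },
}


def describe(roleset: str, labelled_args: list) -> dict:
    """Map each labelled phrase to the role description from its roleset."""
    roles = FRAMES[roleset]["roles"]
    return {phrase: roles[label] for phrase, label in labelled_args}
```

For the second sentence, `describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])` reports the trees as the surrounding entity and the lake as the surrounded entity, whereas the same Arg1 label under ring.01 would mean the thing rung.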

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments           Numbered arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent, experiencer     ARG0-MNS  Induced causer
ARG1  Theme, patient         ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
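A small sketch of how the tagset could be checked mechanically. The `parse_label` function and the three lookup tables are hypothetical; they simply encode the labels listed on the two tagset slides and assume that a label is written as a base plus an optional hyphenated function tag.

```python
from typing import Optional, Tuple

# Labels taken from the tagset slides; the table names are illustrative only.
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
FUNCTION_TAGS = {
    "ARGA": {"MNS"},
    "ARG0": {"MNS", "GOL"},
    "ARG2": {"ATR", "GOL", "SOU", "LOC", "DIR"},
}
MODIFIERS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def parse_label(label: str) -> Tuple[str, Optional[str]]:
    """Split a label such as 'ARG2-GOL' into (base, function tag) and validate it."""
    base, _, tag = label.partition("-")
    tag_or_none = tag or None
    if base == "ARGM":
        # Modifier arguments always carry one of the modifier tags.
        if tag_or_none not in MODIFIERS:
            raise ValueError(f"unknown modifier label: {label}")
    elif base in NUMBERED:
        # Numbered arguments may be bare or carry a listed function tag.
        if tag_or_none is not None and tag_or_none not in FUNCTION_TAGS.get(base, set()):
            raise ValueError(f"function tag not listed for {base}: {label}")
    else:
        raise ValueError(f"unknown label: {label}")
    return base, tag_or_none
```

With these tables, `parse_label("ARG2-GOL")` is accepted, whereas `parse_label("ARG1-GOL")` is rejected because the slide lists no function tags for ARG1.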

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night, the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night, I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Causatives

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

[Table of causative classes not recoverable from the extracted text]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

Complex predicate

compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा; node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी; node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा; node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]

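Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा; node labels: VP-Pred, VP, NP1 (darvaazaa), CASE1 (co-indexed with NP1), V (khulegaa)]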

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं; node labels: VP-Pred, VP, NP-P (us kamre mein), NP1 (cuuhe), CASE1, V (hain)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी; node labels: VP-Pred, NP-P (Ram-ne), NP-P (Mohan-ko), VP, NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा; node labels: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko), SCR2, NP1 (caaMd), CASE1, V (dikhaa)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी; node labels: VP-Pred, VP, NP (Ram), EXTR1, V (jaantaa), V-Aux (hai), CP1, C (ki), NP1 (Sita), CASE1, NP-P (der se), V (aayegi)]

Relative Clause

[Tree diagram; node labels: VP-Pred, VP, CP, C (COMP), NP1 (jo kitaab), NP-P (tumne), SCR1, NP (PRO), V (dii), V-Aux (thii), NP (vah), NP SCR3, NP-P (maine), V (paRhii)]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Tree diagram; node labels: VP-Pred, VP, VP1, NP (Raam), NP-P (Ravi-ko), V', NP, N (yaad), CASE1, V (kar), V-Aux (rahaa), V-Aux (thaa)]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Hindi-Urdu (Hindustani)

• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structure
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu

• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact, in almost all the Indian languages)

Morphology

Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case
  – The case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique

Case

• Direct: nouns are in the nominative and are not followed by a postposition
  – Occur denoting subject and/or object

  laRkaa aayaa 'the boy came'
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii 'the girl came'
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii 'the boy ate bread'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread up from the floor'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, pronouns also inflect for number and case.

Sg, dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
Sg, obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
Pl, dir: ye (these), ve (those), jo (who, pl)
Pl, obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd…)

• Before 'ne', maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don't change form:
  hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
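The pronoun rules above can be written as a short lookup, given here as a minimal sketch. The function name `attach_postposition` and both tables are hypothetical, the forms are in the romanization used on the slides, and only the masculine singular direct genitive forms listed above are covered.

```python
# Irregular genitive forms and oblique stems as listed on the slide.
IRREGULAR_GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}
OBLIQUE_STEM = {"maiM": "mujh", "tuu": "tujh"}


def attach_postposition(pronoun: str, postposition: str) -> str:
    """Combine a personal pronoun with a postposition following the rules above."""
    # Rule: four pronouns have irregular genitives instead of pronoun + kaa.
    if postposition == "kaa" and pronoun in IRREGULAR_GENITIVE:
        return IRREGULAR_GENITIVE[pronoun]
    # Rule: maiM and tuu take their oblique stems, except before ne.
    if postposition != "ne" and pronoun in OBLIQUE_STEM:
        return OBLIQUE_STEM[pronoun] + postposition
    # Rule: everything else attaches unchanged (ham, tum, aap; maiM/tuu + ne).
    return pronoun + postposition
```

So `attach_postposition("maiM", "ne")` keeps the direct form, while `attach_postposition("maiM", "ko")` switches to the oblique stem.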

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below

Case →       Direct             Oblique
Number →     Sg       Pl        Sg       Pl
Gender ↓
Masc         acchaa   acche     acche    acche
Fem          acchii   acchii    acchii   acchii
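The table reduces to three endings, which the following sketch makes explicit. The function `inflect_adjective` and its argument values ("masc"/"fem", "sg"/"pl", "direct"/"oblique") are hypothetical names chosen for this illustration, and it assumes a declinable adjective of the acchaa type given as a bare stem such as "acch".

```python
def inflect_adjective(stem: str, gender: str, number: str, case: str) -> str:
    """Inflect a declinable -aa adjective stem according to the table above."""
    if gender not in ("masc", "fem"):
        raise ValueError(f"unknown gender: {gender}")
    if number not in ("sg", "pl"):
        raise ValueError(f"unknown number: {number}")
    if case not in ("direct", "oblique"):
        raise ValueError(f"unknown case: {case}")
    # Feminine forms end in -ii throughout the table.
    if gender == "fem":
        return stem + "ii"
    # Masculine: -aa only in the singular direct cell, -e elsewhere.
    if number == "sg" and case == "direct":
        return stem + "aa"
    return stem + "e"
```

For instance, `inflect_adjective("acch", "masc", "sg", "oblique")` yields the form used in acche laRke ne.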

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu:

Root         ro       cry
Infinitive   ronaa    to cry
Habitual     rotaa    cry.hab
Perfective   royaa    cried
Causative    rulaa    cause someone to cry
             rulvaa   make someone cause someone to cry

Auxiliaries

• Auxiliaries mark tense, aspect and modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      'The children are having a meal'
  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog abilit be.pl.pres
      'The children can continue to have their meal'
• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations
  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions

Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar     'according to'
ke alaavaa     'in addition to'
ke kaaran      'because of'
ke dvaaraa     'through'
ke saamne      'in front of'
ke liye        'for'
kii or/taraf   'towards'
kii tarah      'like'
kii jagah      'in place of'
se baahar      'out of'
se pahle       'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:
  (a) qabl 'before'
      qabl az_ayn (qabl azeen) 'before from_this': 'before this'
      qabl az_waqt 'before from_time': 'before time'
  (b) dar 'in/inside/amidst'
      dar_ayn asnah (dariin asnah) 'in_this moment'
  (c) az 'from/since/for'
      az raahe hamdardi 'from way empathy': 'for the sake of courtesy'
      az sare nau 'from beginning new'
      khaarij az bahes 'beyond from discussion'
  (d) ta 'to/until/till'
      ta waqt 'until time'

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone
  EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X:

raam-raam       'Ram-Ram' (proper noun)
baccaa-baccaa   'child-child' (common noun)
garam-garam     'hot-hot' (adjective)
dhiire-dhiire   'slowly-slowly' (adverb)
jaa-jaa         'go-go' (verb)
naa-naa         'not-not' (negative particle)
kyaa-kyaa       'what-what' (question word)
jaate-jaate     'going-going' (participle)

Partial Reduplication (Echo words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
  jaanaa 'going' → jaanaa-vaanaa 'going etc.'
  aaloo 'potato' → aaloo-vaaloo 'potato etc.'
  aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run' → bhaagambhaag 'rush'
  jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
  dekh 'see' → dekhaa-dekhii 'in imitation'
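The regular v- pattern can be sketched as a string operation on the romanized forms. The helper names `echo_form` and `partial_reduplication` are hypothetical, and the sketch assumes that the "first consonant" is the whole run of non-vowel letters at the start of the romanized word (so that kh- counts as one consonant) and that vowel-initial words simply receive a v- prefix. It does not attempt the irregular cases listed above.

```python
VOWELS = set("aeiouAEIOU")


def echo_form(word: str) -> str:
    """Replace the initial consonant letters of a romanized word with v-."""
    i = 0
    # Skip the leading consonant letters (e.g. 'kh' in khaanaa, none in aaloo).
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return "v" + word[i:]


def partial_reduplication(word: str) -> str:
    """Build the hyphenated echo expression X-vX', meaning 'X etc.'."""
    return f"{word}-{echo_form(word)}"
```

Applied to the slide's examples, `partial_reduplication("khaanaa")` gives khaanaa-vaanaa and `partial_reduplication("aaloo")` gives aaloo-vaaloo.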

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night, the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night, I saw the moon behind the clouds'

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1
मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister, who stays in Delhi, is coming tomorrow'

Relative Clause

rel-cl-2
मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'

rel-cl-3
मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'

rel-cl-4
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
    raam ko bukhaar hai            Ram dat fever be.pres
    raam ko caand dikhaa           Ram dat moon see-unacc.pst
    raam ko dukh hai               Ram dat sorrow be.pres
• Participatory verbs
  – The second argument of the participatory verbs takes the se postposition
    raam ravi se carcaa karega     Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii  Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa          Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)

Tokenization Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision
    • Create three tokens
    • Mark the hyphen as JOIN
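The tokenization decision can be sketched as follows. This is not the treebank's tokenizer: the `tokenize` function is hypothetical, it assumes romanized (WX-style) input in which words consist of ASCII letters, and it represents the JOIN marking as a third column, which is an assumption about layout made only for this illustration.

```python
import re
from typing import List, Tuple


def tokenize(sentence: str) -> List[Tuple[int, str, str]]:
    """Return (ADDR, TOKEN, mark) rows; every punctuation symbol is its own token."""
    rows = []
    # \w+ picks up words; [^\w\s] picks up each punctuation symbol separately.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    for addr, token in enumerate(tokens, start=1):
        # A hyphenated compound becomes three tokens, with the hyphen marked JOIN.
        mark = "JOIN" if token == "-" else ""
        rows.append((addr, token, mark))
    return rows
```

A compound such as BAI-bahana therefore comes out as three rows, with the middle row carrying the JOIN mark.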

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaa,v,e'>
7              <fs af='&STOP,punc'>
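A minimal sketch of reading the composite af attribute. The function `parse_af` and the field names in `AF_FIELDS` are hypothetical; the sketch assumes that the values inside af are comma-separated in the order given above (the separators are not visible in the extracted slide text) and that missing trailing fields are simply empty.

```python
import re
from typing import Dict

# Field order as described for the af attribute; the key names are illustrative.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")


def parse_af(feature_structure: str) -> Dict[str, str]:
    """Extract af='...' from an <fs ...> string and name its positional fields."""
    match = re.search(r"af='([^']*)'", feature_structure)
    if match is None:
        raise ValueError("no af attribute found")
    values = match.group(1).split(",")
    if len(values) > len(AF_FIELDS):
        raise ValueError("too many fields in af attribute")
    # Pad short analyses so that every field name is present.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))
```

For example, `parse_af("<fs af='laDakaa,n,m,sg,3,o,,'>")` returns a dictionary whose `root` is laDakaa and whose `case` is o.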

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs af='&STOP,punc'>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)

ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs af='&STOP,punc'>
      ))


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock (the)

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)

lock becomes the karta

Levels of Analysis

L1: Semantic relations (karakas), e.g. raama: karta
L2: Morphosyntactic (vibhakti), e.g. raama: prathamaa
L3: Morphological representation (abstract vibhakti markers), e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4: Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) khaa
       bhaaii
       phala

(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root branching into activity and result]

karta-karma

• The boy opened the lock
  – k1: karta
  – k2: karma
• karta/karma sometimes correspond to agent/theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Diagram: open, with k1 boy and k2 lock]

Sub-actions: Opening of lock

Opening of lock
  (action 1) Inserting and turning a key
  (action 2) Key pressing and turning the lever
  (action 3) Latch moving and lock opening

Sub-actions: Opening of lock

open: k1 boy, k2 lock, k3 key
open: k1 key, k2 lock
open: k1 lock

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  – The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
  – The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  – Which sub-action the speaker wants to focus on
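The shift of karta with the speaker's focus can be illustrated with a toy function. This is a sketch under stated assumptions, not part of the annotation scheme: `assign_karakas` is a hypothetical helper, the participant keys ("agent", "instrument", "theme") are ordinary thematic labels used only as input, and the `karta` argument stands in for the speaker's choice (vivaksha) of which participant to present as the doer.

```python
from typing import Dict


def assign_karakas(participants: Dict[str, str], karta: str) -> Dict[str, str]:
    """Label expressed participants of 'open' with k1/k2/k3 given the chosen karta."""
    if karta not in participants:
        raise ValueError(f"{karta} is not an expressed participant")
    relations = {"k1": participants[karta]}
    # The lock is karma (k2) unless it has itself been raised to karta.
    if karta in ("agent", "instrument") and "theme" in participants:
        relations["k2"] = participants["theme"]
    # The key is karana (k3) unless it has itself been raised to karta.
    if karta in ("agent", "theme") and "instrument" in participants:
        relations["k3"] = participants["instrument"]
    return relations
```

With all three participants and the agent in focus this gives k1 Sabina, k2 lock, k3 key; with only lock and key expressed, choosing "instrument" corresponds to "The key opened the lock" and choosing "theme" to "The lock opened with this key".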

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m>
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any>
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
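One simple way to hold such a tree in code is a list of labelled head-dependent edges between chunk heads. The sketch below is illustrative only: the `EDGES` list restates the example tree, and `find_root` and `render` are hypothetical helpers, not part of any treebank tool.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (head, relation label, dependent)

# The example tree: khaataa governs bhaaii (k1) and phala (k2); bhaaii governs meraa (r6).
EDGES: List[Edge] = [
    ("khaataa", "k1", "bhaaii"),
    ("khaataa", "k2", "phala"),
    ("bhaaii", "r6", "meraa"),
]


def find_root(edges: List[Edge]) -> str:
    """The root is the only head that is never a dependent."""
    heads = {head for head, _, _ in edges}
    dependents = {dep for _, _, dep in edges}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots.pop()


def render(edges: List[Edge]) -> List[str]:
    """Return the tree as indented lines, one node per line."""
    children: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, label, dep in edges:
        children[head].append((label, dep))
    lines: List[str] = []

    def walk(node: str, label: str, depth: int) -> None:
        prefix = f"{label}: " if label else ""
        lines.append("  " * depth + prefix + node)
        for child_label, child in children.get(node, []):
            walk(child, child_label, depth + 1)

    walk(find_root(edges), "", 0)
    return lines
```

Printing `"\n".join(render(EDGES))` reproduces the indented tree shown for the example sentence.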

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark the participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six:
  - karta - subject/agent/doer
  - karma - object/patient
  - karana - instrument
  - sampradaan - beneficiary
  - apaadaan - source
  - adhikarana - location in place/time/other

Relations other than karakas

• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod__relc - Relative clause
• rad - Address


Relations which do not fall under dependency relation

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

[Figure: Dependency Relation Types]
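The three groups of labels listed above can be collected into a small lookup table. This is a sketch for illustration only: the pairings k1 to k4 follow the examples in these slides, while pairing apaadaan with k5 and adhikarana with k7 is an assumption based on common usage of the scheme and is not spelled out here. The function name `relation_type` is hypothetical.

```python
KARAKA = {
    "k1": "karta",
    "k2": "karma",
    "k3": "karana",
    "k4": "sampradaan",
    "k5": "apaadaan",     # assumption: label not given on the slides
    "k7": "adhikarana",   # assumption: label not given on the slides
}
NON_KARAKA = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}

def relation_type(label):
    """Classify a relation label into one of the three groups."""
    if label in KARAKA:
        return "karaka"
    if label in NON_KARAKA:
        return "non-karaka"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    return "unknown"

for label in ("k1", "r6", "pof"):
    print(label, relation_type(label))
```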


Some Hindi Constructions

(1) Causative Constructions

• Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
      ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
      ‘Mother caused the maid to feed the child’
• Issue
  - Possibility-I: Go by syntactic analysis
    · khilvaa ‘cause to eat’ is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4


Causative Constructions (Contd…)

• Possibility-II
  - The verb khilvaa ‘cause to eat’ is a causative verb and is morphologically related to the base verb khaa ‘eat’
  - The Paninian framework provides the relations:
    · prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    · prayojya karta ‘causee’ (jk1): the causee in a causative construction
    · madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in a causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd…)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
      ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
      ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
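The decision for causatives amounts to a relabeling of the surface-based labels. This is a minimal sketch, assuming the caller has already established that the verb is a causative and that the se-marked phrase is an animate mediator (as with aayaa ‘maid’); an instrument such as cammaca se ‘with the spoon’ keeps k3 and should not be passed through this mapping. The names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical.

```python
# Possibility-I label -> Possibility-II label, for causative verbs only
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args):
    """args is a list of (phrase, label); other labels are left unchanged."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in args]

syntactic = [("maaz ne", "k1"), ("aayaa se", "k3"),
             ("bacce ko", "k4"), ("khaanaa", "k2")]

print(relabel_causative(syntactic))
```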

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
      ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
      ‘The boy who is standing there is my brother’
• Issue
  - Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai ‘is standing’ is the root of the relative clause
  - The modifier of vaha ‘he’ in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref


(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
        ‘I-Dat’ ‘unhappy’ ‘is’
        ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
        ‘ram’ ‘Erg’ ‘moon’ ‘saw’
        ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
        ‘ram-Dat’ ‘moon’ ‘appeared’
        ‘The moon was visible to Ram’

anubhava karta - k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
        ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
        ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd…)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
      ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
      ‘Had she not been sick, she would definitely have come to the party’
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

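Possibility - I

[Figure: dependency tree headed by the abstract node agar-to. agar-to -paired-ccof-> agar, agar-to -paired-ccof-> to; agar -ccof-> na hotii, to -ccof-> aatii]

Possibility - II

[Figure: dependency tree for Possibility-II]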

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
      ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
      ‘Every day, having written letters, he tears them up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’

Participles (vmod) (Contd)

[Figure: dependency tree. PaadZataa hai -k1-> vaha, -k7t-> rojZa, -k2-> patra, -vmod-> likhakara; the shared arguments vaha (k1) and patra (k2) are shown again under likhakara]

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
      ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
      ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency
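The null-element step can be sketched as a small edge-list operation. This is an illustration under stated assumptions: the function name `attach_relative_clause` and its parameters are hypothetical, the tree is a plain list of (head, relation, dependent) triples, and the role of the missing element in the main clause (k2 in this example) is supplied by the caller.

```python
def attach_relative_clause(edges, main_verb, relc_verb, role, correlative=None):
    """Attach a relative clause via nmod__relc.

    If the correlative (e.g. vo 'that') is absent from the sentence,
    a NULL__NP node is inserted under the main verb to host the clause.
    If it is present, it is assumed to be attached already.
    """
    edges = list(edges)
    if correlative is None:
        correlative = "NULL__NP"
        edges.append((main_verb, role, correlative))
    edges.append((correlative, "nmod__relc", relc_verb))
    return edges

base = [
    ("maan luungii", "k1", "mai"),
    ("kahoge", "k1", "tum"),
    ("kahoge", "k2", "jo bhi"),
]
tree = attach_relative_clause(base, "maan luungii", "kahoge", role="k2")
for edge in tree:
    print(edge)
```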

Ellipsis (Contd)

[Figure: dependency tree. maan luungii -k1-> mai, -k2-> NULL__NP (vo); NULL__NP -nmod__relc-> kahoge; kahoge -k1-> tum, -k2-> jo bhi]

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
      ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
      “The children have grown up; they don’t listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Figure: dependency tree. NULL__CCP (aur) -ccof-> badZe_ho_gaye, -ccof-> nahii_sunate]

Non-dependency Relations

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn’t reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
        ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
        ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
        ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
        ‘Ram said that he will come tomorrow’

[Figures: dependency trees for the Coordinate Conjunction (ccof) and Subordinate Conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
      ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
      ‘I asked him a question’
• The adjective ek ‘one’ modifies only the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation)
• That is, a noun or an adjective in a conjunct verb sequence has a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
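The effect of chunking the noun and the verb separately can be illustrated with a toy representation. This is a sketch only; the data layout and the helper names `conjunct_verbs` and `same_chunk` are hypothetical, and the edges follow the example sentence above.

```python
# Chunks after splitting [ek prashna]NP [kiyaa]VG
CHUNKS = [("NP", ["maine"]), ("NP", ["usase"]),
          ("NP", ["ek", "prashna"]), ("VG", ["kiyaa"])]

# Inter-chunk relations hold between chunk heads
INTER_CHUNK = [("kiyaa", "k1", "maine"),
               ("kiyaa", "k2", "usase"),
               ("kiyaa", "pof", "prashna")]

def conjunct_verbs(edges):
    """Recover each noun-verb unit by following pof edges."""
    return [f"{dep} {head}" for head, rel, dep in edges if rel == "pof"]

def same_chunk(chunks, word_a, word_b):
    """True if both words belong to one chunk (an intra-chunk relation)."""
    return any(word_a in words and word_b in words for _, words in chunks)

print(conjunct_verbs(INTER_CHUNK))          # the conjunct verb as one unit
print(same_chunk(CHUNKS, "ek", "prashna"))  # ek modifies prashna inside the NP
```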

Conjunct Verbs (Contd)

[Figure: dependency tree. kiyaa -k1-> maine, -k2-> usase, -pof-> prashna; prashna -nmod-> ek]


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
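The two rolesets for ring can be pictured as entries in a small lexicon. This is a minimal sketch and not the actual frame file format: the `Roleset` class, the `FRAME_RING` dictionary and the `describe` function are hypothetical names used only to show how a roleset ID selects the meaning of each numbered argument.

```python
from dataclasses import dataclass

@dataclass
class Roleset:
    roleset_id: str
    gloss: str
    roles: dict  # numbered argument -> verb-specific description

FRAME_RING = {
    "ring.01": Roleset("ring.01", "Make sound of bell",
                       {"Arg0": "Causer of ringing",
                        "Arg1": "Thing rung",
                        "Arg2": "Ring for"}),
    "ring.02": Roleset("ring.02", "To surround",
                       {"Arg1": "Surrounding entity",
                        "Arg2": "Surrounded entity"}),
}

def describe(roleset_id, labelled_args):
    """Pair each labelled argument with its description in the roleset."""
    roleset = FRAME_RING[roleset_id]
    return [(text, label, roleset.roles[label.capitalize()])
            for text, label in labelled_args]

print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label ARG1 means "Thing rung" under ring.01 but "Surrounding entity" under ring.02, which is the point of defining roles per roleset.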

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments           Numbered Arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent, experiencer     ARG0-MNS  Induced causer
ARG1  Theme, patient         ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
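The labelling rule for the single argument of an intransitive verb can be stated as a lookup. This is a toy sketch: the `VERB_CLASS` table holds only the three example verbs from the slide, and in practice class membership would come from the diagnostic tests and the frame files, not from a hard-coded dictionary. The function name `sole_argument_label` is hypothetical.

```python
# Example verbs only; real class membership is decided by diagnostics
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """Return the PropBank label for the only argument of an intransitive."""
    if VERB_CLASS[verb] == "unaccusative":
        return "ARG1"
    return "ARG0"

for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```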

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan.dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram.erg Mohan.dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative:
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid.erg Atif.acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother.erg maid.by Atif.acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

[Figures: PropBank annotation of causative-1 and causative-2]

Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust(n) do(v)’: trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Figure: phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii); node labels VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: phrase structure tree for आतिफ़ सोएगा (Atif soyegaa); node labels VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels VP-Pred, VP, NP1, NP, CASE1, V]

Existentials

• Existential ho ‘be’ is unaccusative (because it is agent-free) and the location is an adjunct

[Figure: phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels VP-Pred, VP, NP-P, NP1, NP, CASE1, V]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: phrase structure tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii); node labels VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Figure: phrase structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels VP-Pred, VP, NP-P, NP, NP1, NP2, CASE1, SCR2, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: phrase structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels VP-Pred, VP, NP-P, NP, NP1, CP1, C, V, V-Aux, CASE1, EXTR1]

Relative Clause

[Figure: phrase structure tree for the sentence below; node labels VP-Pred, VP, NP-P, NP, NP1, CP, C, COMP, V, V-Aux, PRO, SCR1, SCR3]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

Complex Predicate

[Figure: phrase structure tree for the sentence below; node labels VP-Pred, VP, NP-P, NP, V, V-Aux, V’, N, CASE1]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

Causative

Morphology

Hindi and Urdu have the following morphological properties:

• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  - Some adjectives do not decline

Nouns

• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender, e.g. pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  - Singular: pankhaa (fan), lataa (creeper), ghar (house)
  - Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique

Case

• Direct: nouns are in the nominative and are not followed by a postposition
• They occur denoting subject and/or object

  laRkaa aayaa ‘the boy came’
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>

  laRkii aayii ‘the girl came’
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>

• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)

  laRke ne roTii khaayii ‘the boy ate bread’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>

  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread up from the floor’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>

Pronouns

Morphologically, like nouns, pronouns also inflect for number and case.

• Sg, dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
• Sg, obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
• Pl, dir: ye (these), ve (those), jo (who, pl)
• Pl, obl:
  - a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  - b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd…)

• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) don’t change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have irregular forms: meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
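The rules for first- and second-person pronouns before postpositions can be summarized in a few lines. This is a sketch covering only the forms listed above; the function name `attach` is hypothetical, and the genitive forms are given in their masculine singular direct shape only.

```python
# Oblique stems used before postpositions other than ne
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitives that replace pronoun + kaa
GENITIVE = {"maiM": "meraa", "tuu": "teraa",
            "ham": "hamaaraa", "tum": "tumhaaraa"}

def attach(pronoun, postposition):
    """Combine a pronoun with a postposition following the rules above."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition
    return pronoun + postposition

for pronoun, postposition in [("maiM", "ne"), ("maiM", "ko"),
                              ("ham", "ko"), ("tum", "kaa")]:
    print(pronoun, postposition, "->", attach(pronoun, postposition))
```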


Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The transformations that an adjective (e.g. acchaa ‘good’) undergoes with regard to number, gender and case are given below

Case →      Direct             Oblique
Number →    Sg       Pl        Sg       Pl
Gender ↓
Masc        acchaa   acche     acche    acche
Fem         acchii   acchii    acchii   acchii
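The table reduces to a simple rule: feminine forms always end in -ii, the masculine direct singular ends in -aa, and every other masculine form ends in -e. The sketch below encodes that rule for declining adjectives of the acchaa type only (some adjectives do not decline); the function name `inflect_adjective` and the stem convention are assumptions made for illustration.

```python
def inflect_adjective(stem, gender, number, case):
    """Inflect a declining adjective given its stem, e.g. 'acch'.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if gender == "f":
        return stem + "ii"
    if number == "sg" and case == "direct":
        return stem + "aa"
    return stem + "e"

for gender in ("m", "f"):
    for case in ("direct", "oblique"):
        for number in ("sg", "pl"):
            print(gender, case, number,
                  inflect_adjective("acch", gender, number, case))
```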

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root        ro      ‘cry’
Infinitive  ronaa   ‘to cry’
Habitual    rotaa   ‘cry.hab’
Perfective  royaa   ‘cried’
Causative   rulaa   ‘cause someone to cry’
            rulvaa  ‘make someone cause someone to cry’

Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs

  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      ‘The children are having a meal’

  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog ablit be.pl.pres
      ‘The children can continue to have their meal’

• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations

      baccoM ne raat meM mez se khaanaa le liyaa
      children_obl erg night in table ablat food take refl.pst

• Hindi also has compound postpositions

Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar ‘according to’
ke alaavaa ‘in addition to’
ke kaaran ‘because of’
ke dvaaraa ‘through’
ke saamne ‘in front of’
ke liye ‘for’
kii or/taraf ‘towards’
kii tarah ‘like’
kii jagah ‘in place of’
se baahar ‘out of’
se pahle ‘before’

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl ‘before’
    qabl az_ayn (qabl azeen)     qabl az_waqt
    before from_this             before from_time
    ‘before this’                ‘before time’

(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah)
    in_this moment

(c) az ‘from/since/for’
    az raahe hamdardi            az sare nau              khaarij az bahes
    from way empathy             from beginning new       beyond from discussion
    ‘for the sake of courtesy’

(d) ta ‘to/until/till’
    ta waqt
    until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone

EZ N+N      daur-e-hukumat ‘period-of-rule’; hukumat-e-hind ‘government-of-India’
EZ N+Adj    nasl-e-insani ‘race-of-humanity’; lamha-e-aakhar ‘the last moment’
EZ Adj+N    qabil-e-rahem ‘qualified for sympathy’
EZ Adj+Adj  qabil-e-qubul ‘qualified-for-acceptance’

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  - Nouns: it adds the sense of ‘every’
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X:

raam-raam ‘Ram-Ram’ (proper noun)
baccaa-baccaa ‘child-child’ (common noun)
garam-garam ‘hot-hot’ (adjective)
dhiire-dhiire ‘slowly-slowly’ (adverb)
jaa-jaa ‘go-go’ (verb)
naa-naa ‘not-not’ (negative particle)
kyaa-kyaa ‘what-what’ (question word)
jaate-jaate ‘going-going’ (participle)

Partial Reduplication (Echo words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of ‘X etc.’
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa ‘food etc.’
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi:

jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:

bhaag ‘run’ → bhaagambhaag ‘rush’
jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
dekh ‘see’ → dekhaa-dekhii ‘in imitation’
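Full reduplication and the regular v- echo pattern are both easy to state as string operations on the romanized forms. The sketch below is a simplification under stated assumptions: it treats all leading non-vowel letters as the initial consonant (so the digraph kh counts as one consonant), it prefixes v- to vowel-initial words as in aaloo-vaaloo, and it does not attempt the irregular cases such as bhaagambhaag. The function names are hypothetical.

```python
VOWELS = set("aeiouAEIOU")

def full_reduplication(word):
    """X -> X-X"""
    return f"{word}-{word}"

def echo_word(word):
    """X -> X-vX', where X' is X without its initial consonant (if any)."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"

print(full_reduplication("garam"))
for word in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_word(word))
```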

Some Basic Syntax

  Hindi and Urdu are both relatively free word order SOV languages

  For case marking Hindi primarily uses postpositions   The verb agrees either with subject or with object   Adjective agrees with the noun it modifies


Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Relative Clause

rel-cl-1
  मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  ‘My sister who stays in Delhi is coming tomorrow’

Relative Clause

rel-cl-2
  मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  ‘I have read the book which you gave me’

rel-cl-3
  मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  ‘I have read the book which you gave me’

rel-cl-4
  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

compl-pred-2
  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case

    raam ko bukhaar hai
    Ram dat fever be.pres

    raam ko caand dikhaa
    Ram dat moon see-unacc.pst

    raam ko dukh hai
    Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of participatory verbs takes the se postposition

    raam ravi se carcaa karegaa
    Ram Ravi to discussion do.m.sg.fut

    siitaa raam se shaadi karegii
    Sita Ram with marriage do.f.sg.fut

    ravi mohan se milegaa
    Ravi Mohan with meet.m.sg.fut




Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuation

For example:

  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

  ADDR  TOKEN
  1     usa
  2     laDake
  3     ne
  4     kelA
  5     khAyA
  6     thA
  7

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision
    · Create three tokens
    · Mark the hyphen as JOIN
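The tokenization decisions above can be sketched as follows. This is a minimal illustration, not the treebank's tokenizer: the regular expression, the function name, and the way JOIN is recorded (as a third column) are all assumptions made for the example.

```python
import re

# Assumption: a token is either a run of word characters or one punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Return (ADDR, TOKEN, NOTE) rows; NOTE is "JOIN" for compound hyphens."""
    matches = list(TOKEN_RE.finditer(sentence))
    rows = []
    for i, m in enumerate(matches):
        token = m.group()
        is_join = (
            token == "-"
            and 0 < i < len(matches) - 1
            and matches[i - 1].end() == m.start()  # no space before the hyphen
            and matches[i + 1].start() == m.end()  # no space after the hyphen
            and matches[i - 1].group()[-1].isalnum()
            and matches[i + 1].group()[0].isalnum()
        )
        rows.append((i + 1, token, "JOIN" if is_join else ""))
    return rows


for row in tokenize("BAI-bahana"):
    print(row)
```

A compound thus yields three tokens, with the hyphen kept as its own row so that the members can still be analysed morphologically.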

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, TAM (tense/aspect/modality) or vibhakti (case marker), and suffix.

  ADDR_  TKN_     OTHR
  1      usa      <fs af='vaha,pr'>
  2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
  3      ne       <fs af='ne,psp'>
  4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
  5      khaayaa  <fs af='khaa,v,m,sg,any,,yaa'>
  6      thaa     <fs af='thaa,v,e'>
  7               <fs as=&STOP,punc>
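Because af is a positional, comma-separated attribute, it can be unpacked into named fields. The sketch below assumes the field order listed on the slide; the field names and the helper `parse_af` are hypothetical, and missing trailing fields are padded with empty strings.

```python
# Field order as described for the af attribute (names are illustrative).
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)


def parse_af(af: str) -> dict:
    """Split an af value into named fields, padding missing trailing fields."""
    values = af.split(",")
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


noun = parse_af("laDakaa,n,m,sg,3,o")
verb = parse_af("khaa,v,m,sg,any,,yaa")
print(noun["root"], noun["case"])
print(verb["root"], verb["tam_or_vibhakti"])
```

An empty field (as in the case slot of the verb) is kept as an empty string so that later fields stay in their positions.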

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

  ADDR  TKN     CAT   OTHR
  1     usa     PRP   <fs af='vaha,pron,…'>
  2     laDake  NN    <fs af='laDakA,noun,…'>
  3     ne      PSP   <fs af='ne,psp,…'>
  4     kelA    NN    <fs af='kelA,noun,…'>
  5     khAyA   VM    <fs af='KA,verb,…'>
  6     thA     VAUX  <fs af='thA,verb,…'>
  7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

  ADDR_  TKN_    CAT_  OTHR
  1      ((      NP
  1.1    usa     PRP   <fs af='vaha,pron,…'>
  1.2    laDake  NN    <fs af='laDakA,noun,…'>
  1.3    ne      PSP   <fs af='ne,psp,…'>
         ))
  2      ((      NP
  2.1    kelA    NN    <fs af='kelA,noun,…'>
         ))
  3      ((      VG
  3.1    khAyA   VM    <fs af='KA,verb,…'>
  3.2    thA     VAUX  <fs af='thA,verb,…'>
         ))
  4      ((      BLK
  4.1            SYM   <fs as=&STOP,punc>
         ))
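The renumbering of ADDR_ after chunking can be illustrated with a small sketch. It assumes the chunks are already known and only shows how nested addresses such as 1.1 or 3.2 arise; the function name `to_ssf` and the tuple layout are hypothetical, and feature structures are omitted.

```python
def to_ssf(chunks):
    """Turn [(chunk_label, [(token, pos), ...]), ...] into SSF-style rows."""
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))           # chunk opening row
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))    # token inside the chunk
        rows.append(("", "))", ""))                  # chunk closing row
    return rows


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
for addr, token, cat in to_ssf(chunks):
    print(f"{addr:<5}{token:<8}{cat}")
```

A token that was address 2 before chunking (laDake) becomes 1.2, which is the sense in which chunking changes the ADDR_ values.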


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar

• Indian languages
  – Rich morphology
  – Relatively flexible word order

For example:
  a) baccaa phala khaataa hai
     ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified element (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

  opened
    k1: Sabina
    k2: lock (the)

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): the locus of the result

Sabina opened the lock with this key

  opened
    k1: Sabina
    k2: lock (the)
    k3: key (this)

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

  opened
    k7t: yesterday
    k1: Sabina
    k2: lock (the)
    k3: key (this)
    k7p: home (my)

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

  opened
    k7t: yesterday
    k1: lock (the)
    k3: key (this)

lock becomes the karta

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

  meraa baDzaa bhaaii bahuta phala khaataa hai

  => meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

  => ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd)

  ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

  (t1) khaa
         bhaaii
         phala

  (t2) khaa
         bhaaii
           meraa
           baDzaa
         phala
           bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• The verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

  Verbal root
    activity
    result

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta/karma sometimes correspond to agent/theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

  open
    k1: boy
    k2: lock

Sub-actions: Opening of lock

  Opening of lock
    action 1: inserting and turning a key
    action 2: key pressing and turning the lever
    action 3: latch moving and lock opening

Sub-actions: Opening of lock

  open
    k1: boy
    k2: lock
    k3: key

  open
    k1: key
    k2: lock

  open
    k1: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer), karaNa-kartri
  – The lock opened: the karma is raised to the role of karta (doer), karma-kartri
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
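The raising pattern described above can be written down as a small rule. This is only a sketch of the three ‘open’ sentences on the slides, not a general karaka assigner; the function name and its keyword arguments are hypothetical.

```python
def assign_karakas(agent=None, instrument=None, theme=None):
    """Label the expressed participants of 'open', following the three patterns above."""
    if agent is not None:
        # Sabina opened the lock (with this key)
        roles = {"k1": agent, "k2": theme, "k3": instrument}
    elif instrument is not None:
        # The key opened the lock: karaNa raised to karta
        roles = {"k1": instrument, "k2": theme}
    else:
        # The lock opened: karma raised to karta
        roles = {"k1": theme}
    # Drop participants the speaker chose not to express.
    return {label: word for label, word in roles.items() if word is not None}


print(assign_karakas(agent="Sabina", instrument="key", theme="lock"))
print(assign_karakas(instrument="key", theme="lock"))
print(assign_karakas(theme="lock"))
```

Which participants are passed in stands for the speaker's choice (vivaksha); the labels follow from that choice rather than from fixed thematic roles.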

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivakshaa

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd)

• Morph Analysis
  – meraa    <fs af= root=meraa, cat=pron, gend=any, num=sg, pers=1, case=o>
  – badZaa   <fs af= root=badZaa, cat=adj, gend=m>
  – bhaaii   <fs af= root=bhaaii, cat=n, gend=m, num=sg, pers=3, case=d>
  – bahuta   <fs af= root=bahuta, cat=adj, gend=any>
  – phala    <fs af= root=phala, cat=n, gend=m, num=any, pers=3, case=d>
  – khaataa  <fs af= root=khaa, cat=v, gend=m, num=sg, pers=3, TAM=taa>
  – hai      <fs af= root=hai, cat=v, gend=any, num=any, pers=3>

An Example (Contd)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or another relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’ and
  – khaataa ‘eats’
• badZaa ‘big’ and bahuta ‘much’ are part of the chunks

Dependency Scheme (Contd)

  khaataa
    k1: bhaaii
      r6: meraa
    k2: phala
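A chunk-level dependency tree of this kind is easy to hold in a small data structure. The sketch below is illustrative only: the `Chunk` class and `render` function are hypothetical names, and each node stores just the chunk head and its labeled children.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    head: str
    children: list = field(default_factory=list)  # (relation, Chunk) pairs

    def attach(self, relation, child):
        """Attach a dependent chunk under this one with a relation label."""
        self.children.append((relation, child))
        return child


def render(chunk, relation=None, depth=0):
    """Return the tree as indented lines, one chunk head per line."""
    prefix = "  " * depth + (f"{relation}: " if relation else "")
    lines = [prefix + chunk.head]
    for rel, child in chunk.children:
        lines.extend(render(child, rel, depth + 1))
    return lines


root = Chunk("khaataa")
brother = root.attach("k1", Chunk("bhaaii"))
brother.attach("r6", Chunk("meraa"))
root.attach("k2", Chunk("phala"))
print("\n".join(render(root)))
```

Only chunk heads appear as nodes; badZaa and bahuta stay inside their chunks, in line with the inter-chunk annotation described above.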

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa ‘cause to eat’ is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations
    · prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    · prayojya karta ‘causee’ (jk1): the causee in a causative construction
    · madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
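The difference between the two possibilities amounts to a relabeling of three relations. The sketch below converts a Possibility-I labeling of the maid sentence into the Possibility-II labels; the mapping table and function name are hypothetical, and the sketch assumes the caller has already established that the se-marked phrase is a mediator causer (a true instrument such as cammaca se keeps k3).

```python
# Possibility-I label -> Possibility-II label for causative constructions.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map (phrase, label) pairs to causative-specific labels; others are kept."""
    return [(phrase, CAUSATIVE_LABELS.get(label, label)) for phrase, label in arguments]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
for phrase, label in relabel_causative(possibility_1):
    print(f"{phrase} ({label})")
```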

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  – Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause: Possibility-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2: raam ne (agent) caaMd dekhaa (base verb)
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3: raam ko (experiencer) caaMd dikhaa (derived intransitive verb)
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

  agar-to
    paired-ccof: agar
      ccof: naa hotii
    paired-ccof: to
      ccof: aatii

Possibility-II

Conditionals (Contd)

• Possibility-I is not adopted because its head, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written letters every day, he tears them up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’

Participles (vmod) (Contd)

  PaadZataa hai
    k1: vaha
    k7t: rojZa
    k2: patra
    vmod: likhakara
      k1: vaha (shared)
      k2: patra (shared)

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

  maan luungii
    k1: mai
    k2: NULL__NP (vo)
      nmod__relc: kahoge
        k1: tum
        k2: jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don't listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

  NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, the noun or adjective in a conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

  kiyaa
    k1: maine
    k2: usase
    pof: prashna
      nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
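The filtering step can be sketched once the candidate sentences carry role labels. The data below simply restates the agent and theme brackets from the slide; the dictionary layout and the function `answer_who` are hypothetical and stand in for the output of a real semantic role labeller.

```python
# Role-labelled candidates, hand-copied from the bracketed sentences above.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, labelled):
    """Return agents of sentences whose predicate and theme match the question."""
    return [
        s["agent"]
        for s in labelled
        if s["predicate"] == predicate and s["theme"].lower() == theme.lower()
    ]


print(answer_who("create", "the first effective polio vaccine", candidates))
```

Word overlap alone would not separate the two candidates, since both mention the vaccine; the theme label is what rules out the first sentence.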

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

  ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

  ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for

  ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
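A frame file can be pictured as a lookup from a lemma to its rolesets. The sketch below encodes only the two ring rolesets shown above as a plain dictionary; it is not the actual PropBank frame file format, and the names `FRAME_FILES` and `describe_roles` are hypothetical.

```python
# One lemma, two rolesets, each with its own numbered roles.
FRAME_FILES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
        },
    }
}


def describe_roles(lemma, roleset_id, labelled_args):
    """Map each labelled argument span to the role description of its roleset."""
    roles = FRAME_FILES[lemma][roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args.items()}


print(describe_roles("ring", "ring.01", {"Arg0": "John", "Arg1": "the bell"}))
print(describe_roles("ring", "ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"}))
```

The same label (Arg1) means ‘thing rung’ under one roleset and ‘surrounding entity’ under the other, which is why roles are defined per roleset rather than globally.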

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

  Numbered Arguments          Numbered Arguments with function tags
  ARGA  Causer                ARGA-MNS  Indirect causer
  ARG0  Agent/experiencer     ARG0-MNS  Induced causer
  ARG1  Theme/patient         ARG0-GOL  Causee with a ‘recipient’ role
  ARG2  Recipient             ARG2-ATR  Attribute
  ARG3  Instrument            ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

  ARGM-TMP  Temporal
  ARGM-MNR  Manner
  ARGM-LOC  Location
  ARGM-PRP  Purpose
  ARGM-CAU  Cause
  ARGM-DIS  Discourse
  ARGM-ADV  Adverb
  ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
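Once a verb's class is known, the label of its single argument follows directly. The sketch below hard-codes the three verbs named on the slide; the table and function name are hypothetical, and in practice the class would come from the diagnostic tests rather than a fixed list.

```python
# Verb classes for the three example verbs named above.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb: str) -> str:
    """Return the PropBank label for the single argument of an intransitive verb."""
    return "Arg1" if VERB_CLASS[verb] == "unaccusative" else "Arg0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```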

Intransitive Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night, the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night, I saw the moon behind the clouds'

[Figures: PropBank annotations for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: PropBank annotations for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा; node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी; node labels: VP-Pred, VP, NP-P, NP (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा; node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा; node labels: VP-Pred, VP, NP1 (darvaazaa), trace CASE1, V (khulegaa)]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं; node labels: VP-Pred, VP, NP-P (us kamre mein), NP1 (cuuhe), trace CASE1, V (hain)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी; node labels: VP-Pred, VP, NP-P (Ram-ne), NP-P (Mohan-ko), NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा; node labels: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko), trace SCR2, NP1 (caaMd), trace CASE1, V (dikhaa)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी; node labels: VP-Pred, VP, NP (Ram), trace EXTR1, V (jaantaa), V-Aux (hai), CP1, C (ki), NP1 (Sita), trace CASE1, NP-P (der se), V (aayegi)]

Relative Clause

[Tree diagram for जो किताब तुमने दी थी वह मैंने पढ़ ली; node labels: VP-Pred, VP, CP, C (COMP), NP1 (jo kitaab), NP-P (tumne), trace SCR1, PRO, V (dii), V-Aux (thii), NP (vah), trace SCR3, NP-P (maine), V (paRhii)]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Tree diagram for राम रवि को याद कर रहा था; node labels: VP-Pred, VP, VP1, V', NP (Raam), NP-P (Ravi-ko), NP, N (yaad), trace CASE1, V (kar), V-Aux (rahaa), V-Aux (thaa)]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Case

• Direct: nouns are in nominative and are not followed by a postposition
  – Occur denoting subject and/or object
    laRkaa aayaa 'the boy came' <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
    laRkii aayii 'the girl came' <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
    laRke ne roTii khaayii 'the boy ate bread' <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
    laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor' <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
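The bracketed analyses attached to each word can be read as six-field tuples. The sketch below assumes the fields are comma-separated inside angle brackets (the separators are not visible in this transcript) and that the last field is the case for nouns and the TAM suffix for verbs; `Analysis` and `parse_analysis` are hypothetical names.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    root: str
    cat: str          # n, v, ...
    gender: str       # m / f
    number: str       # sg / pl
    person: str
    case_or_tam: str  # d / obl for nouns, a suffix such as yaa for verbs


def parse_analysis(text):
    """Parse one '<root,cat,gender,number,person,last>' tuple."""
    text = text.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError(f"not a bracketed analysis: {text!r}")
    fields = text[1:-1].split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return Analysis(*fields)


if __name__ == "__main__":
    noun = parse_analysis("<laRkaa,n,m,sg,3,obl>")
    verb = parse_analysis("<khaa,v,f,sg,3,yaa>")
    print(noun.root, noun.case_or_tam)   # oblique noun before a postposition
    print(verb.root, verb.gender)        # verb form with feminine agreement
```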

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

Sg – dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)

Sg – obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)

Pl – dir: ye (these), ve (those), jo (who, pl)

Pl – obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)

Pronouns (Contd…)

• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), humko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)

Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also have oblique form: acche laRke ne 'good boy, erg'
• The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below

Case →          Direct            Oblique
Number →        Sg       Pl       Sg       Pl
Gender ↓
Masc            acchaa   acche    acche    acche
Fem             acchii   acchii   acchii   acchii
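The table reduces to a three-way rule, sketched below for adjectives of the acchaa type only. The function name `inflect_adjective` and the convention of passing the bare stem (`acch`) are assumptions made for illustration.

```python
def inflect_adjective(stem, gender, number, case):
    """Inflect an -aa adjective stem (e.g. 'acch') following the table above.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if gender not in ("m", "f") or number not in ("sg", "pl"):
        raise ValueError("unexpected gender or number")
    if case not in ("direct", "oblique"):
        raise ValueError("unexpected case")
    if gender == "f":
        return stem + "ii"          # feminine is invariant
    if number == "sg" and case == "direct":
        return stem + "aa"          # masculine singular direct
    return stem + "e"               # all other masculine cells


if __name__ == "__main__":
    for g in ("m", "f"):
        for c in ("direct", "oblique"):
            for n in ("sg", "pl"):
                print(g, c, n, inflect_adjective("acch", g, n, c))
```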

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root        ro       'cry'
Infinitive  ronaa    'to cry'
Habitual    rotaa    'cry.hab'
Perfective  royaa    'cried'
Causative   rulaa    'cause someone to cry'
            rulvaa   'make someone cause someone to cry'

Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      'The children are having a meal'
  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog abil be.pl.pres
      'The children can continue to have their meal'
• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations
      baccoM ne raat meM mez se khaanaa le liyaa
      children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions

Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar    'according to'
ke alaavaa    'in addition to'
ke kaaran     'because of'
ke dvaaraa    'through'
ke saamne     'in front of'
ke liye       'for'
kii or/taraf  'towards'
kii tarah     'like'
kii jagah     'in place of'
se baahar     'out of'
se pahle      'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:
  (a) qabl 'before'
      qabl az_ayn (qabl azeen)  before from_this  'before this'
      qabl az_waqt              before from_time  'before time'
  (b) dar 'in/inside/amidst'
      dar_ayn asnah (dariin asnah)  in_this moment
  (c) az 'from/since/for'
      az raahe hamdardi  from way empathy  'for the sake of courtesy'
      az sare nau        from beginning new
      khaarij az bahes   beyond from discussion
  (d) ta 'to/until/till'
      ta waqt  until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• Normally marks a genitive, but is not restricted to genitive alone

EZ N+N      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X.

raam-raam      'Ram-Ram'        (proper noun)
baccaa-baccaa  'child-child'    (common noun)
garam-garam    'hot-hot'        (adjective)
dhiire-dhiire  'slowly-slowly'  (adverb)
jaa-jaa        'go-go'          (verb)
naa-naa        'not-not'        (negative particle)
kyaa-kyaa      'what-what'      (question word)
jaate-jaate    'going-going'    (participle)

Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
  jaanaa 'going' → jaanaa-vaanaa 'going etc.'
  aaloo 'potato' → aaloo-vaaloo 'potato etc.'
  aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run' → bhaagambhaag 'rush'
  jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
  dekh 'see' → dekhaa-dekhii 'in imitation'

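The regular echo-word pattern can be sketched on romanized input as follows. This is only an illustration: it treats the whole word-initial consonant sequence (so `kh` counts as one consonant) as the part replaced by `v`, simply prefixes `v` to vowel-initial words, and does not cover the irregular cases listed above. The names `echo_word` and `partial_reduplication` are hypothetical.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word):
    """Replace the word-initial consonant(s) of a romanized word with 'v'."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return "v" + word[i:]


def partial_reduplication(word):
    """Build the 'X etc.' form: X followed by its echo word."""
    return f"{word}-{echo_word(word)}"


if __name__ == "__main__":
    for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
        print(partial_reduplication(w))
```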
Some Basic Syntax

  Hindi and Urdu are both relatively free word order SOV languages

  For case marking Hindi primarily uses postpositions   The verb agrees either with subject or with object   Adjective agrees with the noun it modifies


Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Intransitive Unergative

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Intransitive Unaccusative

unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Existential

exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night, the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night, I saw the moon behind the clouds'

Ditransitive

ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  'My sister who stays in Delhi is coming tomorrow'

Relative Clause

rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  'I have read the book which you gave me'

rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  'I have read the book which you gave me'

rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2: राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'

Causatives

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
    raam ko bukhaar hai        Ram dat fever be.pres
    raam ko caand dikhaa       Ram dat moon see-unacc.pst
    raam ko dukh hai           Ram dat sorrow be.pres
• Participatory verbs
  – The second argument of the participatory verbs takes the se postposition
    raam ravi se carcaa karega     Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii  Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa          Ravi Mohan with meet.m.sg.fut


Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:
  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
    • Create three tokens
    • Mark the hyphen as JOIN

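The tokenization decision for compounds (three tokens, with the hyphen marked as JOIN) can be sketched as below. This is an illustration on romanized text only, not the treebank's tokenizer: the regular expression would split Devanagari combining signs incorrectly, and storing `JOIN` in a third column is an assumption about how the mark might be recorded. `tokenize` and `to_ssf_rows` are hypothetical names.

```python
import re


def tokenize(sentence):
    """Split on whitespace, then make every punctuation mark its own token."""
    tokens = []
    for piece in sentence.split():
        # The capturing group keeps each non-word character as a separate item.
        tokens.extend(t for t in re.split(r"([^\w])", piece) if t)
    return tokens


def to_ssf_rows(tokens):
    """Number tokens from 1 and flag compound-internal hyphens as JOIN."""
    return [(addr, tok, "JOIN" if tok == "-" else "")
            for addr, tok in enumerate(tokens, start=1)]


if __name__ == "__main__":
    for row in to_ssf_rows(tokenize("BAI-bahana")):
        print(*row)
```

A compound such as BAI-bahana therefore yields three rows, with the middle row carrying the JOIN mark.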
Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality), vibhakti (case marker), suffix.

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOPpunc>

POS Tagging

• ILMT POS tagsets adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOPpunc>

Chunking

• Chunking is introduced to save the effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOPpunc>
       ))

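A chunked listing of this kind can be read back into a list of chunks with a few lines of code. The sketch assumes one row per line, whitespace-separated columns, dotted addresses such as `1.1` inside chunks, and no feature-structure column; it is not the official SSF reader, and `SAMPLE` and `parse_chunks` are hypothetical names.

```python
SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""


def parse_chunks(ssf_text):
    """Return a list of {'tag': chunk tag, 'tokens': [(token, pos), ...]}."""
    chunks = []
    current = None
    for line in ssf_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if len(parts) >= 3 and parts[1] == "((":
            current = {"tag": parts[2], "tokens": []}   # chunk opens
        elif parts[0] == "))":
            if current is not None:
                chunks.append(current)                  # chunk closes
                current = None
        elif current is not None and len(parts) >= 3:
            current["tokens"].append((parts[1], parts[2]))
    return chunks


if __name__ == "__main__":
    for chunk in parse_chunks(SAMPLE):
        print(chunk["tag"], [tok for tok, _ in chunk["tokens"]])
```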

Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar

Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky, 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

[Dependency tree: opened –k1→ Sabina; opened –k2→ lock; lock → the]

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened –k1→ Sabina; opened –k2→ lock; opened –k3→ key; lock → the; key → this]

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened –k7t→ Yesterday; opened –k1→ Sabina; opened –k2→ lock; opened –k3→ key; opened –k7p→ home; lock → the; key → this; home → my]

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened –k7t→ yesterday; opened –k1→ lock; opened –k3→ key; lock → the; key → this]

Here lock becomes the karta.
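The two analyses can be compared by writing each tree as a list of labelled edges. This is a hedged sketch: the edges are read off the examples above, while the `Edge` type and the `dependents` helper are hypothetical.

```python
from collections import namedtuple

Edge = namedtuple("Edge", "head label dependent")

# "Yesterday Sabina opened the lock with this key at my home"
with_agent = [
    Edge("opened", "k7t", "Yesterday"),
    Edge("opened", "k1", "Sabina"),
    Edge("opened", "k2", "lock"),
    Edge("opened", "k3", "key"),
    Edge("opened", "k7p", "home"),
]

# "Yesterday the lock opened with this key"
without_agent = [
    Edge("opened", "k7t", "yesterday"),
    Edge("opened", "k1", "lock"),
    Edge("opened", "k3", "key"),
]


def dependents(edges, head, label):
    """All dependents attached to `head` with the given relation label."""
    return [e.dependent for e in edges if e.head == head and e.label == label]


if __name__ == "__main__":
    print(dependents(with_agent, "opened", "k1"))      # the doer is expressed
    print(dependents(without_agent, "opened", "k1"))   # lock takes the karta role
    print(dependents(without_agent, "opened", "k2"))   # no explicit karma
```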

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks, bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Dependency trees: (t1) khaa with dependents bhaaii and phala; (t2) the fully expanded tree, with meraa and baDZaa under bhaaii and bahuta under phala]

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
    Sabina opened the lock with the key
    The key opened the lock
    The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root branching into activity and result]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Dependency tree: open –k1→ boy; open –k2→ lock]

Sub-actions: Opening of lock

Opening of lock:
  (action 1) Inserting and turning a key
  (action 2) key pressing and turning the lever
  (action 3) latch moving and lock opening

Sub-actions: Opening of lock

[Dependency trees: open –k1→ boy, –k2→ lock, –k3→ key; open –k1→ key, –k2→ lock; open –k1→ lock]

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer, karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer, karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa  <fs af= root=badZaa cat=adj gend=m>
  – bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta  <fs af= root=bahuta cat=adj gend=any>
  – phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai     <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

16

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relations between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

[Dependency tree: khaataa –k1→ bhaaii; khaataa –k2→ phala; bhaaii –r6→ meraa]

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
    • khilvaa 'cause to eat' is the verb root
    • maaz ne has karta vibhakti, so mark as k1
    • aayaa se has karana vibhakti, so mark as k3
    • bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd…)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    • prayojaka karta 'causer' (pk1): the causer in a causative construction
    • prayojya karta 'causee' (jk1): the causee in a causative construction
    • madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II

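The difference between the two possibilities amounts to a relabelling of three relations. The sketch below applies the k1/k3/k4 to pk1/mk1/jk1 mapping stated above to the maid example; it is illustrative only, and the names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map (phrase, label) pairs from the syntactic to the causative analysis."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]


if __name__ == "__main__":
    possibility_1 = [
        ("maaz ne", "k1"),
        ("aayaa se", "k3"),
        ("bacce ko", "k4"),
        ("khaanaa", "k2"),
    ]
    for phrase, label in relabel_causative(possibility_1):
        print(phrase, label)
```

The mapping is only appropriate when the se-marked phrase is a mediator such as aayaa; in the spoon example cammaca se is a true instrument and keeps k3, so an annotator still has to decide which case applies.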
(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    • Provides relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    • The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    • jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

101212

23

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause.
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause.
  – Here, the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref.


Relative Clause Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II.
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause.
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation.
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref.
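
A minimal sketch of how the Possibility-II analysis could be held in memory. The edge list and feature dictionary are assumed layouts for illustration; the slides do not state the label between jo ladZakaa and khadZaa hai, so it is left as None, and the actual encoding of coref in the treebank files may differ.

```python
# (dependent, head, relation); None marks a label the slides do not state.
edges = [
    ("jo ladZakaa", "khadZaa hai", None),
    ("khadZaa hai", "vaha", "nmod__relc"),
]
# The link between relative and correlative pronoun is a feature, not an edge.
features = {"jo ladZakaa": {"coref": "vaha"}}

def head_of(dependent):
    """Return the head of a node, or None if it has no recorded head."""
    for dep, head, _ in edges:
        if dep == dependent:
            return head
    return None

def modifiers(head, relation):
    """Return all dependents attached to `head` by `relation`."""
    return [dep for dep, h, rel in edges if h == head and rel == relation]

print(head_of("jo ladZakaa"))
print(modifiers("vaha", "nmod__relc"))
```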


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
• Here dukh ‘unhappy’ is the karta.
• Here mujhko ‘to me’ is a subtype of sampradan.
• This sampradan is different from the sampradan (k4, beneficiary).
• We call it anubhava karta, represented by k4a.

anubhava karta – k4a (Contd…)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta – k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.

Relation samanadhikaran – rs (Contd…)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex.: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’
• Issue
  – Possibility-I: abstract node
  – Possibility-II: one clause depends on the other clause

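Possibility - I

[Figure: the abstract node agar-to heads the tree; agar and to attach to it with paired-ccof, and the clause heads naa hotii and aatii attach to them with ccof]

Possibility - II

[Figure: dependency tree for Possibility-II]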

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause.
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.


Participles (vmod) (Contd…)

• Ex.: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day, having written a letter, he tears it up’


Participles (vmod) (Contd…)

• The arguments vaha ‘he’ and pawra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.

Participles (vmod) (Contd…)

PaadZataa hai
  k1:   vaha
  k7t:  rojZa
  k2:   pawra
  vmod: likhakar
          k1: vaha   (shared argument)
          k2: pawra  (shared argument)


(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex.: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example, vo ‘that’, which would be the parent node for the relative clause ‘tum jo bhi kahoge’, is missing.
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
        nmod__relc: kahoge
                      k1: tum
                      k2: jo bhi


Ellipsis (Contd)

• Ex.: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don’t listen to anyone”
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential):

  NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of


(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees will show the conjuncts as heads.
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children.
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child.

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex.: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex.: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’


Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex.: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, and not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation).
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb.
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two.


Conjunct Verbs (Contd)

kiyaa
  k1:  maine
  k2:  usase
  pof: prashna
         nmod: ek
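
As a rough sketch of how the pof link keeps the conjunct verb recoverable while ek still modifies only the noun, the tree can be stored as a head-to-dependents table. The dictionary layout and the function name are assumptions made for this illustration.

```python
# head -> list of (relation, dependent), following the tree for
# "maine usase ek prashna kiyaa".
tree = {
    "kiyaa": [("k1", "maine"), ("k2", "usase"), ("pof", "prashna")],
    "prashna": [("nmod", "ek")],
}

def conjunct_verb(verb, tree):
    """Join the pof-linked noun or adjective with its verb."""
    parts = [dep for rel, dep in tree.get(verb, []) if rel == "pof"]
    return " ".join(parts + [verb])

print(conjunct_verb("kiyaa", tree))  # the semantic unit
print(tree["prashna"])               # ek modifies the noun only
```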


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder


Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh


Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh


Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
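
The filtering step can be sketched as follows, assuming the two candidate sentences have already been role-labelled. The dictionary format is an assumption for illustration; a real system would take these labels from an SRL component.

```python
# The question asks for the agent of "create" with a given theme.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Hand-written role labels for the two candidate sentences.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer(question, candidates):
    """Keep only agents whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"] == question["theme"]]

print(answer(question, candidates))
```

Word overlap alone would favour the first sentence, which repeats the whole phrase; matching on the theme role filters it out.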


We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation


Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information


PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01   Make sound of bell
  Arg0    Causer of ringing
  Arg1    Thing rung
  Arg2    Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01   Make sound of bell
  Arg0    Causer of ringing
  Arg1    Thing rung
  Arg2    Ring for

ring.02   To surround
  Arg1    Surrounding entity
  Arg2    Surrounded entity

An example

• [John] rings [the bell]                 (ring.01)
• [Tall aspen trees] ring [the lake]      (ring.02)

ring.01   Make sound of bell
  Arg0    Causer of ringing
  Arg1    Thing rung
  Arg2    Ring for

ring.02   To surround
  Arg1    Surrounding entity
  Arg2    Surrounded entity

An example

• [John ARG0] rings [the bell ARG1]                  (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]       (ring.02)

ring.01   Make sound of bell
  Arg0    Causer of ringing
  Arg1    Thing rung
  Arg2    Ring for

ring.02   To surround
  Arg1    Surrounding entity
  Arg2    Surrounded entity
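
A frame file with two rolesets can be pictured as a nested lookup table. The sketch below is only an illustration: the dictionary layout and the `describe` helper are assumptions, the roleset IDs are written ring.01 and ring.02, and actual PropBank frame files are stored in their own file format rather than as Python dictionaries.

```python
# Assumed in-memory layout: lemma -> roleset ID -> gloss and roles.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing",
                      "Arg1": "Thing rung",
                      "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity",
                      "Arg2": "Surrounded entity"},
        },
    }
}

def describe(roleset_id, label):
    """Look up what a numbered argument means for one roleset."""
    lemma = roleset_id.split(".")[0]
    return FRAMES[lemma][roleset_id]["roles"][label]

print(describe("ring.01", "Arg1"))
print(describe("ring.02", "Arg1"))
```

The same label (Arg1) means different things in the two rolesets, which is why roles are defined verb by verb.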

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling


Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually


PropBank Tagset

Numbered Arguments
  ARGA       Causer
  ARG0       Agent, experiencer
  ARG1       Theme, patient
  ARG2       Recipient
  ARG3       Instrument

Numbered Arguments with function tags
  ARGA-MNS   Indirect causer
  ARG0-MNS   Induced causer
  ARG0-GOL   Causee with a ‘recipient’ role
  ARG2-ATR   Attribute
  ARG2-GOL   Goal
  ARG2-SOU   Source
  ARG2-LOC   Location
  ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

  ARGM-TMP   Temporal
  ARGM-MNR   Manner
  ARGM-LOC   Location
  ARGM-PRP   Purpose
  ARGM-CAU   Cause
  ARGM-DIS   Discourse
  ARGM-ADV   Adverb
  ARGM-MNS   Means
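
Labels in the tagset combine a base argument with an optional function tag. A small sketch of splitting them, assuming the hyphenated spelling (for example ARG2-GOL, ARGM-TMP); the helper names are made up for this illustration.

```python
def split_label(label):
    """Split a label into (base, function tag); the tag is None if absent."""
    base, sep, tag = label.partition("-")
    return base, (tag if sep else None)

def is_modifier(label):
    """Modifier arguments are those whose base is ARGM."""
    return split_label(label)[0] == "ARGM"

for label in ["ARG1", "ARG2-GOL", "ARGA-MNS", "ARGM-TMP"]:
    print(label, split_label(label), is_modifier(label))
```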


Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’


Unaccusative & Unergative

• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’


Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs


Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.


Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative:
  – Add –aa (so → sulaa): sleep → make someone sleep
  – Add –vaa (sulaa → sulvaa): make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers


Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

Causatives

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa


Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’


Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’


Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition


Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels surviving from the slide: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels surviving from the slide: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels surviving from the slide: VP-Pred, VP, NP, V]

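Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels surviving from the slide: VP-Pred, VP, NP1 (darvaazaa), NP with trace CASE1, V]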

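Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels surviving from the slide: VP-Pred, VP, NP1 (cuuhe), NP with trace CASE1, V, NP-P (us kamre mein)]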

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels surviving from the slide: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels surviving from the slide: VP-Pred, VP, NP, NP-P, NP1 (caaMd) with trace CASE1, NP2 (mujhko) with trace SCR2, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels surviving from the slide: VP-Pred, VP, NP, NP-P, V, V-Aux, C (ki), CP1 with trace EXTR1, NP1 (Sita) with trace CASE1]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Figure: PS tree for the sentence above; node labels surviving from the slide: VP-Pred, VP, NP, NP-P, NP1 (jo kitaab), PRO, CP, C, COMP, V, V-Aux, traces SCR1 and SCR3]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Figure: PS tree for the sentence above; node labels surviving from the slide: VP-Pred, VP, NP, NP-P (Ravi-ko), V, V-Aux, V’, N (yaad), trace CASE1]

Causative

Pronouns

Morphologically, like nouns, the pronouns also inflect for number and case.

Sg – Dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro.), kyaa (what) and kuch (some)
Sg – Obl: isa (this), usa (that), jisa (which), kisa (which, interro.), koii (someone) and kisii (someone)
Pl – Dir: ye (these), ve (those), jo (who, pl.)
Pl – Obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro.), kinhiiM (who, indef.)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef.)

Pronouns (Contd…)

• Before ‘ne’, maiM (I) and tuu (you.sg) don’t change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don’t change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
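
The rules above can be sketched as a small function. It covers only the first- and second-person pronouns listed on the slide and only the masculine singular genitive forms; the function and table names are assumptions for this illustration.

```python
# Oblique stems used before postpositions other than ne.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitives that replace pronoun + kaa.
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}

def attach(pronoun, postposition):
    """Combine a personal pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition
    return pronoun + postposition

for p, pp in [("maiM", "ne"), ("maiM", "ko"), ("ham", "ko"), ("tum", "kaa")]:
    print(p, pp, attach(p, pp))
```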


Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The transformations that an adjective (e.g. acchaa ‘good’) undergoes with regard to number, gender and case are given below:

  Case →      Direct            Oblique
  Number →    Sg      Pl        Sg      Pl
  Gender ↓
  Masc        acchaa  acche     acche   acche
  Fem         acchii  acchii    acchii  acchii
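
The table reduces to three endings, which a short function can reproduce. This is a sketch for adjectives whose direct masculine singular ends in -aa; leaving other adjectives unchanged is an assumption of the example, not a claim from the slide.

```python
def inflect_adjective(base, gender, number, case):
    """base: direct masculine singular form, e.g. 'acchaa'."""
    if not base.endswith("aa"):
        return base  # assumption: other adjectives are left unchanged
    stem = base[:-2]
    if gender == "f":
        return stem + "ii"      # feminine: same form in all cells
    if number == "sg" and case == "dir":
        return base             # masculine singular direct
    return stem + "e"           # all other masculine cells

for case in ("dir", "obl"):
    for number in ("sg", "pl"):
        print(case, number,
              inflect_adjective("acchaa", "m", number, case),
              inflect_adjective("acchaa", "f", number, case))
```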

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu:

  Root        ro      ‘cry’
  Infinitive  ronaa   ‘to cry’
  Habitual    rotaa   ‘cry.hab’
  Perfective  royaa   ‘cried’
  Causative   rulaa   ‘cause someone to cry’
              rulvaa  ‘make someone cause someone to cry’


Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      ‘The children are having a meal’
  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog abil be.pl.pres
      ‘The children can continue to have their meal’
• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions


Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

  ke anusaar      ‘according to’
  ke alaavaa      ‘in addition to’
  ke kaaran       ‘because of’
  ke dvaaraa      ‘through’
  ke saamne       ‘in front of’
  ke liye         ‘for’
  kii or/taraf    ‘towards’
  kii tarah       ‘like’
  kii jagah       ‘in place of’
  se baahar       ‘out of’
  se pahle        ‘before’

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu


Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

  (a) qabl ‘before’
      qabl az_ayn (qabl azeen)        before from_this         ‘before this’
      qabl az_waqt                    before from_time         ‘before time’
  (b) dar ‘in/inside/amidst’
      dar_ayn asnah (dariin asnah)    in_this moment
  (c) az ‘from/since/for’
      az raahe hamdardi               from way empathy         ‘for the sake of courtesy’
      az sare nau                     from beginning new
      khaarij az bahes                beyond from discussion
  (d) ta ‘to/until/till’
      ta waqt                         until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone

  EZ N+N      daur-e-hukumat ‘period-of-rule’; hukumat-e-hind ‘government-of-India’
  EZ N+Adj    nasl-e-insani ‘race-of-humanity’; lamha-e-aakhar ‘the last moment’
  EZ Adj+N    qabil-e-rahem ‘qualified for sympathy’
  EZ Adj+Adj  qabil-e-qubul ‘qualified-for-acceptance’


Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X:

  raam-raam        ‘Ram-Ram’ (proper noun)
  baccaa-baccaa    ‘child-child’ (common noun)
  garam-garam      ‘hot-hot’ (adjective)
  dhiire-dhiire    ‘slowly-slowly’ (adverb)
  jaa-jaa          ‘go-go’ (verb)
  naa-naa          ‘not-not’ (negative particle)
  kyaa-kyaa        ‘what-what’ (question word)
  jaate-jaate      ‘going-going’ (participle)


Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of ‘X etc.’
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa ‘food etc.’
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

  jaanaa ‘going’     → jaanaa-vaanaa ‘going etc.’
  aaloo ‘potato’     → aaloo-vaaloo ‘potato etc.’
  aisaa ‘like this’  → aisaa-vaisaa ‘like this etc.’

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:

  bhaag ‘run’   → bhaagambhaag ‘rush’
  jhuuTh ‘lie’  → jhuuTh-muuTh ‘just like that (without meaning it)’
  dekh ‘see’    → dekhaa-dekhii ‘in imitation’
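
Full reduplication and the regular echo-word pattern can both be sketched on romanized forms. The treatment of the "first consonant" as the whole run of letters before the first vowel (so that kh counts as one consonant) and of vowel-initial words (v- is simply prefixed) is an assumption inferred from the examples; irregular pairs such as bhaagambhaag are not covered.

```python
VOWELS = set("aeiouAEIOU")

def full_reduplicate(x):
    """X -> X-X."""
    return f"{x}-{x}"

def echo_word(x):
    """Replace the initial consonant (if any) of a romanized word with v-."""
    i = 0
    while i < len(x) and x[i] not in VOWELS:
        i += 1
    return "v" + x[i:]

def partial_reduplicate(x):
    """X -> X-echo(X), meaning 'X etc.'."""
    return f"{x}-{echo_word(x)}"

for word in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(partial_reduplicate(word))
print(full_reduplicate("garam"))
```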

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies


Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’


Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’


Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’


Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Relative Clause

rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          ‘My sister, who stays in Delhi, is coming tomorrow’


Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          ‘I have read the book which you gave me’

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          ‘I have read the book which you gave me’

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          ‘I have read the book which you gave me’

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              ‘Ram was remembering Ravi’


Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
      raam ko bukhaar hai              Ram dat fever be.pres
      raam ko caand dikhaa             Ram dat moon see-unacc.pst
      raam ko dukh hai                 Ram dat sorrow be.pres
• Participatory verbs
  – The second argument of the participatory verbs takes the se postposition
      raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
      siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
      ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut




Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past


Tokenization

Represented in SSF:

  ADDR  TOKEN
  1     usa
  2     laDake
  3     ne
  4     kelA
  5     khAyA
  6     thA
  7

Tokenization: Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision
    · Create three tokens
    · Mark the hyphen as JOIN
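
A minimal tokenizer sketch that follows the two decisions above: every punctuation mark becomes its own token, and a hyphen is marked JOIN. The regular expression, the row layout and the function name are assumptions made for illustration, not the treebank's actual tokenizer; in particular, this sketch marks every hyphen as JOIN, not only compound-internal ones.

```python
import re

def tokenize(sentence):
    """Return (ADDR, TOKEN, MARK) rows; MARK is 'JOIN' for a hyphen."""
    rows = []
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    for piece in re.findall(r"\w+|[^\w\s]", sentence):
        mark = "JOIN" if piece == "-" else ""
        rows.append((len(rows) + 1, piece, mark))
    return rows

for row in tokenize("BAI-bahana"):
    print(row)
for row in tokenize("usa laDake ne kelA khAyA thA ."):
    print(row)
```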

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOP,punc>
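As a rough illustration of how the composite `af` attribute can be unpacked, the sketch below splits the comma-separated value into named fields in the order listed above. The function name `parse_af` and the field keys are assumptions chosen for readability (the keys `gend`, `num`, `pers` follow the attribute names used later in these slides); missing trailing fields are simply left empty.

```python
import re

# Field order as described on the slide: root, category, gender, number,
# person, case, tam-or-vibhakti, suffix. The key names are illustrative.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")


def parse_af(fs):
    """Return the af attribute of an SSF feature structure as a dict."""
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        return {}
    values = match.group(1).split(",")
    # Pad short analyses so that every field name is present.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


analysis = parse_af("<fs af='laDakaa,n,m,sg,3,o,,'>")
```

For the noun above, `analysis["root"]` is `laDakaa` and `analysis["case"]` is `o` (oblique), which is what licenses the following postposition ne.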

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e., the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
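The chunked SSF layout can be read back with a small line-based parser. The sketch below is a simplification under stated assumptions: it expects one token or bracket per line with whitespace-separated columns, the feature structures in the sample are shortened, and the function name `read_chunks` is hypothetical rather than part of any released SSF library.

```python
def read_chunks(ssf_text):
    """Collect chunks from SSF text as dicts with an address, a tag and (token, POS) pairs."""
    chunks = []
    current = None
    for line in ssf_text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) >= 3 and fields[1] == "((":
            # Chunk opening line: ADDR (( TAG
            current = {"addr": fields[0], "tag": fields[2], "tokens": []}
        elif fields[0] == "))":
            # Chunk closing line
            if current is not None:
                chunks.append(current)
                current = None
        elif current is not None and len(fields) >= 3:
            # Token line inside a chunk: ADDR TOKEN POS [feature structure]
            current["tokens"].append((fields[1], fields[2]))
    return chunks


# Simplified sample: feature structures are abbreviated for the illustration.
SAMPLE = """
1 (( NP
1.1 usa PRP <fs af='vaha,pron'>
1.2 laDake NN <fs af='laDakA,noun'>
1.3 ne PSP <fs af='ne,psp'>
))
2 (( NP
2.1 kelA NN <fs af='kelA,noun'>
))
3 (( VG
3.1 khAyA VM <fs af='KA,verb'>
3.2 thA VAUX
))
"""

chunks = read_chunks(SAMPLE)
```

Dependency relations are later marked between the heads of exactly these chunks, so a reader of this kind is the natural first step before looking at inter-chunk relations.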


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma
IIIT, Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

[Dependency tree: opened → Sabina (k1); opened → lock (k2); lock → the]

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened → Sabina (k1); opened → lock (k2); opened → key (k3); lock → the; key → this]

k3 (karaNa): instrument

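Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened → Yesterday (k7t); opened → Sabina (k1); opened → lock (k2); opened → key (k3); opened → home (k7p); lock → the; key → this; home → my]

k7t (kaladhikaraNa): time
k7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened → yesterday (k7t); opened → lock (k1); opened → key (k3); lock → the; key → this]

The lock becomes the karta.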
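The labelled trees above can be written down as (head, dependent, relation) triples, which makes the change in the label of "lock" easy to check. The data structure and the helper `relation_of` are illustrative assumptions only; determiners, which carry no label in the slides, are left out.

```python
# Each edge is (head, dependent, relation); labels are copied from the slides.
SENTENCES = {
    "Sabina opened the lock with this key": [
        ("opened", "Sabina", "k1"),
        ("opened", "lock", "k2"),
        ("opened", "key", "k3"),
    ],
    "Yesterday the lock opened with this key": [
        ("opened", "yesterday", "k7t"),
        ("opened", "lock", "k1"),
        ("opened", "key", "k3"),
    ],
}


def relation_of(edges, dependent):
    """Return the relation label of a dependent, or None if it is not in the tree."""
    for _head, dep, rel in edges:
        if dep == dependent:
            return rel
    return None


with_agent = relation_of(SENTENCES["Sabina opened the lock with this key"], "lock")
without_agent = relation_of(SENTENCES["Yesterday the lock opened with this key"], "lock")
```

With the agent expressed, "lock" is the karma (k2); once the agent is left out, the same word is the karta (k1), while "key" keeps its karaNa label (k3) in both sentences.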

Levels of Analysis

L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Trees: (t1) khaa with dependents bhaaii and phala; (t2) the expanded tree, in which bhaaii additionally governs meraa and baDZaa, and phala governs bahuta]

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb

• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root → activity, result]

karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is the karta
  - The sentence has no explicit karma

[Dependency tree: open → boy (k1); open → lock (k2)]

Sub-actions - Opening of lock

Opening of lock:
- Inserting and turning a key (action 1)
- Key pressing and turning the lever (action 2)
- Latch moving and lock opening (action 3)

Sub-actions - Opening of lock

[Dependency trees: (1) open → boy (k1), lock (k2), key (k3); (2) open → key (k1), lock (k2); (3) open → lock (k1)]

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on
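The role shift described above can be mimicked with a small function in which the speaker's choice of karta is an explicit input. This is a toy sketch, not an annotation procedure: the participant kinds ("agent", "instrument", "theme"), the function name `assign_karakas`, and the rule that an expressed agent must be the karta are simplifying assumptions drawn from the three example sentences.

```python
# Default labels for participants that are not chosen as karta (from the slides).
DEFAULT_LABEL = {"instrument": "k3", "theme": "k2"}


def assign_karakas(participants, karta):
    """Label the expressed participants, given the one the speaker presents as karta.

    participants maps each word to 'agent', 'instrument' or 'theme'.
    """
    if karta not in participants:
        raise ValueError("the karta must be one of the expressed participants")
    labels = {}
    for word, kind in participants.items():
        if word == karta:
            labels[word] = "k1"
        elif kind == "agent":
            # Simplifying assumption: an expressed agent is always the karta.
            raise ValueError("an expressed agent is assumed to be the karta")
        else:
            labels[word] = DEFAULT_LABEL[kind]
    return labels


full = assign_karakas({"Sabina": "agent", "lock": "theme", "key": "instrument"}, "Sabina")
key_as_karta = assign_karakas({"key": "instrument", "lock": "theme"}, "key")
lock_as_karta = assign_karakas({"lock": "theme"}, "lock")
```

Passing the karta in as an argument, rather than inferring it, reflects the point of the slide: which participant surfaces as karta depends on vivaksha, i.e. on the sub-action the speaker chooses to focus on.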

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

• Dependency Relation

[Figure: dependency tree of the example sentence]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit' and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd)

[Dependency tree: khaataa → bhaaii (k1); khaataa → phala (k2); bhaaii → meraa (r6)]

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types
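The three groups of labels listed on the preceding slides can be collected in a small lookup. The sketch below is illustrative only; the labels k1 to k4, k7t and k7p appear in these slides, while k5 (apaadaan) and the bare k7 (adhikarana) are assumed from the wider annotation scheme and are flagged as such in the code.

```python
# Basic karaka labels. k5 and k7 are assumptions taken from the wider scheme;
# the slides themselves only show k1-k4, k7t and k7p.
KARAKA = {
    "k1": "karta",
    "k2": "karma",
    "k3": "karana",
    "k4": "sampradaan",
    "k5": "apaadaan",
    "k7": "adhikarana",
}
OTHER = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}


def relation_type(label):
    """Classify a relation label into one of the three groups named on the slide."""
    if label in NON_DEPENDENCY:
        return "non-dependency"
    if label in OTHER:
        return "other"
    # Subtypes such as k7t, k7p or k4a share the prefix of a basic karaka label.
    if any(label.startswith(base) for base in KARAKA):
        return "karaka"
    return "unknown"
```

For instance, `relation_type("k7t")` falls in the karaka group via its `k7` prefix, whereas `relation_type("pof")` is reported as a non-dependency relation.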

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa 'cause to eat' is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations:
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
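The relabelling stated above (pk1, mk1, jk1 instead of k1, k3, k4) amounts to a simple mapping over the Possibility-I labels. The sketch below applies it to the maid example; the function name `relabel_causative` is hypothetical, and the code does not try to tell a mediator from a true instrument such as cammaca se 'with the spoon', which keeps k3 in the first example and has to be distinguished by the annotator.

```python
# Mapping from the Possibility-I labels to the causative labels of Possibility-II.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Relabel (phrase, label) pairs of a causative clause; other labels are kept."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in arguments]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
```

The result carries pk1 on maaz ne, mk1 on aayaa se and jk1 on bacce ko, with khaanaa still k2, matching the second example on the slide.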

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause Possibility-I

[Figure: dependency tree]

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause Alternative-II

[Figure: dependency tree]

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta - k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd…)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility - I

[Tree: an abstract node agar-to governs agar and to (paired-ccof); agar governs naa hotii (ccof); to governs aatii (ccof)]

Possibility - II

[Figure: dependency tree]

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'

Participles (vmod) (Contd)

• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd)

[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), pawra (k2), likhakar (vmod); likhakar → vaha (k1), pawra (k2), the shared arguments]

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd)

[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP (vo) → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof); NULL_CCP (aur) → nahii_sunate (ccof)]

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

[Figure: dependency tree]

Subordinate Conjunction

[Figure: dependency tree]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now be part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two

Conjunct Verbs (Contd)

[Dependency tree: kiyaa → maine (k1), usase (k2), prashna (pof); prashna → ek (nmod)]


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya
University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

An example

• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)

An example

• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)
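The two rolesets of ring can be held in a small dictionary to show how a numbered argument gets its verb-specific meaning. This is a sketch of the idea only: actual PropBank frame files are stored in their own file format, and the names `FRAMES` and `describe` are made up for the illustration.

```python
# Rolesets copied from the slide; the dictionary layout is an assumption.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
    },
}


def describe(roleset, labelled_args):
    """Map each labelled argument to the role description of the given roleset."""
    roles = FRAMES[roleset]["roles"]
    return {text: roles[arg] for text, arg in labelled_args}


bell = describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
lake = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
```

The same label Arg1 resolves to "Thing rung" under ring.01 but to "Surrounding entity" under ring.02, which is why the roleset ID has to be chosen before the arguments are labelled.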

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  - pro: dropped/null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
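The four empty-argument types and their insertion mode can be summarised in a lookup table, for example to decide which sentences still need an annotator's pass. The table contents come from the slide; the names `EMPTY_ARGUMENTS` and `needs_annotator` are hypothetical.

```python
# (description, insertion mode) for each empty-argument type on the slide.
EMPTY_ARGUMENTS = {
    "pro": ("dropped/null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def needs_annotator(kind):
    """Return True if this empty-argument type is inserted manually."""
    return EMPTY_ARGUMENTS[kind][1] == "manual"


manual_types = sorted(kind for kind in EMPTY_ARGUMENTS if needs_annotator(kind))
```

Note that the keys are case-sensitive: `pro` and `PRO` are different empty categories.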

PropBank Tagset

Numbered Arguments:
  ARGA  Causer
  ARG0  Agent/experiencer
  ARG1  Theme/patient
  ARG2  Recipient
  ARG3  Instrument

Numbered Arguments with function tags:
  ARGA-MNS  Indirect causer
  ARG0-MNS  Induced causer
  ARG0-GOL  Causee with a 'recipient' role
  ARG2-ATR  Attribute
  ARG2-GOL  Goal
  ARG2-SOU  Source
  ARG2-LOC  Location
  ARG2-DIR  Direction

PropBank Tagset

Modifier Arguments:
  ARGM-TMP  Temporal
  ARGM-MNR  Manner
  ARGM-LOC  Location
  ARGM-PRP  Purpose
  ARGM-CAU  Cause
  ARGM-DIS  Discourse
  ARGM-ADV  Adverb
  ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
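The labelling rule for the single argument of an intransitive verb can be stated as a two-way lookup. The verb classes below are the three examples given on the slide; in practice the class would come from the diagnostic tests, not from a hand-written table, and the names `VERB_CLASS` and `sole_argument_label` are illustrative.

```python
# Verb classes for the examples named on the slide (WX-style romanisation).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
```

So the door in 'the door will open' is ARG1, while the sleeper or laugher of an unergative verb is ARG0.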

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
'The door will have to open'

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative:
• Add -aa
  - (so, sulaa): 'sleep', 'make someone sleep'
• Add -vaa
  - (sulaa, sulvaa): 'make someone sleep', 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Causatives

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

[Figure]

Complex predicates

• These are cases such as bharosaa karnaa 'trust(n) do(v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

Complex predicate

compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.inst fight sit.perf
'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow
CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा

[Phrase-structure tree: nodes VP-Pred, VP, NP, NP, V over Atif, kitab, paRhegaa]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी

[Phrase-structure tree: nodes VP-Pred, VP, NP-P, NP, V over Atif -ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ़ सोएगा

[Phrase-structure tree: nodes VP-Pred, VP, NP, V over Atif, soyegaa]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा

[Phrase-structure tree: nodes VP-Pred, VP, NP-1 (darvaazaa), NP (*CASE*-1), V (khulegaa)]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

उस कमरे में चूहे हैं

[Phrase-structure tree: nodes VP-Pred, VP, NP-1 (cuuhe), NP (*CASE*-1), V (hain), with NP-P (us kamre mein) adjoined at VP]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी

[Phrase-structure tree: nodes VP-Pred, VP, NP-P (Ram -ne), NP-P (Mohan -ko, adjoined to VP-Pred), NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Phrase-structure tree: nodes VP-Pred, VP, NP-1 (caaMd), NP (*CASE*-1), V (dikhaa), NP-P (baadaloM mein), NP (kal raat), NP-2 (mujhko), NP (*SCR*-2)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere


Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram. Node labels: VP-Pred, VP, NP-P, NP (index 1), CP (index 1), C, V, V-Aux; trace labels: EXTR1, CASE1; words: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

[Tree diagram. Node labels: VP-Pred, VP, NP-P, NP (index 1), CP, C, V, V-Aux; other labels: PRO, COMP, SCR1, SCR3; words: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'


Complex Predicate

[Tree diagram. Node labels: VP-Pred, VP, NP-P, NP, V', N, V, V-Aux; trace label: CASE1; words: Raam, Ravi-ko, yaad, kar, rahaa, thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative


Adjectives

• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun.

• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boys erg'

• The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below:

Case →       Direct            Oblique
Number →     Sg       Pl       Sg       Pl
Gender ↓
Masc         acchaa   acche    acche    acche
Fem          acchii   acchii   acchii   acchii
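
The paradigm above is regular enough to be written as a small function. The following is only a sketch, assuming an inflecting adjective that ends in -aa (like acchaa); the function name and the feature labels ('m', 'sg', 'direct', ...) are illustrative and are not taken from the treebank tools.

```python
def inflect_adjective(base, gender, number, case):
    """Return the agreeing form of an -aa adjective such as 'acchaa'.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    Mirrors the table: feminine is always -ii; masculine is -aa only in
    the direct singular and -e in every other cell.
    """
    if not base.endswith("aa"):
        raise ValueError("this sketch only covers adjectives ending in -aa")
    stem = base[:-2]
    if gender == "f":
        return stem + "ii"
    if number == "sg" and case == "direct":
        return stem + "aa"
    return stem + "e"


# Print the full paradigm for 'acchaa'.
for gender in ("m", "f"):
    for case in ("direct", "oblique"):
        for number in ("sg", "pl"):
            print(gender, case, number,
                  inflect_adjective("acchaa", gender, number, case))
```

Adjectives that do not end in -aa are deliberately rejected here, since the table only covers the acchaa type.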

Verbs

• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person.

• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu:

Root         ro       'cry'
Infinitive   ronaa    'to cry'
Habitual     rotaa    'cry.hab'
Perfective   royaa    'cried'
Causative    rulaa    'cause someone to cry'
             rulvaa   'make someone cause someone to cry'


Auxiliaries

• Auxiliaries mark tense, aspect and modality information on verbs.

(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    'The children are having a meal.'

(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog abil be.pl.pres
    'The children can continue to have their meal.'

• Auxiliaries also carry the gender, number and person information.

Postpositions

• Postpositions largely mark the case relations.
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table abl food take refl.pst

• Hindi also has compound postpositions.


Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar     'according to'
ke alaavaa     'in addition to'
ke kaaran      'because of'
ke dvaaraa     'through'
ke saamne      'in front of'
ke liye        'for'
kii or/taraf   'towards'
kii tarah      'like'
kii jagah      'in place of'
se baahar      'out of'
se pahle       'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu


Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well.
• Some Urdu examples with prepositions are:

(a) qabl 'before'
    qabl az_ayn (qabl azeen)       before from_this           'before this'
    qabl az_waqt                   before from_time           'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah)   in_this moment
(c) az 'from/since/for'
    az raahe hamdardi              from way empathy           'for the sake of courtesy'
    az sare nau                    from beginning new
    khaarij az bahes               beyond from discussion
(d) ta 'to/until/till'
    ta waqt                        until time

ezafe in Urdu

• Urdu has what is referred to as ezafe.
• It normally marks a genitive, but is not restricted to the genitive alone.

EZ N+N       daur-e-hukumat 'period-of-rule'; hukumat-e-hind 'government-of-India'
EZ N+Adj     nasl-e-insani 'race-of-humanity'; lamha-e-aakhar 'the last moment'
EZ Adj+N     qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj   qabil-e-qubul 'qualified-for-acceptance'


Reduplication: A Morphological Process

• Words belonging to various categories can be reduplicated.
• These expressions are often hyphenated.
• Reduplication has various morphological functions, depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant.
• Reduplication is highly productive in these languages.

Full Reduplication

If the word is X, then its reduplicated form is X-X.

raam-raam        'Ram-Ram' (proper noun)
baccaa-baccaa    'child-child' (common noun)
garam-garam      'hot-hot' (adjective)
dhiire-dhiire    'slowly-slowly' (adverb)
jaa-jaa          'go-go' (verb)
naa-naa          'not-not' (negative particle)
kyaa-kyaa        'what-what' (question word)
jaate-jaate      'going-going' (participle)


Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially.
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi.

Some more examples of partial reduplication in Hindi are:

jaanaa 'going'     → jaanaa-vaanaa 'going etc.'
aaloo 'potato'     → aaloo-vaaloo 'potato etc.'
aisaa 'like this'  → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

bhaag 'run'   → bhaagambhaag 'rush'
jhuuTh 'lie'  → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see'    → dekhaa-dekhii 'in imitation'
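
The two regular patterns (X-X and the v- echo word) can be sketched as string operations. This is a rough illustration, assuming romanised input as in the examples; it treats the whole run of initial consonant letters (such as "kh") as the first consonant and simply prefixes v- to vowel-initial words. The function names are hypothetical, and the irregular cases (bhaagambhaag, jhuuTh-muuTh, dekhaa-dekhii) are not covered.

```python
VOWELS = set("aeiouAEIOU")


def full_reduplication(word):
    """X -> X-X"""
    return f"{word}-{word}"


def echo_word(word):
    """X -> X-vY, where Y is X without its initial consonant letters."""
    i = 0
    # Skip the initial run of consonant letters (e.g. 'kh' in 'khaanaa').
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"


for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_word(w))
print(full_reduplication("garam"))
```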

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages.
• For case marking, Hindi primarily uses postpositions.
• The verb agrees either with the subject or with the object.
• The adjective agrees with the noun it modifies.


Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book.'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book.'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book.'

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep.'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept.'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep.'


Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open.'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened.'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open.'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room.'


Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night, the moon was seen behind the clouds.'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night, I saw the moon behind the clouds.'

Ditransitive

ditrans-1   राम मोहन को किताब देगा
            raam mohan ko kitaab degaa
            Ram Mohan dat book.f give.m.sg.fut
            'Ram will give a book to Mohan.'

ditrans-2   राम ने मोहन को किताब दी
            raam ne mohan ko kitaab dii
            Ram erg Mohan dat book.f give.f.sg.pst
            'Ram gave a book to Mohan.'


Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late.'

Relative Clause

rel-cl-1    मेरी बहन जो दिल्ली में रहती है कल आ रही है
            merii bahan jo dillii meM rahtii hai kal aa rahii hai
            my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
            'My sister, who stays in Delhi, is coming tomorrow.'


Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me.'

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me.'

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me.'

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi.'

compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi.'


Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep.'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep.'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep.'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
      raam ko bukhaar hai              Ram dat fever be.pres
      raam ko caand dikhaa             Ram dat moon see-unacc.pst
      raam ko dukh hai                 Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of the participatory verbs takes the se postposition
      raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
      siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
      ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut




Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

usa    ladake   ne    kelaa    khaayaa    thaa
that   boy      erg   banana   eat-perf   past


Tokenization

Represented in SSF:

ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7      (sentence-final punctuation)

Tokenization Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
    – Create three tokens
    – Mark the hyphen as JOIN
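
The tokenization decision can be illustrated with a short sketch. It assumes romanised (WX-style) input like the examples here, and it represents "mark the hyphen as JOIN" as a third column in each row; that column, and the function name, are assumptions for illustration and not the treebank's actual storage format.

```python
import re


def tokenize(sentence):
    """Return rows of (address, token, mark).

    Every punctuation character becomes its own token. A hyphen that sits
    between two word pieces of the same compound is marked 'JOIN', so a
    compound such as 'BAI-bahana' yields three tokens.
    """
    rows = []
    for chunk in sentence.split():
        # Word pieces, or single punctuation characters.
        pieces = re.findall(r"\w+|[^\w\s]", chunk)
        for i, piece in enumerate(pieces):
            between_words = (
                0 < i < len(pieces) - 1
                and pieces[i - 1][0].isalnum()
                and pieces[i + 1][0].isalnum()
            )
            mark = "JOIN" if piece == "-" and between_words else ""
            rows.append((len(rows) + 1, piece, mark))
    return rows


for row in tokenize("BAI-bahana Aye ."):
    print(*row)
```

Note that Python's `\w` does not match Devanagari vowel signs, so this sketch would need a different word pattern for text in the native script.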


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality)/vibhakti (case marker), suffix.

ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7                 <fs as=&STOP,punc>
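
As a rough illustration of how the composite af attribute can be unpacked, the sketch below assumes that the value is a comma-separated string whose fields follow the order given above. The field names in AF_FIELDS and the function name are chosen here for illustration only.

```python
# Field order as described for the af attribute; the names are illustrative.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix")


def parse_af(af):
    """Split a comma-separated af value into a dict keyed by field name.

    Missing trailing fields are filled with empty strings.
    """
    values = af.split(",")
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


analysis = parse_af("laDakaa,n,m,sg,3,o")
print(analysis["root"], analysis["cat"], analysis["num"], analysis["case"])
```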

POS Tagging

• ILMT POS tagsets adopted
• Total 26 tags

ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7               SYM    <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save effort in manual tagging.
• Dependency relations are marked between the chunk heads.
• Chunking restructures the tree (i.e. the value of ADDR_ will change).

ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1              SYM    <fs as=&STOP,punc>
        ))
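
A chunked listing of this kind can be read back into a nested structure with a few lines of code. The following is a minimal sketch, assuming whitespace-separated columns (address, token, category) with the feature structures left out; the function name and the SAMPLE data are illustrative, not an official SSF reader.

```python
def read_ssf(lines):
    """Group token rows into chunks.

    Returns a list of (chunk_tag, [(token, pos_tag), ...]) pairs.
    A row whose token is '((' opens a chunk; a row starting with '))' closes it.
    """
    chunks = []
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":
            if current is not None:
                chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":
            current = (fields[2], [])
        elif current is not None and len(fields) >= 3:
            current[1].append((fields[1], fields[2]))
    return chunks


SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
""".splitlines()

for tag, tokens in read_ssf(SAMPLE):
    print(tag, tokens)
```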


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k

• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar: Indian languages

• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]

K3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]

lock becomes the karta


Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai

=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) [Tree over chunk heads. Nodes: khaa, bhaaii, phala]
(t2) [Fully expanded tree. Nodes: khaa, bhaaii, phala, meraa, baDZaa, bahuta]


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
    Sabina opened the lock with the key
    The key opened the lock
    The lock opened

Semantics of the verb

• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root → activity, result]


karta-karma

• The boy opened the lock
  – k1: karta
  – k2: karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Dependency tree: open → boy (k1), lock (k2)]

Sub-actions: Opening of lock

[Diagram: Opening of lock consists of three sub-actions]
(action 1) Inserting and turning a key
(action 2) Key pressing and turning the lever
(action 3) Latch moving and lock opening


Sub-actions: Opening of lock

[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus:

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  – The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
  – The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift, depending on what the speaker wants to express (vivaksha)
  – i.e. which sub-action the speaker wants to focus on


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
      (a) I opened the lock with this key
      (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)


Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi


An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m>
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any>
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
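
The bracketed notation used in this example, `((token_TAG ...))_CHUNKTAG`, is easy to read into a data structure. The sketch below is one possible way to do it with a regular expression; the pattern and the function name are assumptions made for illustration, not part of the annotation tools.

```python
import re

# One chunk: "((" body "))" "_" chunk tag
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Parse '((tok_TAG tok_TAG))_NP ...' into [(chunk_tag, [(tok, tag), ...])]."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        # Split each item on its last underscore: 'bhaaii_NN' -> ('bhaaii', 'NN')
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks


TEXT = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")

for chunk_tag, tokens in parse_chunks(TEXT):
    print(chunk_tag, tokens)
```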

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here, modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks


Dependency Scheme (Contd)

[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
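
The tree for this example can be written down as a list of labeled arcs between chunk heads. The following sketch is one simple encoding, not the treebank's own format; the "root" label for the top node and the helper names are illustrative assumptions.

```python
# (dependent, head, relation); the top node has head None.
ARCS = [
    ("khaataa", None, "root"),
    ("bhaaii", "khaataa", "k1"),
    ("phala", "khaataa", "k2"),
    ("meraa", "bhaaii", "r6"),
]


def children(head, arcs=ARCS):
    """Return (dependent, relation) pairs attached to the given head."""
    return [(dep, rel) for dep, h, rel in arcs if h == head]


def show(head, depth=0, rel="root"):
    """Return the subtree under head as indented lines."""
    lines = ["  " * depth + f"{head} ({rel})"]
    for dep, r in children(head):
        lines.extend(show(dep, depth + 1, r))
    return lines


print("\n".join(show("khaataa")))
```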

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six:
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address


Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
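
The three groups of labels can be collected in lookup tables, which makes it easy to check which group a given tag belongs to. This is only a sketch: the k5 and k7 keys are assumptions based on the numbering pattern of the tagset (the slides show k1 to k4 and k7t/k7p but do not label apaadaan or adhikarana directly), and the function name is hypothetical.

```python
import re

KARAKA = {
    "k1": "karta", "k2": "karma", "k3": "karana", "k4": "sampradaan",
    "k5": "apaadaan",    # assumed label, not given explicitly in the slides
    "k7": "adhikarana",  # assumed base label for k7t (time) and k7p (place)
}
NON_KARAKA = {
    "r6": "genitive", "rt": "purpose", "rh": "reason",
    "nmod_relc": "relative clause", "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction", "pof": "complex predicate", "fragof": "fragment of",
}


def relation_type(label):
    """Classify a label as 'karaka', 'other', 'non-dependency' or 'unknown'."""
    if label in NON_DEPENDENCY:
        return "non-dependency"
    if label in NON_KARAKA:
        return "other"
    # Subtypes such as k7t, k7p or k4a are mapped to their base karaka.
    match = re.fullmatch(r"(k\d)[a-z]?", label)
    if match and match.group(1) in KARAKA:
        return "karaka"
    return "unknown"


for tag in ("k1", "k7t", "k4a", "r6", "pof"):
    print(tag, relation_type(tag))
```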

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'

• Issue
  – Possibility-I: Go by syntactic analysis
    • khilvaa 'cause to eat' is the verb root
    • maaz ne has karta vibhakti, so mark as k1
    • aayaa se has karana vibhakti, so mark as k3
    • bacce ko has sampradan vibhakti, so mark as k4


Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations:
    • prayojaka karta 'causer' (pk1): the causer in a causative construction
    • prayojya karta 'causee' (jk1): the causee in a causative construction
    • madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'


• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'

• Issue
  – Possibility-I
    • Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    • The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    • jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'


Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref


Relative Clause: Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2: raam ne (agent) caaMd dekhaa (base verb)
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3: raam ko (experiencer) caaMd dikhaa (derived intransitive verb)
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'


anubhava karta – k4a (Contd…)

[Dependency tree diagrams for Ex-2 and Ex-3]


(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)

[Dependency tree diagrams for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'

• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause


Possibility-I

[Dependency tree: agar-to → agar (paired-ccof), to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]

Possibility-II


Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically


Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written letters, he tears them up'


Participles (vmod) (Contd)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakar (vmod); likhakar → vaha (k1), patra (k2)]


(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example, vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]


Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of


(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'

• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'


Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two parts

101212

38

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
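The chunking decision for conjunct verbs can be made concrete with a small data sketch. The chunk boundaries and the k1, k2, pof and nmod labels come from the slides; the dictionary layout and the helper names `conjunct_verbs` and `modifiers_of` are hypothetical, chosen only for illustration.

```python
# Chunks of "maine usase ek prashna kiyaa"; each chunk records its head.
CHUNKS = {
    "NP1": {"tokens": ["maine"], "head": "maine"},
    "NP2": {"tokens": ["usase"], "head": "usase"},
    "NP3": {"tokens": ["ek", "prashna"], "head": "prashna"},
    "VG": {"tokens": ["kiyaa"], "head": "kiyaa"},
}

# Inter-chunk arcs: (dependent chunk, relation, head chunk).
INTER_CHUNK = [("NP1", "k1", "VG"), ("NP2", "k2", "VG"), ("NP3", "pof", "VG")]

# Intra-chunk arcs: (dependent word, relation, head word).
INTRA_CHUNK = [("ek", "nmod", "prashna")]


def conjunct_verbs(chunks, inter_chunk):
    """Return 'noun verb' strings for every pof arc between chunk heads."""
    return [f"{chunks[dep]['head']} {chunks[head]['head']}"
            for dep, rel, head in inter_chunk if rel == "pof"]


def modifiers_of(word, intra_chunk):
    """Return the words that modify `word` inside its chunk."""
    return [dep for dep, rel, head in intra_chunk if head == word]


if __name__ == "__main__":
    print(conjunct_verbs(CHUNKS, INTER_CHUNK))
    print(modifiers_of("prashna", INTRA_CHUNK))
```

Because prashna heads its own NP chunk, ek can attach to it directly, while the pof arc still records that prashna and kiyaa form one predicate.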


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
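A frame file for a polysemous predicate can be pictured as a mapping from roleset IDs to role descriptions. The sketch below encodes only the two ring rolesets shown on the slides; the dictionary layout and the `label_arguments` helper are hypothetical and are not the actual PropBank frame-file format, which is XML.

```python
# Rolesets for "ring" as listed on the slide; the dict layout is an assumption.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {
                "ARG0": "Causer of ringing",
                "ARG1": "Thing rung",
                "ARG2": "Ring for",
            },
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {
                "ARG1": "Surrounding entity",
                "ARG2": "Surrounded entity",
            },
        },
    }
}


def label_arguments(roleset_id, arguments):
    """Attach role descriptions to (label, text) pairs for one roleset."""
    lemma = roleset_id.split(".")[0]
    roles = FRAMES[lemma][roleset_id]["roles"]
    labelled = []
    for label, text in arguments:
        if label not in roles:
            raise KeyError(f"{label} is not defined for {roleset_id}")
        labelled.append(f"[{text} {label}: {roles[label]}]")
    return labelled


if __name__ == "__main__":
    print(label_arguments("ring.01", [("ARG0", "John"), ("ARG1", "the bell")]))
    print(label_arguments("ring.02", [("ARG1", "Tall aspen trees"),
                                      ("ARG2", "the lake")]))
```

Note that ring.02 has no ARG0, so the same surface subject position receives ARG0 under ring.01 but ARG1 under ring.02.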

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'
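The labelling rule for intransitives stated above is simple enough to write down directly. In this sketch the verb classes are limited to the three verbs named on the slide; the lookup table and the function name `sole_argument_label` are illustrative assumptions, since in practice the class comes from the frame file and the diagnostic tests.

```python
# Verb classes for the three verbs named on the slide (WX-style spellings).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


if __name__ == "__main__":
    for verb in VERB_CLASS:
        print(verb, sole_argument_label(verb))
```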

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative:
  – Add -aa: (so → sulaa) 'sleep' → 'make someone sleep'
  – Add -vaa: (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

[Figures: PropBank annotation for causative-1 and causative-2]

Causatives: classes

[Figure: causative classes]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)' = 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1    राम जानता है कि सीता देर से आएगी
              raam jaantaa hai ki siita der se aayegii
              Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
              'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels VP-Pred, VP, NP, V]

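Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels VP-Pred, VP, NP, V; NP 1 is coindexed with the trace CASE 1]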

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels VP-Pred, VP, NP-P, NP, V; NP 1 is coindexed with the trace CASE 1]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels VP-Pred, VP, NP-P, NP, V; traces CASE 1 and SCR 2]

• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (as with unaccusatives)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels VP-Pred, VP, NP-P, NP, V, V-Aux, CP, C; traces EXTR 1 and CASE 1]

Relative Clause

[Figure: PS tree for the relative clause example below; node labels VP-Pred, VP, NP-P, NP, V, V-Aux, CP, C; empty elements PRO, COMP and SCR traces]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Figure: PS tree for the complex predicate example below; node labels VP-Pred, VP, NP-P, NP, N, V, V', V-Aux; trace CASE 1]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

[Figure: PS tree for the causative]

Auxiliaries

• Auxiliaries mark Tense, Aspect and Modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children.d meal eat prog be.pl.pres
      'The children are having a meal'
  (b) bacce khaanaa khaate rah sakte haiM
      children.d meal eat.nf prog ablit be.pl.pres
      'The children can continue to have their meal'
• Auxiliaries also carry the gender, number and person information

Postpositions

• Postpositions largely mark the case relations
      baccoM ne raat meM mez se khaanaa le liyaa
      children.obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions

Compound Postpositions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar     'according to'
ke alaavaa     'in addition to'
ke kaaran      'because of'
ke dvaaraa     'through'
ke saamne      'in front of'
ke liye        'for'
kii or/taraf   'towards'
kii tarah      'like'
kii jagah      'in place of'
se baahar      'out of'
se pahle       'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:
  (a) qabl 'before'
      qabl az_ayn (qabl azeen)   before from_this   'before this'
      qabl az_waqt               before from_time   'before time'
  (b) dar 'in/inside/amidst'
      dar_ayn asnah (dariin asnah)   in_this moment
  (c) az 'from/since/for'
      az raahe hamdardi   from way empathy   'for the sake of courtesy'
      az sare nau         from beginning new
      khaarij az bahes    beyond from discussion
  (d) ta 'to/until/till'
      ta waqt   until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive, but is not restricted to the genitive alone

EZ N+N      daur-e-hukumat 'period-of-rule'; hukumat-e-hind 'government-of-India'
EZ N+Adj    nasl-e-insani 'race-of-humanity'; lamha-e-aakhar 'the last moment'
EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'

Reduplication: a morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions, depending on the lexical category that is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X.

raam-raam      'Ram-Ram'        (proper noun)
baccaa-baccaa  'child-child'    (common noun)
garam-garam    'hot-hot'        (adjective)
dhiire-dhiire  'slowly-slowly'  (adverb)
jaa-jaa        'go-go'          (verb)
naa-naa        'not-not'        (negative particle)
kyaa-kyaa      'what-what'      (question word)
jaate-jaate    'going-going'    (participle)

Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-
  For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

jaanaa 'going'     → jaanaa-vaanaa 'going etc.'
aaloo 'potato'     → aaloo-vaaloo 'potato etc.'
aisaa 'like this'  → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

bhaag 'run'   → bhaagambhaag 'rush'
jhuuTh 'lie'  → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see'    → dekhaa-dekhii 'in imitation'
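The regular echo-word pattern can be sketched as a short function over romanized forms. This is a toy illustration under two assumptions that go slightly beyond the slide: a leading consonant cluster such as kh counts as the "first consonant", and vowel-initial words simply receive a v- prefix (as the aaloo and aisaa examples suggest). The function name `echo_word` is hypothetical, and the irregular cases listed above are deliberately not handled.

```python
VOWELS = set("aeiou")


def echo_word(word):
    """Build 'X-vX' echo forms from a romanized Hindi/Urdu word.

    Everything before the first vowel is replaced by 'v'; if the word
    starts with a vowel, 'v' is simply prefixed.
    """
    for index, char in enumerate(word):
        if char.lower() in VOWELS:
            return f"{word}-v{word[index:]}"
    raise ValueError(f"no vowel found in {word!r}")


if __name__ == "__main__":
    for example in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
        print(echo_word(example))
```

Forms such as bhaagambhaag or dekhaa-dekhii would need a lexicon lookup, since they do not follow the v- pattern.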

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1   राम मोहन को किताब देगा
            raam mohan ko kitaab degaa
            Ram Mohan dat book.f give.m.sg.fut
            'Ram will give a book to Mohan'

ditrans-2   राम ने मोहन को किताब दी
            raam ne mohan ko kitaab dii
            Ram erg Mohan dat book.f give.f.sg.pst
            'Ram gave a book to Mohan'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1    मेरी बहन जो दिल्ली में रहती है कल आ रही है
            merii bahan jo dillii meM rahtii hai kal aa rahii hai
            my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
            'My sister who stays in Delhi is coming tomorrow'

Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me'

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me'

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me'

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs: the experiencer argument takes dative case
      raam ko bukhaar hai
      Ram dat fever be.pres

      raam ko caand dikhaa
      Ram dat moon see-unacc.pst

      raam ko dukh hai
      Ram dat sorrow be.pres

• Participatory verbs: the second argument of participatory verbs takes the se postposition
      raam ravi se carcaa karega
      Ram Ravi to discussion do.m.sg.fut

      siitaa raam se shaadi karegii
      Sita Ram with marriage do.f.sg.fut

      ravi mohan se milegaa
      Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues: compounds, punctuation

For example:

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – They are productive
  – Morphological analysis of the members of the compounds is needed
  – The issue: whether to create a single token
  – Decision:
    • Create three tokens
    • Mark the hyphen as JOIN
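The tokenization decision (split off all punctuation; split hyphenated compounds into three tokens and mark the hyphen as JOIN) can be illustrated with a small regular-expression tokenizer. This is a minimal sketch, not the treebank's tokenizer: the function name `tokenize`, the `(address, token, mark)` row layout and the choice to store JOIN in a third column are assumptions made for this example.

```python
import re

# A token is either a run of word characters or one punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Return (address, token, mark) rows; a compound-internal hyphen gets JOIN."""
    tokens = TOKEN_RE.findall(sentence)
    rows = []
    for index, token in enumerate(tokens):
        is_join = (
            token == "-"
            and 0 < index < len(tokens) - 1
            and tokens[index - 1][0].isalnum()
            and tokens[index + 1][0].isalnum()
        )
        rows.append((index + 1, token, "JOIN" if is_join else ""))
    return rows


if __name__ == "__main__":
    for row in tokenize("usa laDake ne kelA khAyA thA ."):
        print(*row, sep="\t")
    for row in tokenize("bAlikA-vixyAlaya"):
        print(*row, sep="\t")
```

Splitting the compound lets each member receive its own morphological analysis, while the JOIN mark preserves the fact that the three tokens were written as one unit.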

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), suffix.

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
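The composite af attribute can be unpacked into named fields with a few lines of code. The sketch below assumes the field values are comma-separated inside single quotes and that missing trailing fields are simply empty; the eight field names are one reading of the list on the slide, and `parse_af` is a hypothetical helper, not part of any SSF library.

```python
import re

# Field order as read from the slide; the exact names are an assumption.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix")

FS_RE = re.compile(r"<fs\s+af='([^']*)'\s*>")


def parse_af(feature_structure):
    """Split an <fs af='...'> string into a dict of named morph fields."""
    match = FS_RE.search(feature_structure)
    if match is None:
        raise ValueError(f"no af attribute in {feature_structure!r}")
    values = match.group(1).split(",")
    # Pad with empty strings so every field name is always present.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


if __name__ == "__main__":
    analysis = parse_af("<fs af='laDakaa,n,m,sg,3,o'>")
    for name, value in analysis.items():
        print(f"{name:16s}{value}")
```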

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
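The change in ADDR_ values after chunking can be shown with a short renumbering sketch: each chunk takes an integer address and its tokens take addresses of the form chunk.token. The input structure and the function name `to_ssf_lines` are assumptions for illustration, and the feature-structure column is omitted to keep the example small.

```python
# Chunks of the example sentence: (chunk label, [(token, POS tag), ...]).
CHUNKS = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]


def to_ssf_lines(chunks):
    """Render chunks as tab-separated SSF-style rows with nested addresses."""
    lines = []
    for chunk_no, (label, tokens) in enumerate(chunks, start=1):
        lines.append(f"{chunk_no}\t((\t{label}")
        for token_no, (token, tag) in enumerate(tokens, start=1):
            lines.append(f"{chunk_no}.{token_no}\t{token}\t{tag}")
        lines.append("\t))")
    return lines


if __name__ == "__main__":
    print("\n".join(to_ssf_lines(CHUNKS)))
```

Before chunking, thA had the flat address 6; after chunking it becomes 3.2, the second token of the third chunk.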


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

Here, lock becomes the karta.

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai

=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

(t1) Tree over chunk heads:
khaa
  bhaaii
  phala

(t2) Fully expanded tree:
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb

• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal Root, branching into activity and result]

karta-karma

• The boy opened the lock
  – k1: karta
  – k2: karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)

Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 – karta (doer); k2 – karma (affected); k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example:
  meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  meraa    <fs af= root=meraa, cat=pron, gend=any, num=sg, pers=1, case=o>
  badZaa   <fs af= root=badZaa, cat=adj, gend=m>
  bhaaii   <fs af= root=bhaaii, cat=n, gend=m, num=sg, pers=3, case=d>
  bahuta   <fs af= root=bahuta, cat=adj, gend=any>
  phala    <fs af= root=phala, cat=n, gend=m, num=any, pers=3, case=d>
  khaataa  <fs af= root=khaa, cat=v, gend=m, num=sg, pers=3, TAM=taa>
  hai      <fs af= root=hai, cat=v, gend=any, num=any, pers=3>

An Example (Contd.)

• POS Tagging
  meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

• Chunking
  ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

[Figure: dependency tree for the example sentence]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  ◦ each node is a chunk, and
  ◦ each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  ◦ meraa 'my'
  ◦ bhaaii 'brother'
  ◦ phala 'fruit' and
  ◦ khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  ├─ k1 → bhaaii
  │         └─ r6 → meraa
  └─ k2 → phala
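A labeled dependency tree of this kind is easy to hold as a list of (dependent, head, label) triples over chunk heads. The sketch below is illustrative only: the triple layout and the helper names `build_tree` and `render` are assumptions, not the treebank's storage format.

```python
from collections import defaultdict

# (dependent chunk head, governing chunk head, relation label)
EDGES = [
    ("bhaaii", "khaataa", "k1"),   # karta of the verb
    ("phala", "khaataa", "k2"),    # karma of the verb
    ("meraa", "bhaaii", "r6"),     # genitive modifier of 'brother'
]

def build_tree(edges):
    """Return (roots, children), where children maps head -> [(label, dependent)]."""
    children = defaultdict(list)
    has_head = set()
    for dependent, head, label in edges:
        children[head].append((label, dependent))
        has_head.add(dependent)
    roots = [node for node in children if node not in has_head]
    return roots, children

def render(node, children, depth=0, label=None):
    """Return the subtree under `node` as indented text lines."""
    prefix = "  " * depth + (f"{label}: " if label else "")
    lines = [prefix + node]
    for child_label, child in children.get(node, []):
        lines.extend(render(child, children, depth + 1, child_label))
    return lines

roots, children = build_tree(EDGES)
print("\n".join(render(roots[0], children)))
```

The only node without a governor is the verb chunk head, which is therefore the root of the tree.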

Relations in Dependency Scheme

• There are three types of relations in the Dependency Scheme:
  ◦ karaka relations,
  ◦ relations other than karakas, and
  ◦ relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark the participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  ◦ karta – subject/agent/doer
  ◦ karma – object/patient
  ◦ karana – instrument
  ◦ sampradaan – beneficiary
  ◦ apaadaan – source
  ◦ adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child.'
• Issue
  ◦ Possibility-I: Go by syntactic analysis
    ▪ khilvaa 'cause to eat' is the verb root
    ▪ maaz ne has karta vibhakti, so mark as k1
    ▪ aayaa se has karana vibhakti, so mark as k3
    ▪ bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  ◦ The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  ◦ The Paninian framework provides the relations:
    ▪ prayojaka karta 'causer' (pk1): the causer in a causative construction
    ▪ prayojya karta 'causee' (jk1): the causee in a causative construction
    ▪ madhyastha karta 'mediator causer' (mk1): the mediator-causer in a causative construction

Causative Constructions (Contd.)

• Possibility-II
  ◦ Do we mark the above dependency roles?
  ◦ If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon.'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child.'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
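The label substitution stated for the maid example (pk1, mk1, jk1 instead of k1, k3, k4) can be written as a small mapping. This is only a sketch: `to_possibility_two` is a hypothetical name, and the code does not decide whether a se-phrase is a mediator (aayaa se) or a plain instrument (cammaca se, which the slide leaves as k3); that judgment stays with the annotator.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def to_possibility_two(args):
    """Relabel (phrase, label) pairs; labels not in the mapping are kept."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]

# Possibility-I analysis of: maaz ne aayaa se bacce ko khaanaa khilavaayaa
MAID_EXAMPLE = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]

print(to_possibility_two(MAID_EXAMPLE))
```

The object khaanaa keeps its k2 label under both analyses, which matches the annotated example.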

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother.'
• Issue
  ◦ Possibility-I
    ▪ Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    ▪ The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    ▪ jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  ◦ The verb khadZaa hai 'is standing' is the root of the relative clause
  ◦ The modifier of vaha 'he' in the main clause is the entire relative clause
  ◦ Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause Alternative-II

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy.'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon.'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram.'

anubhava karta – k4a (Contd.)

[Figures: dependency trees for Ex-2 and Ex-3; not recovered]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow.'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow.'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd.)

[Figures: dependency trees for Ex-1 and Ex-2; not recovered]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party.'
• Issue
  ◦ Possibility-I: Abstract node
  ◦ Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  ├─ paired-ccof → agar
  │                  └─ ccof → naa hotii
  └─ paired-ccof → to
                     └─ ccof → aatii

Possibility - II

[Figure: dependency tree for Possibility-II; not recovered]

Conditionals (Contd.)

• Possibility-I is rejected because the head of the tree would be agar-to, which is an abstract node, i.e. not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written letters, he tears them up every day.'

Participles (vmod) (Contd.)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  ├─ k1 → vaha
  ├─ k7t → rojZa
  ├─ k2 → patra
  └─ vmod → likhakara
              ├─ k1 → vaha
              └─ k2 → patra

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say.'
• In the above example vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  ├─ k1 → mai
  └─ k2 → NULL__NP (vo)
            └─ nmod__relc → kahoge
                              ├─ k1 → tum
                              └─ k2 → jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone.'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential):

NULL__CCP (aur)
  ├─ ccof → badZe_ho_gaye
  └─ ccof → nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  ◦ Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple.'
• Subordinate Conjunction
  ◦ Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow.'

Coordinate Conjunction (ccof)

[Figure not recovered]

Subordinate Conjunction

[Figure not recovered]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question.'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF ('Part OF' relation)
• That is, the noun or adjective in a conjunct verb sequence has a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  ├─ k1 → maine
  ├─ k2 → usase
  └─ pof → prashna
             └─ nmod → ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
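The filtering step can be illustrated with a toy sketch in Python. The role annotations are typed in by hand from the two candidate sentences; the dictionary layout and the function name `answer_who` are assumptions made for this illustration, not the output of any real SRL system.

```python
QUESTION = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Hand-entered role labels for the two candidate sentences.
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def normalize(phrase):
    """Lower-case and collapse whitespace so phrases compare cleanly."""
    return " ".join(phrase.lower().split())

def answer_who(question, candidates):
    """Return agents of candidates whose predicate and theme match the question."""
    wanted = normalize(question["theme"])
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and normalize(c["theme"]) == wanted]

print(answer_who(QUESTION, CANDIDATES))
```

Both candidates contain the words of the question, so word matching alone cannot separate them; only the theme comparison removes the syringe sentence.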


We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

An example

• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

An example

• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
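The two rolesets for ring can be pictured as a small lookup table. The Python sketch below is an illustration only: real PropBank frame files are stored in their own file format, and the dictionary layout and the helper `describe` are assumptions made here.

```python
# Rolesets copied from the slide; the dict layout is an assumption.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}

def describe(roleset, labelled_args):
    """Pair each (text, label) with the verb-specific role description."""
    roles = FRAMES[roleset]["roles"]
    return [(text, label, roles[label]) for text, label in labelled_args]

print(describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")]))
print(describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")]))
```

The same label (Arg1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is the point of defining roles per roleset rather than globally.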

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent/experiencer     ARG0-MNS  Induced causer
ARG1  Theme/patient         ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book.'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book.'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book.'

Unaccusative & Unergative

• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
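The labelling rule for the single argument of an intransitive verb can be stated in a few lines of Python. This is a sketch only: the verb classes below are just the three examples from the slide, entered by hand, and `sole_argument_role` is a hypothetical name. In practice the class of a verb is established by the diagnostic tests, not looked up.

```python
# Verb classes for the slide's examples only (hand-entered).
VERB_CLASS = {
    "Kula": "unaccusative",   # 'open'
    "Puta": "unaccusative",   # 'explode'
    "haMsa": "unergative",    # 'laugh'
}

# The rule from the slide: unaccusative -> Arg1, unergative -> Arg0.
ROLE_FOR_CLASS = {"unaccusative": "Arg1", "unergative": "Arg0"}

def sole_argument_role(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    return ROLE_FOR_CLASS[VERB_CLASS[verb]]

for verb in VERB_CLASS:
    print(verb, sole_argument_role(verb))
```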

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open.'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened.'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open.'

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep.'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept.'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep.'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room.'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds.'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds.'

[Figures: annotated trees for unacc-4 and dat-subj-1; not recovered]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan.'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan.'

Causatives

• Hindi has two ways of forming the causative:
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep.'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep.'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep.'

Causatives

[Figures: annotated trees for causative-1 and causative-2; not recovered]

Causatives classes

[Figure not recovered]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi.'

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi.'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late.'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with nodes VP-Pred, VP, NP, V; structure not recovered]

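Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii), with nodes VP-Pred, VP, NP-P, NP, V; structure not recovered]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa), with nodes VP-Pred, VP, NP, V; structure not recovered]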

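Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa), with nodes VP-Pred, VP, NP1, NP, V and the trace *CASE*1; structure not recovered]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain), with nodes VP-Pred, VP, NP-P, NP1, NP, V and the trace *CASE*1; structure not recovered]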

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii), with nodes VP-Pred, VP, NP-P, NP, V; structure not recovered]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa), with nodes VP-Pred, VP, NP-P, NP1, NP2, NP, V and the traces *CASE*1 and *SCR*2; structure not recovered]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi), with nodes VP-Pred, VP, V-Aux, CP1, C, NP-P, NP1, NP, V and the traces *EXTR*1 and *CASE*1; structure not recovered]

Relative Clause

[Figure: PS tree for the sentence below, with nodes VP-Pred, VP, V-Aux, CP, C, COMP, NP-P, NP1, NP, V, PRO and the traces *SCR*1 and *SCR*3; structure not recovered]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me.'

Complex Predicate

[Figure: PS tree for the sentence below, with nodes VP-Pred, VP, V-Aux, V', NP-P, NP, N, V and the trace *CASE*1; structure not recovered]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi.'

Causative

[Figure not recovered]

Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl 'before'
    qabl az_ayn (qabl azeen)      qabl az_waqt
    before from_this              before from_time
    'before this'                 'before time'

(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah)
    in_this moment

(c) az 'from/since/for'
    az raahe hamdardi             az sare nau              khaarij az bahes
    from way empathy              from beginning new       beyond from discussion
    'for the sake of courtesy'

(d) ta 'to/until/till'
    ta waqt
    until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive, but is not restricted to the genitive alone

EZ N+N      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  ◦ Nouns: it adds the sense of 'every'
  ◦ Verbs: it brings the sense of an adverbial participle
  ◦ Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X:

raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)
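The X-X pattern is simple enough to generate and to detect on hyphenated tokens. A minimal Python sketch follows; the function names `reduplicate` and `is_full_reduplication` are hypothetical, and the check assumes the two halves are joined by a single hyphen as in the examples above.

```python
def reduplicate(word):
    """Form the full reduplication X-X of a word X."""
    return f"{word}-{word}"

def is_full_reduplication(token):
    """True if the token has the shape X-X with a non-empty X."""
    left, sep, right = token.partition("-")
    return sep == "-" and left != "" and left == right

print(reduplicate("garam"))
print(is_full_reduplication("dhiire-dhiire"))
print(is_full_reduplication("khaanaa-vaanaa"))
```

A token whose two halves differ, such as khaanaa-vaanaa, fails the check, so other kinds of reduplication would need their own test.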


Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-
  For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book.'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book.'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book.'

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep.'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept.'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep.'

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open.'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened.'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open.'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room.'

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds.'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds.'

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan.'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan.'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late.'

Relative Clause

rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          'My sister, who stays in Delhi, is coming tomorrow.'

Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me.'

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me.'

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me.'

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi.'

compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi.'

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep.'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep.'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep.'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  ◦ The experiencer argument takes dative case
    raam ko bukhaar hai            Ram dat fever be.pres
    raam ko caand dikhaa           Ram dat moon see-unacc.pst
    raam ko dukh hai               Ram dat sorrow be.pres
• Participatory verbs
  ◦ The second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega     Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii  Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa          Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues
  – Compounds
  – Punctuation

For example:
usa   ladake  ne   kelaa   khaayaa   thaa .
that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     .

Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Compounds are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN
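The tokenization decision above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `tokenize` and `mark_joins` and the way the JOIN mark is stored (a second tuple element) are hypothetical, since the slides say only that the hyphen is marked as JOIN.

```python
import re

# Hypothetical storage for the mark; the slides only state that the
# compound-internal hyphen is marked as JOIN.
JOIN = "JOIN"

def tokenize(sentence):
    """Split on whitespace, then split every punctuation mark off as its own token."""
    tokens = []
    for chunk in sentence.split():
        # \w+ captures word material; any other non-space character is punctuation.
        tokens.extend(re.findall(r"\w+|[^\w\s]", chunk))
    return tokens

def mark_joins(tokens):
    """Pair each token with JOIN if it is a hyphen between two words, else None."""
    marked = []
    for i, tok in enumerate(tokens):
        inside = 0 < i < len(tokens) - 1
        is_join = (tok == "-" and inside
                   and tokens[i - 1].isalnum() and tokens[i + 1].isalnum())
        marked.append((tok, JOIN if is_join else None))
    return marked
```

With this sketch a compound such as BAI-bahana yields three tokens (BAI, -, bahana), and only the middle one carries the JOIN mark.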

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7      .        <fs af='&STOP,punc'>

POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7     .       SYM   <fs af='&STOP,punc'>
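As an illustration of how the composite af attribute might be unpacked, here is a minimal Python sketch. It assumes the eight fields are comma-separated in the order listed on the slide; the field names in `AF_FIELDS` and the function `parse_af` are hypothetical and not part of any treebank tool.

```python
# Field order as described on the slide; the key names are illustrative.
AF_FIELDS = ("root", "cat", "gender", "number", "person",
             "case", "tam_or_vibhakti", "suffix")

def parse_af(value):
    """Turn a comma-separated af value into a dict, padding missing fields with ''."""
    parts = [p.strip() for p in value.split(",")]
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))

noun = parse_af("laDakaa,n,m,sg,3,o,,")
# noun["root"] is "laDakaa" and noun["case"] is "o"
```

Padding short values keeps lookups uniform, so a postposition entry such as 'ne,psp' simply has empty gender, number, and person fields.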

Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1    .       SYM   <fs af='&STOP,punc'>
       ))


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?
Indian languages:
• Rich morphology
• Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock (the)

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)

Here the lock becomes the karta.
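The shift of 'lock' from karma to karta can be made concrete with a small data sketch. The Python below is only an illustration: it stores each analysis of "opened" as a mapping from karaka label to dependent, and the names `transitive`, `inchoative`, and `label_of` are hypothetical.

```python
# Each analysis maps a karaka label to a dependent of the verb "opened".
# Labels follow the slides: k1 karta, k2 karma, k3 karaNa, k7t time, k7p place.
transitive = {"k7t": "yesterday", "k1": "Sabina", "k2": "lock",
              "k3": "key", "k7p": "home"}
inchoative = {"k7t": "yesterday", "k1": "lock", "k3": "key"}

def label_of(analysis, word):
    """Return the karaka label a word carries in an analysis, or None if absent."""
    for label, dependent in analysis.items():
        if dependent == word:
            return label
    return None
```

Looking up "lock" gives k2 in the first analysis and k1 in the second, while "Sabina" has no label at all in the second because the agent is not expressed.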

Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

(t1) khaa
       bhaaii
       phala

(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta

Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key.
  The key opened the lock.
  The lock opened.

Semantics of the verb
• A verbal root denotes:
  – the activity
  – the result
• Locus of activity: karta
• Locus of result: karma

[Diagram: verbal root, branching into activity and result]

karta-karma
• The boy opened the lock
  – k1: karta
  – k2: karma
• karta and karma sometimes correspond to agent and theme
  – Not always: "The door opened"
  – The door is karta
  – The sentence has no explicit karma

[Diagram: open, with k1 boy and k2 lock]

Sub-actions: Opening of lock
• Action 1: inserting and turning a key
• Action 2: key pressing and turning the lever
• Action 3: latch moving and lock opening

Sub-actions: Opening of lock

[Diagrams: open with k1 boy, k2 lock, k3 key; open with k1 key, k2 lock; open with k1 lock]

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
  – The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.

Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly:
  (a) I opened the lock with this key.
  (b) I am sure this key will open the lock.
• 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)

Overview
• Objective
• The Scheme
  – Morph analysis
  – POS tagging
  – Chunking
  – Dependency relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, and conjunctions in Hindi

An Example
• Example
  meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruit.'

An Example (contd.)
• Morph analysis
  – meraa    <fs af= root=meraa, cat=pron, gend=any, num=sg, pers=1, case=o>
  – badZaa   <fs af= root=badZaa, cat=adj, gend=m>
  – bhaaii   <fs af= root=bhaaii, cat=n, gend=m, num=sg, pers=3, case=d>
  – bahuta   <fs af= root=bahuta, cat=adj, gend=any>
  – phala    <fs af= root=phala, cat=n, gend=m, num=any, pers=3, case=d>
  – khaataa  <fs af= root=khaa, cat=v, gend=m, num=sg, pers=3, TAM=taa>
  – hai      <fs af= root=hai, cat=v, gend=any, num=any, pers=3>

An Example (contd.)
• POS tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (contd.)
• Dependency relation
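The bracketed chunk notation used in this example is regular enough to read mechanically. The following Python sketch parses it into chunks and picks a head per chunk. The head rule is an assumption made for illustration (main verb VM for a VG chunk, last token otherwise); the slides do not state the rule, and `parse_chunks` and `head_of` are hypothetical names.

```python
import re

# Matches one chunk written as ((word_TAG word_TAG ...))_LABEL.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Return a list of chunks, each with a label and (word, tag) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append({"label": label, "tokens": tokens})
    return chunks

def head_of(chunk):
    """Assumed rule: a VG chunk is headed by its VM token, others by the last token."""
    if chunk["label"] == "VG":
        for word, tag in chunk["tokens"]:
            if tag == "VM":
                return word
    return chunk["tokens"][-1][0]

text = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
heads = [head_of(c) for c in parse_chunks(text)]
```

Under that assumed rule the heads come out as meraa, bhaaii, phala, and khaataa, which are the words between which the inter-chunk relations are marked.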

Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (contd.)
• Here, modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit'
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations
• Only six:
  – karta: subject/agent/doer
  – karma: object/patient
  – karana: instrument
  – sampradaan: beneficiary
  – apaadaan: source
  – adhikarana: location in place/time/other

Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of

Dependency Relation Types
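The three families of relation labels can be collected into a small lookup, sketched here in Python. The labels and glosses are those shown on the slides, with one exception flagged in the code: the label k5 for apaadaan does not appear on these slides and is assumed. The names `KARAKA`, `NON_KARAKA`, `NON_DEPENDENCY`, and `relation_type` are hypothetical.

```python
KARAKA = {
    "k1": "karta", "k2": "karma", "k3": "karana", "k4": "sampradaan",
    "k5": "apaadaan",  # assumption: this label is not shown on the slides
    "k7t": "adhikarana (time)", "k7p": "adhikarana (place)",
}
NON_KARAKA = {
    "r6": "genitive", "rt": "purpose", "rh": "reason",
    "nmod__relc": "relative clause", "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction", "pof": "complex predicate", "fragof": "fragment of",
}

def relation_type(label):
    """Classify a relation label into one of the three families."""
    if label in KARAKA:
        return "karaka"
    if label in NON_KARAKA:
        return "non-karaka"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    raise KeyError(label)
```

Raising on an unknown label, rather than returning a default, makes annotation typos visible early.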

Some Hindi Constructions

(1) Causative Constructions
• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child.'
• Issue
  – Possibility I: go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (contd.)
• Possibility II
  – The verb khilvaa 'cause to eat' is a causative verb, morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations:
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator-causer' (mk1): the mediator-causer in a causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (contd.)
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon.'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child.'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility II
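The relabeling stated for causatives (pk1, mk1, jk1 instead of k1, k3, k4) can be written as a simple mapping. The Python sketch below is illustrative only; `relabel_causative` is a hypothetical helper and it assumes the se-marked phrase really is a mediator-causer. A true instrument such as cammaca se 'with the spoon' keeps k3, and this sketch cannot tell the two apart.

```python
# Mapping stated on the slide: k1 -> pk1, k3 -> mk1, k4 -> jk1.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(deps):
    """Relabel (phrase, label) pairs of a purely syntactic analysis (Possibility I)."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in deps]

possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
```

Labels outside the mapping, such as k2 on khaanaa, pass through unchanged.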

(2) Relative Clauses (nmod__relc)
• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother.'
• Issue
  – Possibility I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility I]

Relative Clauses (nmod__relc)
• Possibility II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility II]

Relative Clauses (contd.)
• For relative clauses, our current decision: follow Possibility II
• In Possibility II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a
• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy.'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (contd.)
• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon.'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram.'

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs
• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow.'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow.'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figure: Relation samanadhikaran – rs, dependency tree for Ex-1]

[Figure: Relation samanadhikaran – rs, dependency tree for Ex-2]

(5) Conditionals
• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party.'
• Issue
  – Possibility I: abstract node
  – Possibility II: one clause depends on the other clause

Possibility I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

[Figure: Possibility II]

Conditionals (contd.)
• Possibility I is rejected because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility II
• In Possibility II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically

Participles (vmod) (contd.)
• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written letters, he tears them up.'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha
    k2: patra

(7) Ellipsis
• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say.'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, NULL__NP, for vo 'that' to show the dependency

Ellipsis (contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (contd.)
• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone.'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential):

NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations
• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of

(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (contd.)
• Coordinate conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple.'
• Subordinate conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow.'

[Figures: Coordinate Conjunction (ccof); Subordinate Conjunction]

(2) Conjunct Verbs
• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question.'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in a conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
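The filtering step can be illustrated with a toy Python sketch. The role-labelled candidates below are entered by hand from the two sentences on the slides; no labelling system is involved, and the names `question`, `candidates`, and `answer_who` are hypothetical.

```python
# Hand-entered role labels for the two candidate sentences (not system output).
question = {"predicate": "create", "theme": "the first effective polio vaccine"}
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(question, candidates):
    """Keep the agent of every candidate whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"] == question["theme"]]
```

Both candidates contain every content word of the question, so word matching cannot separate them; comparing the theme role is what filters out the first one.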

We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame Files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example
• [John ARG0] rings [the bell ARG1]   (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
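A frame file with several rolesets can be pictured as a small data structure. The Python sketch below is illustrative only: the class `Roleset`, the list `RING`, and the helper `describe` are hypothetical and do not reflect the actual PropBank file format; the role descriptions are the ones given for ring on the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Roleset:
    """One distinct usage of a predicate with its numbered roles."""
    id: str
    gloss: str
    roles: dict = field(default_factory=dict)

# The two rolesets of the polysemous predicate "ring".
RING = [
    Roleset("ring.01", "Make sound of bell",
            {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"}),
    Roleset("ring.02", "To surround",
            {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"}),
]

def describe(roleset_id, arg):
    """Look up what a numbered argument means within a given roleset."""
    for roleset in RING:
        if roleset.id == roleset_id:
            return roleset.roles.get(arg)
    return None
```

The same label means different things per roleset: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02, and ring.02 has no Arg0 at all.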

Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi
• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped/null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments
ARGA   Causer
ARG0   Agent/experiencer
ARG1   Theme/patient
ARG2   Recipient
ARG3   Instrument

Numbered Arguments with function tags
ARGA-MNS   Indirect causer
ARG0-MNS   Induced causer
ARG0-GOL   Causee with a 'recipient' role
ARG2-ATR   Attribute
ARG2-GOL   Goal
ARG2-SOU   Source
ARG2-LOC   Location
ARG2-DIR   Direction

Modifier Arguments
ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means

Linguistic phenomena
• Simple transitive
• Unaccusative and unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex predicates

Simple Transitive

trans-1 आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book.'

trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book.'

trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book.'

Unaccusative & Unergative
• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open.'

unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened.'

unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open.'
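The labelling rule for intransitives can be stated as a tiny lookup. The Python below is a sketch: the verb lists contain only the examples named on the slides (plus so 'sleep' from the unergative examples), and class membership is simply asserted here, whereas in practice it would come from the diagnostic tests. The names `UNACCUSATIVE`, `UNERGATIVE`, and `sole_argument_label` are hypothetical.

```python
# Example verbs from the slides; membership is asserted, not computed.
UNACCUSATIVE = {"Kula", "Puta"}   # open, explode
UNERGATIVE = {"haMsa", "so"}      # laugh, sleep

def sole_argument_label(verb_root):
    """Return the PropBank label of the single argument of an intransitive verb."""
    if verb_root in UNACCUSATIVE:
        return "ARG1"
    if verb_root in UNERGATIVE:
        return "ARG0"
    raise ValueError(f"verb class unknown for {verb_root!r}")
```

Refusing to guess for unlisted verbs reflects the fact that the classification needs a linguistic test, not a default.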

Intransitive: Unergative

Unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'

unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept.'

unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep.'

Existential

exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room.'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night, the moon was seen behind the clouds.'

dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night, I saw the moon behind the clouds.'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan.'

ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan.'

Causatives
• Hindi has two ways of forming the causative:
  – Add -aa: (so → sulaa) sleep → make someone sleep
  – Add -vaa: (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

Unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'

causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep.'

causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep.'

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

Complex predicates
• These are cases such as bharosaa karnaa `trust(n) do(v)' 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi.'

compl-pred-2 राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi.'

Complement Clause

compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late.'

PhraseStructureRepresenta3on

OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu

PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks

bull  DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow

bull  Developedinconjunc3onwithDSandPBbull  InspiredbyChomskyantradi3on

101212

2

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram; node labels: VP-Pred, VP, NP, V; words: Atif, kitab, paRhegaa]

आतिफ़ किताब पढ़ेगा

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, V; words: Atif -ne, kitab, paRhii]

आतिफ़ ने किताब पढ़ी

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree diagram; node labels: VP-Pred, VP, NP, V; words: Atif, soyegaa]

आतिफ़ सोएगा

Intransitive Clause: Unaccusative

• Argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram; node labels: VP-Pred, VP, NP1, NP, CASE1, V; words: darvaazaa, khulegaa]

दरवाज़ा खुलेगा

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram; node labels: VP-Pred, VP, NP1, NP, CASE1, V, NP-P; words: us kamre mein, cuuhe, hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, V; words: Ram -ne, Mohan -ko, kitaab, dii]

राम ने मोहन को किताब दी

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram; node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, CASE1, SCR2, V; words: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram; node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, V, V-Aux, CASE1, EXTR1; words: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

[Tree diagram; node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, V, V-Aux, PRO, SCR1, SCR3; words: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, V', V, V-Aux, N, CASE1; words: Raam, Ravi -ko, yaad, kar, rahaa, thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Urdu has Prepositions

• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions are:

(a) qabl 'before'
    qabl az_ayn (qabl azeen): before from_this, 'before this'
    qabl az_waqt: before from_time, 'before time'
(b) dar 'in / inside / amidst'
    dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from / since / for'
    az raahe hamdardi: from way empathy, 'for the sake of courtesy'
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta 'to / until / till'
    ta waqt: until time

ezafe in Urdu

• Urdu has what is referred to as ezafe
• It normally marks a genitive, but is not restricted to the genitive alone

EZ N+N: daur-e-hukumat 'period-of-rule'; hukumat-e-hind 'government-of-India'
EZ N+Adj: nasl-e-insani 'race-of-humanity'; lamha-e-aakhar 'the last moment'
EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'


Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X then its reduplicated form is X-X

raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)


Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
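The regular patterns above can be sketched in a few lines of Python. This is only an illustration on romanized input: the function names are hypothetical, the leading consonant letters (for example "kh") are treated as one consonant, and vowel-initial words simply receive a v- prefix, as in aaloo-vaaloo. Irregular forms such as jhuuTh-muuTh are not covered.

```python
# Sketch of full and partial (echo-word) reduplication on romanized input.
# Function names are illustrative, not part of any treebank tool.
VOWELS = set("aeiouAEIOU")


def echo_form(word: str) -> str:
    """Replace the initial consonant letters of a word with 'v'."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return "v" + word[i:]


def partial_reduplicate(word: str) -> str:
    """X -> X-vX', the regular echo-word pattern ('X etc.')."""
    return f"{word}-{echo_form(word)}"


def full_reduplicate(word: str) -> str:
    """X -> X-X."""
    return f"{word}-{word}"


for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(partial_reduplicate(w))
```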

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies


Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Intransitive: Unergative

Unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'


Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Existential

exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'


Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'


Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister who stays in Delhi is coming tomorrow'


Relative Clause

rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'

rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'

rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2: राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'


Causatives

Unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs: the experiencer argument takes dative case
  raam ko bukhaar hai (Ram dat fever be.pres)
  raam ko caand dikhaa (Ram dat moon see-unacc.pst)
  raam ko dukh hai (Ram dat sorrow be.pres)
• Participatory verbs: the second argument of a participatory verb takes the se postposition
  raam ravi se carcaa karega (Ram Ravi to discussion do.m.sg.fut)
  siitaa raam se shaadi karegii (Sita Ram with marriage do.f.sg.fut)
  ravi mohan se milegaa (Ravi Mohan with meet.m.sg.fut)


References

• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.


Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past


Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     .

Tokenization Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
    • Create three tokens
    • Mark the hyphen as JOIN
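The tokenization decision above can be illustrated with a small sketch. It assumes whitespace-separated romanized text; the function name `tokenize` and the (ADDR, TOKEN, mark) row layout are hypothetical and are not the treebank's actual tooling.

```python
import re

# A word is a run of letters/digits; every other non-space character
# (punctuation, hyphen) becomes its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Return (ADDR, TOKEN, mark) rows; a hyphen inside a compound is marked JOIN."""
    rows = []
    for addr, m in enumerate(TOKEN_RE.finditer(sentence), start=1):
        tok = m.group()
        start, end = m.span()
        # A hyphen counts as compound-internal when it touches a word on both sides.
        is_join = (
            tok == "-"
            and start > 0 and sentence[start - 1].isalnum()
            and end < len(sentence) and sentence[end].isalnum()
        )
        rows.append((addr, tok, "JOIN" if is_join else None))
    return rows


for row in tokenize("BAI-bahana"):
    print(row)
```

With this sketch a hyphenated compound yields three tokens, and sentence-final punctuation becomes a token of its own.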


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix

ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaave'>
7     .        <fs as=&STOP,punc>
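As a rough illustration of the composite attribute, the sketch below splits an af value into the eight named fields. It assumes the fields are comma-separated and appear in the order listed above; the function name and the dictionary keys are hypothetical, and missing trailing fields are filled with empty strings.

```python
# Field order follows the description of af above; key names are illustrative.
AF_FIELDS = [
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
]


def parse_af(af_value: str) -> dict:
    """Split a comma-separated af value into named fields (assumed layout)."""
    parts = [p.strip() for p in af_value.split(",")]
    # Pad short values so every field name is present in the result.
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))


print(parse_af("laDakaa,n,m,sg,3,o"))
```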

POS Tagging

• ILMT POS tagsets adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7     .       SYM   <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)

ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1   .       SYM   <fs as=&STOP,punc>
      ))
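The change in ADDR values after chunking can be sketched as follows. The input layout (a list of chunk labels with their tagged tokens), the function name, and the dotted address form such as 1.1 are assumptions made for illustration; the feature-structure column is omitted.

```python
def chunk_rows(chunks):
    """Number chunks 1, 2, ... and their tokens 1.1, 1.2, ... (assumed address form)."""
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))        # chunk opening row
        for j, (tok, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", tok, pos))   # token row inside the chunk
        rows.append(("", "))", ""))               # chunk closing row
    return rows


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
    ("BLK", [(".", "SYM")]),
]
for row in chunk_rows(chunks):
    print("\t".join(row))
```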


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar

Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai ('child' 'fruit' 'eat_hab' 'pres')
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]

K1 (Karta): the doer of the action (the locus of activity)
K2 (Karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]

K3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]

lock becomes the karta


Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example Contd.

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

[Dependency trees: (t1) khaa → bhaaii, phala; (t2) khaa → bhaaii, phala; bhaaii → meraa, baDZaa; phala → bahuta]


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root → activity, result]


karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Dependency tree: open → boy (k1), lock (k2)]

Sub-actions - Opening of lock

Opening of lock:
(action 1) Inserting and turning a key
(action 2) Key pressing and turning the lever
(action 3) Latch moving and lock opening


Sub-actions - Opening of lock

[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)


Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena, such as complex verbs, causatives, relative clauses, conjunctions, etc., in Hindi


An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa <fs af= root=meraa, cat=pron, gend=any, num=sg, pers=1, case=o>
  – badZaa <fs af= root=badZaa, cat=adj, gend=m>
  – bhaaii <fs af= root=bhaaii, cat=n, gend=m, num=sg, pers=3, case=d>
  – bahuta <fs af= root=bahuta, cat=adj, gend=any>
  – phala <fs af= root=phala, cat=n, gend=m, num=any, pers=3, case=d>
  – khaataa <fs af= root=khaa, cat=v, gend=m, num=sg, pers=3, TAM=taa>
  – hai <fs af= root=hai, cat=v, gend=any, num=any, pers=3>


An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
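The bracketed chunk notation used in this example is easy to read back into a data structure. The sketch below is only an illustration of that notation; the function name and the returned (label, tokens) layout are hypothetical.

```python
import re

# One chunk looks like ((tok_TAG tok_TAG ...))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Parse '((w_TAG ...))_LABEL' sequences into (label, [(word, tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        # Split each token on its last underscore to separate word and POS tag.
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, tokens))
    return chunks


text = (
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
for label, tokens in parse_chunks(text):
    print(label, tokens)
```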

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks


Dependency Scheme (Contd.)

[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
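One way to hold such a tree in a program is as nodes (chunk heads) with labeled edges to their modifiers. The sketch below is an assumed, minimal representation for illustration only; the class and function names are hypothetical and do not reflect the treebank's storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A chunk head with labeled edges to its modifiers."""
    head: str
    children: list = field(default_factory=list)  # list of (label, Node)

    def attach(self, label, child):
        self.children.append((label, child))
        return child


def edges(node):
    """List (modified, label, modifier) triples in depth-first order."""
    out = []
    for label, child in node.children:
        out.append((node.head, label, child.head))
        out.extend(edges(child))
    return out


# The tree for 'meraa baDzaa bhaaii bahuta phala khaataa hai'
root = Node("khaataa")
bhaaii = root.attach("k1", Node("bhaaii"))
root.attach("k2", Node("phala"))
bhaaii.attach("r6", Node("meraa"))
print(edges(root))
```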

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold between the verb and the participants directly involved in the action it denotes
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address


Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
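The three groups of labels can be kept in simple lookup tables. The sketch below is illustrative only: the karaka codes k1 to k5 and k7 follow the labels used in these slides (k5 for apaadaan is the usual code in this scheme, though it is not shown above), subtypes such as k7t or k4a are matched by prefix, and the function name is hypothetical.

```python
# Label inventories as listed in the slides; the grouping function is a sketch.
KARAKA = {
    "k1": "karta", "k2": "karma", "k3": "karana",
    "k4": "sampradaan", "k5": "apaadaan", "k7": "adhikarana",
}
OTHER = {
    "r6": "genitive", "rt": "purpose", "rh": "reason",
    "nmod__relc": "relative clause", "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction", "pof": "complex predicate", "fragof": "fragment of",
}


def relation_type(label: str) -> str:
    """Classify a label into one of the three relation types."""
    if label in OTHER:
        return "other"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    # Subtypes such as k7t, k7p or k4a share the prefix of a basic karaka.
    if any(label.startswith(code) for code in KARAKA):
        return "karaka"
    return "unknown"


for lab in ["k1", "k7t", "r6", "pof"]:
    print(lab, relation_type(lab))
```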

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

• Issue
  – Possibility-I: Go by syntactic analysis
    • khilvaa 'cause to eat' is the verb root
    • maaz ne has karta vibhakti, so mark as k1
    • aayaa se has karana vibhakti, so mark as k3
    • bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    • prayojaka karta 'causer' (pk1): the causer in a causative construction
    • prayojya karta 'causee' (jk1): the causee in a causative construction
    • madhyastha karta 'mediator-causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'


• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II
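The correspondence between the two labelings can be written as a small mapping. This is only a sketch with hypothetical names: it applies the k1/k3/k4 to pk1/mk1/jk1 substitution mechanically and leaves other labels untouched. Deciding whether a se-marked phrase is a mediator-causer (aayaa se) or a true instrument (cammaca se, which stays k3) is left to the annotator and is not modeled here.

```python
# Possibility-I label -> Possibility-II label for causative constructions.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map syntactic labels to causative karaka labels; other labels are kept."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_1))
```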


(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    • Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    • The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    • jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref


Relative Clause Possibility-II

Relative Clauses (Contd.)

• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta – k4a (Contd.)

[Dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd.)

[Dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

[Dependency tree: agar-to → agar (paired-ccof), to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]

Possibility-II

[Figure: Possibility-II dependency tree]

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically


Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakara (vmod); likhakara → vaha (k1), patra (k2) as shared arguments]


(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which is the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]


Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of


(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'


Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in a conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

101212

38

Conjunct Verbs (Contd)

kiyaa
  k1:  maine
  k2:  usase
  pof: prashna
         nmod: ek
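
The conjunct verb analysis can be sketched as a small labelled tree. This is only an illustration, assuming the edges k1, k2 and pof attach maine, usase and prashna to kiyaa, and nmod attaches ek to prashna; the `Node` class and `edges` helper are hypothetical names, not part of any treebank tool.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    relation: str = "root"  # label on the edge to the parent
    children: list = field(default_factory=list)

    def add(self, relation, word):
        """Attach a dependent with the given relation label and return it."""
        child = Node(word, relation)
        self.children.append(child)
        return child


def edges(node):
    """Yield (head, relation, dependent) triples, depth first."""
    for child in node.children:
        yield (node.word, child.relation, child.word)
        yield from edges(child)


# maine usase ek prashna kiyaa: 'I asked him a question'
kiyaa = Node("kiyaa")
kiyaa.add("k1", "maine")
kiyaa.add("k2", "usase")
prashna = kiyaa.add("pof", "prashna")  # noun of the conjunct verb
prashna.add("nmod", "ek")              # ek modifies the noun, not the verb

conjunct_verb_edges = list(edges(kiyaa))
```

Because prashna is its own node, the modifier ek has somewhere to attach, which is the point of chunking the noun and the verb separately.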


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson]agent created the [first disposable syringe]theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]theme was created in 1952 by [Jonas Salk]agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
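
The filtering step can be sketched in a few lines. This is a minimal illustration, assuming the two candidate sentences have already been labelled with agent and theme by some semantic role labeller; the dictionary layout and the `answer` function are hypothetical.

```python
# The question asks for the agent of "create" with a specific theme.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Role-labelled candidates, as on the Semantic Role Labelling slide.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(question, candidates):
    """Return the agents of candidates whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"] == question["theme"]]


answers = answer(question, candidates)
```

Both candidates contain the words of the question, so word matching alone cannot separate them; only the theme comparison rejects the Becton Dickinson sentence.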

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

An example

• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)

(rolesets ring.01 and ring.02 as above)

An example

• [John]ARG0 rings [the bell]ARG1 (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2 (ring.02)

(rolesets ring.01 and ring.02 as above)
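
A frame file with two rolesets can be pictured as a nested lookup table. The sketch below is an illustration only: the dictionary layout and the `describe` helper are assumptions made here and do not reproduce the actual PropBank frame file format.

```python
# One frame file per lemma; each roleset has its own numbered arguments.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing",
                      "Arg1": "Thing rung",
                      "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity",
                      "Arg2": "Surrounded entity"},
        },
    }
}


def describe(lemma, roleset_id, labelled_args):
    """Map each labelled phrase to the role description in the chosen roleset."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {text: f"{label}: {roles[label]}" for text, label in labelled_args}


john = describe("ring", "ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
trees = describe("ring", "ring.02",
                 [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
```

The same label, Arg1, means 'Thing rung' under ring.01 and 'Surrounding entity' under ring.02, which is why roles are defined per roleset rather than globally.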

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments           Numbered Arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent, experiencer     ARG0-MNS  Induced causer
ARG1  Theme, patient         ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Intransitive Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Causatives

[Figures: PropBank annotation of causative-1 and causative-2]

Causatives: classes

[Figure: causative classes]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

Complex predicate

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with nodes VP-Pred, VP, NP and V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii), with nodes VP-Pred, VP, NP-P, NP and V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa), with nodes VP-Pred, VP, NP and V]

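Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa), with nodes VP-Pred, VP, NP and V; NP darvaazaa is coindexed with a CASE trace in the lower position]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain), with nodes VP-Pred, VP, NP-P, NP and V; NP cuuhe is coindexed with a CASE trace, and NP-P us kamre mein is adjoined]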

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii), with nodes VP-Pred, VP, NP-P, NP and V]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa), with nodes VP-Pred, VP, NP-P, NP and V; NP caaMd is coindexed with a CASE trace and NP-P mujhko with an SCR trace]

• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi), with nodes VP-Pred, VP, NP, NP-P, V, V-Aux, CP and C; the CP headed by ki is coindexed with an EXTR trace and NP Sita with a CASE trace]

Relative Clause

[Figure: PS tree for the relative clause example below, with nodes VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C (COMP) and PRO, and with SCR traces]

  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'

Complex Predicate

[Figure: PS tree for the complex predicate example below, with nodes VP-Pred, VP, NP, NP-P, V', V, V-Aux and N; the noun yaad sits in an NP with a CASE trace]

  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'

Causative

Reduplication: A morphological process

• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of 'every'
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication

If the word is X, then its reduplicated form is X-X

  raam-raam 'Ram-Ram' (proper noun)
  baccaa-baccaa 'child-child' (common noun)
  garam-garam 'hot-hot' (adjective)
  dhiire-dhiire 'slowly-slowly' (adverb)
  jaa-jaa 'go-go' (verb)
  naa-naa 'not-not' (negative particle)
  kyaa-kyaa 'what-what' (question word)
  jaate-jaate 'going-going' (participle)

Partial Reduplication (Echo words)

• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v- (a vowel-initial X takes v- as a prefix, as in aaloo-vaaloo)
  For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi

Some more examples of partial reduplication in Hindi are:

  jaanaa 'going' → jaanaa-vaanaa 'going etc.'
  aaloo 'potato' → aaloo-vaaloo 'potato etc.'
  aisaa 'like this' → aisaa-vaisaa 'like this etc.'

There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':

  bhaag 'run' → bhaagambhaag 'rush'
  jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
  dekh 'see' → dekhaa-dekhii 'in imitation'
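
The two regular patterns can be sketched as string operations on the romanised forms. This is a rough illustration under one stated assumption: the leading run of consonant letters (for example the digraph kh) is treated as the single first consonant, which would be too greedy for true clusters. The function names are hypothetical.

```python
VOWELS = set("aeiouAEIOU")


def full_reduplication(word):
    """X becomes X-X."""
    return f"{word}-{word}"


def echo_word(word):
    """X becomes X-vY, where Y is X without its leading consonant letters."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"


examples = [echo_word(w) for w in ("khaanaa", "jaanaa", "aaloo", "aisaa")]
```

Irregular forms such as bhaagambhaag or jhuuTh-muuTh are not produced by this rule and would have to be listed in a lexicon.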

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Intransitive Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Intransitive Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1
  मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  'My sister who stays in Delhi is coming tomorrow'

Relative Clause

rel-cl-2
  मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  'I have read the book which you gave me'

rel-cl-3
  मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  'I have read the book which you gave me'

rel-cl-4
  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
    raam ko bukhaar hai
    Ram dat fever be.pres
    raam ko caand dikhaa
    Ram dat moon see-unacc.pst
    raam ko dukh hai
    Ram dat sorrow be.pres
• Participatory verbs
  – The second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega
    Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii
    Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa
    Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (punctuation)

Tokenization: Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
    · Create three tokens
    · Mark the hyphen as JOIN

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR_  TKN_           OTHR
1      usa            <fs af='vaha,pr'>
2      laDake         <fs af='laDakaa,n,m,sg,3,o'>
3      ne             <fs af='ne,psp'>
4      kelaa          <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa        <fs af='khaa,v,m,sg,any,yaa'>
6      thaa           <fs af='kelaa,v,e'>
7      (punctuation)  <fs as=&STOP,punc>

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR  TKN            CAT   OTHR
1     usa            PRP   <fs af='vaha,pron,…'>
2     laDake         NN    <fs af='laDakA,noun,…'>
3     ne             PSP   <fs af='ne,psp,…'>
4     kelA           NN    <fs af='kelA,noun,…'>
5     khAyA          VM    <fs af='KA,verb,…'>
6     thA            VAUX  <fs af='kelA,verb,…'>
7     (punctuation)  SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_           CAT_  OTHR
1      ((             NP
1.1    usa            PRP   <fs af='vaha,pron,…'>
1.2    laDake         NN    <fs af='laDakA,noun,…'>
1.3    ne             PSP   <fs af='ne,psp,…'>
       ))
2      ((             NP
2.1    kelA           NN    <fs af='kelA,noun,…'>
       ))
3      ((             VG
3.1    khAyA          VM    <fs af='KA,verb,…'>
3.2    thA            VAUX  <fs af='kelA,verb,…'>
       ))
4      ((             BLK
4.1    (punctuation)  SYM   <fs as=&STOP,punc>
       ))
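
Reading the chunked representation back into a program can be sketched as follows. This is only an illustration on a simplified layout assumed here: whitespace-separated columns, dotted addresses such as 1.1, and no feature-structure column. It is not a full SSF reader, and `read_chunks` is a hypothetical name.

```python
SSF = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""


def read_chunks(text):
    """Group token lines into chunks delimited by '((' and '))'."""
    chunks, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":                       # chunk closes
            chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":  # chunk opens
            current = {"addr": fields[0], "tag": fields[2], "tokens": []}
        elif current is not None:                   # token inside a chunk
            addr, token, pos = fields[:3]
            current["tokens"].append((addr, token, pos))
    return chunks


chunks = read_chunks(SSF)
chunk_tags = [c["tag"] for c in chunks]
```

Each chunk keeps its own address, so a later step can attach dependency labels to chunks (addresses 1, 2, 3) rather than to individual tokens.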


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information encoded at different levels such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
        the

K1 (Karta): the doer of the action (the locus of activity)
K2 (Karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
        the
  k3: key
        this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1:  Sabina
  k2:  lock
         the
  k3:  key
         this
  k7p: home
         my

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1:  lock
         the
  k3:  key
         this

lock becomes the karta
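
The shift of lock from karma to karta can be made concrete with two small role tables. This is only a sketch: the flat dictionaries and the `role_of` helper are illustrative assumptions, not a treebank data format.

```python
# Yesterday Sabina opened the lock with this key at my home
with_agent = {"verb": "opened", "k7t": "yesterday", "k1": "Sabina",
              "k2": "lock", "k3": "key", "k7p": "home"}

# Yesterday the lock opened with this key
without_agent = {"verb": "opened", "k7t": "yesterday", "k1": "lock", "k3": "key"}


def role_of(participant, analysis):
    """Return the karaka label carried by a participant, or None if it is absent."""
    for label, filler in analysis.items():
        if label != "verb" and filler == participant:
            return label
    return None


lock_roles = (role_of("lock", with_agent), role_of("lock", without_agent))
```

The key keeps its k3 label in both sentences, while the lock moves from k2 to k1 once the agent is left unexpressed.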

Levels of Analysis

L1 – Semantic relations: karakas
     e.g. raama: karta
L2 – Morphosyntactic: vibhakti
     e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers
     e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form
     raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai

=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example Contd

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) khaa
       bhaaii
       phala

(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta
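
The step from the bracketed chunks to their heads can be sketched as below. The head rule used here (the NN token heads an NP, the VM token heads a VG) is an assumption made for this illustration, and the slides use the verb root khaa where this sketch returns the surface form khaataa; `parse` and `HEAD_POS` are hypothetical names.

```python
import re

# Matches one chunk written as ((word_TAG word_TAG ...))_CHUNKTAG
CHUNK = re.compile(r"\(\((.*?)\)\)_(\w+)")

# Assumed head rule for this sketch only.
HEAD_POS = {"NP": "NN", "VG": "VM"}


def parse(chunked):
    """Return a list of chunks with their tokens and the assumed head word."""
    result = []
    for body, tag in CHUNK.findall(chunked):
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        head = next(word for word, pos in tokens if pos == HEAD_POS[tag])
        result.append({"tag": tag, "tokens": tokens, "head": head})
    return result


sentence = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP "
            "((khaataa_VM hai_VAUX))_VG")
heads = [chunk["head"] for chunk in parse(sentence)]
```

Tree (t1) relates exactly these three heads; tree (t2) is what results once the remaining tokens of each chunk are attached under their head.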

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
    Sabina opened the lock with the key
    The key opened the lock
    The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal Root branching into activity and result]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock
  – Inserting and turning a key (action 1)
  – Key pressing and turning the lever (action 2)
  – Latch moving and lock opening (action 3)

Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
      (a) I opened the lock with this key
      (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  – meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa  <fs af= root=badZaa cat=adj gend=m>
  – bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta  <fs af= root=bahuta cat=adj gend=any>
  – phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai     <fs af= root=hai cat=v gend=any num=any pers=3>
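
A feature structure written as key=value pairs can be turned into a dictionary with a few string operations. This sketch assumes the angle brackets are present (`<fs ... >`) and that features are separated by spaces, as in the listing above; `parse_fs` is a hypothetical helper and needs Python 3.9 or later for `removeprefix` and `removesuffix`.

```python
def parse_fs(fs):
    """Parse 'key=value' pairs from a string such as '<fs root=khaa cat=v>'."""
    inner = fs.strip().removeprefix("<fs").removesuffix(">")
    features = {}
    for item in inner.split():
        key, _, value = item.partition("=")
        if value:  # skip bare keys such as 'af=' that carry no value
            features[key] = value
    return features


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
hai = parse_fs("<fs af= root=hai cat=v gend=any num=any pers=3>")
```

With the analysis in this form, later steps can read individual features (for example `khaataa["TAM"]`) instead of re-parsing the string.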

An Example (Contd)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

• Dependency Relation

[Figure: dependency relations for the example sentence]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd)

khaataa
  k1: bhaaii
        r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

[Figure: overview of dependency relation types]

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

[Figure: dependency tree under Possibility-I]

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations:
    · prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    · prayojya karta ‘causee’ (jk1): the causee in a causative construction
    · madhyastha karta ‘mediator-causer’ (mk1): the mediator-causer in a causative construction

Causative Constructions (Contd …)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II

Causative Constructions (Contd …)
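The difference between the two possibilities for the maid sentence amounts to a relabelling of three arguments. The sketch below illustrates only that mapping; the names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical. Deciding whether a se-phrase is an instrument (k3, as in the spoon example) or a mediator-causer (mk1) remains an annotator's decision and is not modelled here.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide
# ("we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively").
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map Possibility-I labels to Possibility-II labels.

    `arguments` is a list of (phrase, label) pairs; labels that have no
    causative counterpart (e.g. k2) are left unchanged.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```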

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’

• Issue
  – Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause: Alternative-II

Relative Clauses (Contd …)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I-Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram-Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta – k4a (Contd …)

[Dependency trees for Ex-2 and Ex-3 not recovered]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd …)

[Dependency trees for Ex-1 and Ex-2 not recovered]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’

• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

agar-to (abstract node)
  paired-ccof → agar
                  ccof → naa hotii
  paired-ccof → to
                  ccof → aatii

Possibility - II

[Tree diagram not recovered]

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written letters every day, he tears them up’

Participles (vmod) (Contd)

• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd)

PaadZataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakar
           k1 → vaha (shared)
           k2 → patra (shared)

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example, vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
         nmod__relc → kahoge
                        k1 → tum
                        k2 → jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don’t listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential):

NULL__CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn’t reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd …)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof) [tree diagram not recovered]

Subordinate Conjunction [tree diagram not recovered]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two

Conjunct Verbs (Contd)

kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
          nmod → ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1  Motivation
2  Introducing PropBank
3  Frame file definition
4  Hindi PropBank
5  Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
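The filtering step described above can be sketched in a few lines, assuming the candidate sentences have already been labelled with semantic roles. This is an illustration only: the dictionaries and the function `answer` are hypothetical, and the role labels are typed in by hand rather than produced by a labeller.

```python
# The question asks for the agent of "create" with a given theme.
QUESTION = {"predicate": "create",
            "theme": "the first effective polio vaccine"}

# Hand-labelled candidates corresponding to the two sentences on the slide.
CANDIDATES = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "The first effective polio vaccine"},
]


def answer(question, candidates):
    """Return the agents of candidates whose predicate and theme match."""
    wanted_theme = question["theme"].lower()
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"].lower() == wanted_theme]


print(answer(QUESTION, CANDIDATES))
```

Both sentences contain every word of the question, so word matching cannot separate them; only the theme comparison does.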

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity

An example

• [John] rings [the bell]  (ring.01)
• [Tall aspen trees] ring [the lake]  (ring.02)

An example

• [John]_ARG0 rings [the bell]_ARG1  (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2  (ring.02)
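The two rolesets for ring can be written down as plain data, which makes it easy to check that an annotation only uses roles its roleset defines. The sketch below is an assumption-laden illustration, not the PropBank frame file format: the dictionary `FRAMES` and the function `describe` are hypothetical names.

```python
# Rolesets for "ring", transcribed from the slide.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Pair each (text, label) argument with its role description.

    Raises KeyError if the roleset or one of the labels is not defined.
    """
    roles = FRAMES[roleset_id]["roles"]
    return [(text, label, roles[label]) for text, label in labelled_args]


print(describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")]))
print(describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")]))
```

Note that ring.02 has no Arg0, so labelling "Tall aspen trees" as Arg0 under that roleset would raise an error.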

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

[Annotated trees for unacc-4 and dat-subj-1 not recovered]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

Causatives

[Annotated trees for causative-1 and causative-2 not recovered]

Causatives: classes

[Figure not recovered]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa: [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

आतिफ़ किताब पढ़ेगा
[Tree: VP-Pred → NP (Atif), VP; VP → NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[Tree: VP-Pred → NP-P (Atif ne), VP; VP → NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ़ सोएगा
[Tree: VP-Pred → NP (Atif), VP; VP → V (soyegaa)]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[Tree: VP-Pred → NP1 (darvaazaa), VP; VP → CASE trace coindexed 1, V (khulegaa)]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free), and the location is an adjunct

उस कमरे में चूहे हैं
[Tree: NP-P (us kamre mein) adjoined as an adjunct; NP1 (cuuhe) in the higher position, CASE trace coindexed 1 in the lower position; V (hain)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी
[Tree: NP-P (Ram ne) and NP-P (Mohan ko) at the VP-Pred level; VP → NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
[Tree: adjuncts NP (kal raat) and NP-P (baadaloM mein); NP1 (caaMd) raised, with CASE trace coindexed 1; NP2 (mujhko) scrambled, with SCR trace coindexed 2; V (dikhaa)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
[Tree not recovered in full: main clause Ram ... jaantaa hai (V with V-Aux hai); the CP headed by C ki (Sita der se aayegi) is coindexed with an EXTR trace 1; Sita is NP1 with a CASE trace coindexed 1]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Tree not recovered in full: node labels include VP-Pred, NP-P, CP, C (COMP), V-Aux (thii), PRO, and SCR traces; the relative clause contains jo kitaab, tumne, dii, and the main clause contains vah, maine, paRhii]

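Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Tree not recovered in full: node labels include VP-Pred, NP-P (Ravi ko), V’ containing N (yaad) and V (kar), V-Aux (rahaa), V-Aux (thaa), and a CASE trace coindexed 1]

Causative

[Tree diagram not recovered]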

Partial Reduplication (Echo words)

• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of ‘X etc.’
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa ‘food etc.’
• vaanaa is not a valid word in Hindi
• Some more examples of partial reduplication in Hindi:
  jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
  aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
  aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’
• There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
  bhaag ‘run’ → bhaagambhaag ‘rush’
  jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
  dekh ‘see’ → dekhaa-dekhii ‘in imitation’
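The regular v- pattern can be sketched as a small function over romanized forms. This is only an illustration: the names `echo_word` and `reduplicate` are hypothetical, the function treats everything before the first vowel letter as "the first consonant" (so that the digraph kh is replaced as a unit), and it simply prefixes v- to vowel-initial words such as aaloo. The irregular forms (bhaagambhaag, jhuuTh-muuTh, dekhaa-dekhii) are lexical and are not covered.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word):
    """Replace the initial consonant(s) of a romanized word with v."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return "v" + word[i:]
    raise ValueError(f"no vowel found in {word!r}")


def reduplicate(word):
    """Return the 'X etc.' form: the word followed by its echo word."""
    return f"{word}-{echo_word(word)}"


for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(reduplicate(w))
```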

Some Basic Syntax

• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• Adjectives agree with the noun they modify

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’

Existential

exist-1  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

Dative Subject

unacc-4  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

Ditransitive

ditrans-1  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Relative Clause

rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  ‘My sister who stays in Delhi is coming tomorrow’

Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  ‘I have read the book which you gave me’

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  ‘I have read the book which you gave me’

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’

Causatives

unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
    raam ko bukhaar hai      Ram dat fever be.pres
    raam ko caand dikhaa     Ram dat moon see-unacc.pst
    raam ko dukh hai         Ram dat sorrow be.pres
• Participatory verbs
  – The second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega      Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii   Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa           Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations
• For example:
  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)

Tokenization Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
    · Create three tokens
    · Mark the hyphen as JOIN
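The decision for hyphenated compounds (three tokens, with the hyphen marked JOIN) can be sketched as follows. This is a minimal illustration, not the treebank's tokenizer: the function name `tokenize` and the (token, mark) pair representation are hypothetical, and only whitespace and hyphens are handled, not other punctuation.

```python
def tokenize(text):
    """Split on whitespace, then split hyphenated compounds.

    Each compound becomes member, hyphen, member; the hyphen token is
    marked "JOIN" and all other tokens carry no mark (None).
    """
    tokens = []
    for chunk in text.split():
        parts = chunk.split("-")
        for i, part in enumerate(parts):
            if i > 0:
                tokens.append(("-", "JOIN"))
            if part:
                tokens.append((part, None))
    return tokens


print(tokenize("BAI-bahana"))
print(tokenize("bAlikA-vixyAlaya"))
```

Keeping the members as separate tokens is what allows each of them to receive its own morphological analysis.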

Morph Analysis and its Representation

af defines the composite attribute consisting of: root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), suffix

ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr,…'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o,…'>
3     ne       <fs af='ne,psp,…'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o,…'>
5     khaayaa  <fs af='khaa,v,m,sg,any,…,yaa'>
6     thaa     <fs af='thaa,v,…'>
7              <fs af='&STOP,punc,…'>
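Reading an af value back into named fields is a matter of splitting on commas in the order given above. The sketch below assumes exactly eight comma-separated fields; the names `AF_FIELDS` and `parse_af` are hypothetical, and the sample string is an illustrative value (the trailing fields on the slide were lost in extraction), not a value copied from the treebank.

```python
# Field order as described on the slide; field names are hypothetical.
AF_FIELDS = ("root", "category", "gender", "number",
             "person", "case", "tam_or_vibhakti", "suffix")


def parse_af(af_value):
    """Split a comma-separated af string into a dict of named fields."""
    values = af_value.split(",")
    if len(values) != len(AF_FIELDS):
        raise ValueError(
            f"expected {len(AF_FIELDS)} fields, got {len(values)}")
    return dict(zip(AF_FIELDS, values))


# Illustrative value only; the last two fields are assumed placeholders.
sample = parse_af("laDakA,n,m,sg,3,o,0,0")
print(sample["root"], sample["category"], sample["case"])
```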

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='thA,verb,…'>
7             SYM   <fs af='&STOP,punc,…'>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)

ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='thA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs af='&STOP,punc,…'>
      ))
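The renumbering that chunking causes (tokens 1 to 6 become 1.1, 1.2, 1.3, 2.1, 3.1, 3.2) can be sketched as below. This is an illustration under stated assumptions rather than an SSF writer: the function `number_chunks` and the row tuples are hypothetical, and the feature-structure column is omitted.

```python
def number_chunks(chunks):
    """Produce SSF-style rows (addr, token, category) for chunked input.

    `chunks` is a list of (chunk_tag, [(token, pos_tag), ...]) pairs.
    Each chunk gets an opening "((" row with the chunk address and tag,
    one row per token with a nested address, and a closing "))" row.
    """
    rows = []
    for i, (chunk_tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", chunk_tag))
        for j, (token, pos_tag) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos_tag))
        rows.append(("", "))", ""))
    return rows


sentence = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
rows = number_chunks(sentence)
for addr, token, cat in rows:
    print(f"{addr}\t{token}\t{cat}")
```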



Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Paninis(Grammar((

 Datedaround500BC Seekstoprovideacompletemaximallyconciseandtheore3callyconsistentanalysisofSanskritgramma3calstructure BasedonspokenformltKiparsky1993gt Focusesonlanguageasameansofcommunica3on

PaninisGrammarcontd

 TreatsasentenceasaseriesofmodifierLmodifiedrela3ons Everysentencehasaprimarymodified(generallyaverb) Rela3onsbetweenverbsandtheirpar3ciapantscalledlsquokarakarsquo Otherrela3onsndashsuchasreasonpruposegeni3veetc Therela3onsareexpressedthroughexplicitmarkerscalledvibhak3

Sabina opened the lock

opened
  k1: Sabina
  k2: lock (modifier: the)

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): the locus of the result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock (modifier: the)
  k3: key (modifier: this)

k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (modifier: the)
  k3: key (modifier: this)
  k7p: home (modifier: my)

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock (modifier: the)
  k3: key (modifier: this)

Here the lock becomes the karta.
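The labeled trees above can be held in a very small data structure. The following is a minimal sketch, not the treebank's own tooling: the `Node` class, the `arcs` helper, and the `mod` label for the unlabeled determiner arcs are all assumptions made for illustration; only the karaka labels (k1, k2, k3, k7t, k7p) come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One word in a dependency tree, plus the label of the arc to its parent."""
    word: str
    relation: str = "root"
    children: List["Node"] = field(default_factory=list)

    def add(self, relation: str, word: str) -> "Node":
        """Attach a dependent with the given relation label and return it."""
        child = Node(word, relation)
        self.children.append(child)
        return child


def arcs(node: Node) -> List[Tuple[str, str, str]]:
    """Return (head, relation, dependent) triples in depth-first order."""
    result = []
    for child in node.children:
        result.append((node.word, child.relation, child.word))
        result.extend(arcs(child))
    return result


# "Yesterday Sabina opened the lock with this key at my home"
opened = Node("opened")
opened.add("k7t", "Yesterday")
opened.add("k1", "Sabina")
opened.add("k2", "lock").add("mod", "the")   # "mod" is a placeholder label
opened.add("k3", "key").add("mod", "this")
opened.add("k7p", "home").add("mod", "my")

for head, relation, dependent in arcs(opened):
    print(f"{head} --{relation}--> {dependent}")
```

Storing the relation on the dependent keeps each node to a single incoming arc, which is what makes the structure a tree.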

Levels of Analysis

L1: Semantic relations (karakas), e.g., raama: karta
L2: Morphosyntactic (vibhakti), e.g., raama: prathamaa
L3: Morphological representation (abstract vibhakti markers), e.g., raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4: Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) Chunk-level tree: khaa has the dependents bhaaii and phala
(t2) Fully expanded tree: khaa has the dependents bhaaii (with meraa and baDZaa below it) and phala (with bahuta below it)
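To make the bracketed chunk notation concrete, here is a small sketch that reads a string of `((word_TAG ...))_LABEL` chunks and picks a head for each. The function names and the head rule (the `VM` token for a verb group, otherwise the last token of the chunk) are assumptions chosen to reproduce the heads named in the slides for this one sentence; they are not the treebank's head-finding rules.

```python
import re

# One chunk looks like: ((word_TAG word_TAG ...))_LABEL
CHUNK_RE = re.compile(r"\(\(\s*(.*?)\s*\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def chunk_head(label, tokens):
    """Illustrative head rule: main verb (VM) in a VG, else the last token."""
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]


example = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP "
           "((khaataa_VM hai_VAUX))_VG")

chunks = parse_chunks(example)
heads = [chunk_head(label, tokens) for label, tokens in chunks]
print(heads)
```

The dependency relations of the scheme are then marked only between these heads, which is why chunking reduces the manual annotation effort.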

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb

• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root branching into activity and result]

karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock:
  - (action 1) Inserting and turning a key
  - (action 2) Key pressing and turning the lever
  - (action 3) Latch moving and lock opening

Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1: karta (doer); k2: karma (affected); k3: karana (instrument)

Thus:

• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri.
  - The lock opened. The karma is raised to the role of karta (doer): karma-kartri.
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e., which sub-action the speaker wants to focus on.

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly:
    (a) I opened the lock with this key.
    (b) I am sure this key will open the lock.
  - ‘key’ is assigned karta in (b) and karana in (a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example:
  - meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd.)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>
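The morphological feature structures above are simple attribute=value lists, so they can be read into a dictionary with one regular expression. This is only a sketch under the assumption that the structure is written as `<fs name=value ...>` with space-separated pairs, as on the slide; `parse_fs` is a hypothetical helper, not part of any treebank toolkit.

```python
import re


def parse_fs(fs):
    """Turn '<fs root=khaa cat=v ...>' into {'root': 'khaa', 'cat': 'v', ...}.

    An attribute with nothing after '=' (like the bare 'af=' on the slide)
    is kept with an empty string as its value.
    """
    return dict(re.findall(r"(\w+)=(\w*)", fs))


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(khaataa["root"], khaataa["cat"], khaataa["TAM"])
```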

An Example (Contd.)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa ‘my’
  - bhaaii ‘brother’
  - phala ‘fruit’, and
  - khaataa ‘eats’
• badZaa ‘big’ and bahuta ‘much’ are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are three types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations link the verb to the participants directly involved in the action it denotes
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types
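The three-way split of relation labels can be expressed as a small lookup. The sketch below is illustrative only: `relation_type` and the regular expression for karaka-style labels are assumptions, and the label sets contain just the tags that appear on these slides (the full tagset is larger).

```python
import re

# Labels listed on the slides; the real tagset has more entries.
OTHER_THAN_KARAKA = {"r6", "rt", "rh", "nmod__relc", "rad"}
NON_DEPENDENCY = {"ccof", "pof", "fragof"}

# Assumed pattern: k1..k7 with an optional subtype letter (k4a, k7t, k7p),
# plus the causative karta labels pk1, jk1, mk1.
KARAKA_RE = re.compile(r"(?:[pjm]k1|k[1-7][a-z]?)")


def relation_type(label):
    """Classify a relation label into one of the three groups."""
    if label in NON_DEPENDENCY:
        return "non-dependency"
    if label in OTHER_THAN_KARAKA:
        return "other"
    if KARAKA_RE.fullmatch(label):
        return "karaka"
    return "unknown"


for label in ["k1", "k7t", "k4a", "pk1", "r6", "nmod__relc", "ccof", "pof"]:
    print(label, "->", relation_type(label))
```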

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  - Possibility-I: Go by syntactic analysis
    · khilvaa ‘cause to eat’ is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd. …)

• Possibility-II
  - The verb khilvaa ‘cause to eat’ is a causative verb and is morphologically related to the base verb khaa ‘eat’
  - The Paninian framework provides the relations:
    · prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    · prayojya karta ‘causee’ (jk1): the causee in a causative construction
    · madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in a causative construction

Causative Constructions (Contd. …)

• Possibility-II
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd. …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
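The difference between the two possibilities for the maid sentence amounts to relabeling three arcs. The sketch below shows only that relabeling; the mapping table and `relabel_causative` are hypothetical names, and the function deliberately does not decide when a se-marked phrase is a mediator-causer rather than a true instrument (cammaca se ‘with the spoon’ stays k3 in the slides), which needs human or lexical knowledge.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map syntactic labels to causative labels; other labels are unchanged.

    `arguments` is a list of (phrase, label) pairs for ONE causative verb
    whose se-phrase is already known to be a mediator-causer.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```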

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  - Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai ‘is standing’ is the root of the relative clause
  - The modifier of vaha ‘he’ in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause: Possibility-II

Relative Clauses (Contd. …)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta: k4a

• Ex-1: mujhko dukh hai
  ‘I-Dat’ ‘sorrow’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘sorrow’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a

anubhava karta: k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram-Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta: k4a (Contd. …)

[Dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran: rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e., karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran: rs (Contd. …): Ex-1

Relation samanadhikaran: rs (Contd.): Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II

Conditionals (Contd.)

• Possibility-I is rejected because the head of the tree, agar-to, is an abstract node, i.e., it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day he writes letters and then tears them up’

Participles (vmod) (Contd.)

• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle likhakar ‘having written’

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e., NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don't listen to anyone’
• There is no explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd. …)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Both candidate sentences match the question words "created", "the first effective polio vaccine"

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
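The filtering step can be sketched in a few lines. This is only an illustration of the idea, assuming the candidate sentences have already been labeled by hand with the agent and theme of "create" exactly as on the slide; the `candidates` structure and `answer_who` are hypothetical, and no real semantic role labeler is called.

```python
# Hand-labeled roles for the predicate "create", copied from the slide.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "The first effective polio vaccine"},
]


def normalize(phrase):
    """Lowercase and drop a leading article so themes can be compared."""
    words = phrase.lower().split()
    if words and words[0] in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)


def answer_who(predicate, theme, labeled_sentences):
    """Return agents of sentences whose predicate and theme match the question."""
    wanted = normalize(theme)
    return [s["agent"] for s in labeled_sentences
            if s["predicate"] == predicate and normalize(s["theme"]) == wanted]


# "Who created the first effective polio vaccine?"
print(answer_who("create", "the first effective polio vaccine", candidates))
```

Word matching alone would keep both sentences, since both contain the question's words; comparing the theme role is what removes the syringe sentence.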

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame Files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
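A frame file is essentially a lookup from roleset ID to role descriptions. The sketch below hard-codes the two rolesets from the slide in a plain dictionary; real PropBank frame files are XML documents, so the `FRAMES` layout and the `explain` helper here are simplifying assumptions for illustration.

```python
# The two rolesets for "ring", as given on the slide.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing",
                  "ARG1": "Thing rung",
                  "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity",
                  "ARG2": "Surrounded entity"},
    },
}


def explain(roleset_id, labeled_args):
    """Pair each labeled phrase with its verb-specific role description."""
    roles = FRAMES[roleset_id]["roles"]
    return [(phrase, label, roles[label]) for phrase, label in labeled_args]


print(explain("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(explain("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label (ARG1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is the point of defining roles per roleset rather than globally.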

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent/experiencer     ARG0-MNS  Induced causer
ARG1  Theme/patient         ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
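The tagset above can be looked up mechanically once a label is split into its base argument and optional function tag. The following sketch encodes only the labels listed on the slides; `split_label` and `describe` are hypothetical helper names, and the hyphenated label format (e.g. `ARG2-GOL`) is assumed from the slide.

```python
NUMBERED = {
    "ARGA": "Causer", "ARG0": "Agent/experiencer", "ARG1": "Theme/patient",
    "ARG2": "Recipient", "ARG3": "Instrument",
}
WITH_FUNCTION_TAG = {
    "ARGA-MNS": "Indirect causer", "ARG0-MNS": "Induced causer",
    "ARG0-GOL": "Causee with a 'recipient' role", "ARG2-ATR": "Attribute",
    "ARG2-GOL": "Goal", "ARG2-SOU": "Source", "ARG2-LOC": "Location",
    "ARG2-DIR": "Direction",
}
MODIFIERS = {
    "TMP": "Temporal", "MNR": "Manner", "LOC": "Location", "PRP": "Purpose",
    "CAU": "Cause", "DIS": "Discourse", "ADV": "Adverb", "MNS": "Means",
}


def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); a bare label gets tag None."""
    base, _, tag = label.partition("-")
    return base, (tag or None)


def describe(label):
    """Return the slide's description for a label, or None if unlisted."""
    base, tag = split_label(label)
    if base == "ARGM":
        return MODIFIERS.get(tag)
    if tag is None:
        return NUMBERED.get(base)
    return WITH_FUNCTION_TAG.get(label)


for label in ["ARGA", "ARG0-GOL", "ARG2-SOU", "ARGM-TMP"]:
    print(label, "=", describe(label))
```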

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs:
  - Unaccusatives, e.g., Kula (open), Puta (explode)
  - Unergatives, e.g., haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g., animacy test, cognate object test, among others

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

[Annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative:
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

Causatives

[Annotated trees for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e., ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by:
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

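Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); the NP darvaazaa carries index 1 and is linked to a CASE trace with the same index in the lower position]

Existentials

• Existential ho ‘be’ is unaccusative (because it is agent-free) and the location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); the NP cuuhe carries index 1 and is linked to a CASE trace; us kamre mein is adjoined to VP]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); Mohan-ko is adjoined to VP-Pred]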

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); caaMd carries index 1 with a CASE trace, mujhko carries index 2 with an SCR (scrambling) trace]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); the complement clause is a CP headed by C ki, carries index 1, and is linked to an EXTR (extraposition) trace]

Relative Clause

[Tree diagram for rel-cl-4; node labels include VP-Pred, VP, NP, CP, C, V, V-Aux, with SCR traces and a PRO]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

Complex Predicate

[Tree diagram for compl-pred-2; node labels include VP-Pred, VP, NP, V, V-Aux, V', N, with the noun yaad linked to a CASE trace with index 1]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

Causative

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Relative Clause

rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          ‘My sister who stays in Delhi is coming tomorrow’

101212

17

Relative Clause

rel-cl-2
मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'

rel-cl-3
मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'

rel-cl-4
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  - The experiencer argument takes dative case

    raam ko bukhaar hai
    Ram dat fever be.pres

    raam ko caand dikhaa
    Ram dat moon see-unacc.pst

    raam ko dukh hai
    Ram dat sorrow be.pres

• Participatory verbs
  - The second argument of participatory verbs takes the se postposition

    raam ravi se carcaa karega
    Ram Ravi to discussion do.m.sg.fut

    siitaa raam se shaadi karegii
    Sita Ram with marriage do.f.sg.fut

    ravi mohan se milegaa
    Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  - Compounds
  - Punctuation

For example:

usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - They are productive
  - Morphological analysis of the members of the compounds is needed
  - The issue: whether to create a single token
  - Decision: create three tokens and mark the hyphen as JOIN
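The decision above (split a hyphenated compound into three tokens and mark the hyphen as JOIN) can be sketched as follows. This is only an illustrative sketch, not the treebank's actual tokenizer: the function name `tokenize` and the `(token, tag)` output shape are assumptions, and the input is assumed to be romanized (WX-style) text as in the slides.

```python
import re

def tokenize(text):
    """Split text into (token, tag) pairs.

    Every punctuation mark becomes its own token. A hyphen that sits
    between two word pieces of the same whitespace-delimited unit is
    tagged "JOIN"; all other tokens get the tag None.
    """
    tokens = []
    for unit in text.split():
        # Word pieces (letters/digits) or single punctuation characters.
        pieces = re.findall(r"\w+|[^\w\s]", unit)
        for i, piece in enumerate(pieces):
            is_join = (
                piece == "-"
                and 0 < i < len(pieces) - 1
                and pieces[i - 1][0].isalnum()
                and pieces[i + 1][0].isalnum()
            )
            tokens.append((piece, "JOIN" if is_join else None))
    return tokens

compound = tokenize("BAI-bahana")
```

Keeping the hyphen as a separate JOIN token lets the morphological analyser treat each member of the compound on its own while still recording that the members belong together.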


Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), and suffix.

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
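To make the composite af attribute concrete, here is a minimal sketch that splits an af value into named fields. The field names and the eight-slot layout are assumptions based on the list above (with tam and vibhakti sharing one slot), and `parse_af` is a hypothetical helper, not part of any SSF toolkit.

```python
# Assumed field order, following the description of the af attribute above.
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)

def parse_af(value):
    """Turn a comma-separated af value into a dict of named fields.

    Missing trailing fields are filled with empty strings.
    """
    parts = value.split(",")
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))

analysis = parse_af("laDakaa,n,m,sg,3,o")
```

With the value shown for laDake, `analysis["root"]` is `laDakaa` and `analysis["case"]` is `o` (oblique); the two unfilled slots come back as empty strings.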

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
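The way chunking changes the ADDR_ values (flat 1, 2, 3 becoming 1, 1.1, 1.2, ...) can be sketched like this. It is an illustrative sketch only: `to_ssf_rows` and the tuple-based row format are assumptions, and the feature-structure column is left out for brevity.

```python
def to_ssf_rows(chunks):
    """Produce SSF-like (address, token, category) rows.

    `chunks` is a list of (chunk_label, [(token, pos_tag), ...]).
    Each chunk gets an integer address, and its tokens get
    dotted addresses beneath it.
    """
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))          # chunk opening row
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))   # token inside the chunk
        rows.append(("", "))", ""))                 # chunk closing row
    return rows

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]

for row in to_ssf_rows(chunks):
    print("\t".join(row))
```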


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations include reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: sabina
  k2: lock
    the
  k3: key
    this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

Here lock becomes the karta.

Levels of Analysis

L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) Tree over chunk heads:
khaa
  bhaaii
  phala

(t2) Fully expanded tree:
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb

• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

Verbal root
  activity
  result

karta-karma

• The boy opened the lock
  - k1 - karta
  - k2 - karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions - Opening of lock

Opening of lock
  action 1: inserting and turning a key
  action 2: key pressing and turning the lever
  action 3: latch moving and lock opening

Sub-actions - Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 - karta (doer)
k2 - karma (affected)
k3 - karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock.
  - The karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri
• The lock opened.
  - The karma is raised to the role of karta (doer): karma-kartri
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• It depends on which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
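The bracketed chunk notation above is regular enough to be read back into a data structure. The following is a minimal sketch under the assumption that chunks are written exactly as `((word_TAG ...))_LABEL`; `parse_chunks` is a hypothetical helper, not a treebank tool.

```python
import re

# One chunk: "((" body "))" "_" LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        # Split each "word_TAG" item at its last underscore.
        words = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, words))
    return chunks

sentence = (
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
parsed = parse_chunks(sentence)
```

Here `parsed` contains four chunks, the last being the verb group with khaataa (VM) and hai (VAUX); dependency relations are then marked between the heads of such chunks.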

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit', and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
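The tree above can be stored as a list of labeled head-dependent edges between chunk heads. The sketch below is one possible encoding, not the treebank's storage format; the `(dependent, head, relation)` triple layout and the helper names are assumptions made for illustration.

```python
from collections import defaultdict

# (dependent, head, relation) triples for the example sentence.
edges = [
    ("bhaaii", "khaataa", "k1"),   # karta
    ("phala", "khaataa", "k2"),    # karma
    ("meraa", "bhaaii", "r6"),     # genitive
]

def build_children(edges):
    """Map each head to its list of (relation, dependent) pairs."""
    children = defaultdict(list)
    for dependent, head, relation in edges:
        children[head].append((relation, dependent))
    return children

def render(head, children, depth=0, relation=None):
    """Return the subtree rooted at `head` as indented lines."""
    label = head if relation is None else f"{relation}: {head}"
    lines = ["  " * depth + label]
    for child_relation, child in children.get(head, []):
        lines.extend(render(child, children, depth + 1, child_relation))
    return lines

tree_lines = render("khaataa", build_children(edges))
print("\n".join(tree_lines))
```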

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  - karta - subject/agent/doer
  - karma - object/patient
  - karana - instrument
  - sampradaan - beneficiary
  - apaadaan - source
  - adhikarana - location in place/time/other

Relations other than karakas

• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod_relc - Relative clause
• rad - Address

Relations which do not fall under dependency relation

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
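The three groups of relation labels can be collected into a small lookup, which is handy when validating annotations. This is a sketch only: it lists just the labels that appear in these slides (not the full tagset), and the dictionary layout and the `relation_type` helper are assumptions.

```python
RELATIONS = {
    "karaka": {
        "k1": "karta",
        "k2": "karma",
        "k3": "karana",
        "k4": "sampradaan",
        "k4a": "anubhava karta",
        "k7t": "location in time",
        "k7p": "location in place",
    },
    "non_karaka": {
        "r6": "genitive",
        "rt": "purpose",
        "rh": "reason",
        "nmod__relc": "relative clause",
        "rad": "address",
    },
    "non_dependency": {
        "ccof": "conjunction",
        "pof": "complex predicate",
        "fragof": "fragment of",
    },
}

def relation_type(label):
    """Return the group a label belongs to, or None if it is unknown."""
    for group, labels in RELATIONS.items():
        if label in labels:
            return group
    return None
```

For example, `relation_type("pof")` falls in the `non_dependency` group, while an unlisted label returns `None`.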

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa 'cause to eat' is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd.)

• Possibility-II
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause Alternative-II

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd.)

• Ex-2: raam ne (agent) caaMd dekhaa (base verb)
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3: raam ko (experiencer) caaMd dikhaa (derived intransitive verb)
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta - k4a (Contd.)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd.)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II

Conditionals (Contd.)

• Possibility-I is ruled out because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day he writes letters and then tears them up'

Participles (vmod) (Contd.)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing, and it is the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
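The filtering step described above can be sketched in a few lines. This is a toy illustration that assumes the role labels are already available; the dictionary layout and the `answer_who` helper are hypothetical, not the output format of any particular SRL system.

```python
# Hand-written role analyses of the two candidate sentences.
analyses = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]

def answer_who(predicate, theme, analyses):
    """Return the agents of `predicate` whose theme matches the question."""
    return [
        a["agent"]
        for a in analyses
        if a["predicate"] == predicate and a["theme"] == theme
    ]

answers = answer_who("create", "the first effective polio vaccine", analyses)
```

Both sentences contain every word of the question, so word matching cannot separate them; comparing the theme of create keeps only the second analysis.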

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1]   (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
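A frame file with several rolesets can be pictured as a nested lookup from roleset ID to role descriptions. The sketch below encodes only the two ring rolesets shown above; the dictionary layout and the `describe` helper are assumptions for illustration and do not reflect the actual PropBank frame-file format.

```python
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {
            "Arg0": "causer of ringing",
            "Arg1": "thing rung",
            "Arg2": "ring for",
        },
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {
            "Arg1": "surrounding entity",
            "Arg2": "surrounded entity",
        },
    },
}

def describe(roleset_id, labelled_args):
    """Map each labelled argument text to its roleset-specific description.

    `labelled_args` is a list of (argument_label, text) pairs.
    """
    roles = FRAMES[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}

bell = describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")])
lake = describe("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")])
```

The same label (Arg1) means "thing rung" under ring.01 but "surrounding entity" under ring.02, which is why the roleset has to be chosen before the numbered arguments can be interpreted.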

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments         Numbered Arguments with function tags
ARGA  Causer               ARGA-MNS  Indirect causer
ARG0  Agent, experiencer   ARG0-MNS  Induced causer
ARG1  Theme, patient       ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient            ARG2-ATR  Attribute
ARG3  Instrument           ARG2-GOL  Goal
                           ARG2-SOU  Source
                           ARG2-LOC  Location
                           ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan

ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so, sulaa) sleep, make someone sleep
• Add -vaa
  - (sulaa, sulvaa) make someone sleep, cause someone to fall asleep

• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers


Causatives

unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleepmsgfut
Atif will sleep

causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleepcauspst
'The maid caused the child to sleep'

causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleepcauspst
'The mother made the maid cause the child to sleep'

Causatives

[Figures: Causative-1, Causative-2]


Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa `trust(n) do(v)' 'trust'

• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa


Complex Predicate

compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do progmsg bemsgpst
'Ram was waiting for Ravi'

Complex predicate

compl-pred-2 राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sitperf
'Ram regrettably fought with Ravi'


Complement Clause

compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram knowhabmsg besgpres that Sita late part comefsgfut
Ram knows that Sita will arrive late


Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  - Assisted by Annahita Farudi and Owen Rambow

• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition


Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters

• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)

• These two levels are related by derivations
  - Words and constituents move and leave traces

• Transformational grammar

• Monostratal representation
• Not unlike the English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)

• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा; node labels: VP-Pred, VP, NP, V; leaves: Atif, kitab, paRhegaa]


Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी; node labels: VP-Pred, VP, NP, V; leaves: Atif-ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative

• In unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा; node labels: VP-Pred, VP, NP, V; leaves: Atif, soyegaa]


Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा; node labels: VP-Pred, VP, NP1, NP, CASE1, V; leaves: darvaazaa, khulegaa]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं; node labels: VP-Pred, VP, NP, NP1, CASE1, V; leaves: us kamre mein, cuuhe, hain]


Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी; node labels: VP-Pred, VP, NP, V; leaves: Ram-ne, Mohan-ko, kitaab, dii]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा; node labels: VP-Pred, VP, NP, NP1, NP2, SCR2, CASE1, V; leaves: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone

• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)

• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere


Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी; node labels: VP-Pred, VP, NP, NP1, CP1, C, V, V-Aux, EXTR1, CASE1; leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

[Tree diagram; node labels: VP-Pred, VP, NP, NP1, CP, C, COMP, V, V-Aux, SCR1, SCR3, PRO; leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which bookf youerg givefsgpst befsgpst that Ierg read reflfsgpst
'I have read the book which you gave me'


Complex Predicate

[Tree diagram; node labels: VP-Pred, VP, VP1, NP, V, V-Aux, V', N, CASE1; leaves: Raam, Ravi-ko, yaad, kar, rahaa, thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do progmsg bemsgpst
Ram was remembering Ravi

Causative


Intransitive Unaccusative

unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
doormsgd openmsgfut
The door will open

unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
doormsgobl erg openpst
The door opened

unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
doormsgobl dat openinf compelfut
The door will have to open

Existential

exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats beprespl
There are rats in that room


Dative Subject

unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc)pst
Yesterday night the moon was seen behind the clouds

dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in medat moon see(unacc)pst
Yesterday night I saw the moon behind the clouds

Ditransitive

ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat bookf givemsgfut
Ram will give a book to Mohan

ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat bookf givefsgpst
Ram gave a book to Mohan


Complement Clause

compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram knowhabmsg besgpres that Sita late part comefsgfut
Ram knows that Sita will arrive late

Relative Clause

rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
My sister who Delhi in stayhabfsg besgpres tomorrow come progfsg besgpres
My sister who stays in Delhi is coming tomorrow


Relative Clause

rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
Ierg that bookf which youerg givefsgpst befsgpst read reflfsgpst
I have read the book which you gave me

rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
Ierg that bookf read reflfsgpst which youerg givefsgpst befsgpst
I have read the book which you gave me

rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which bookf youerg givefsgpst befsgpst that Ierg read reflfsgpst
I have read the book which you gave me

Complex Predicate

compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do progmsg bemsgpst
Ram was waiting for Ravi

compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do progmsg bemsgpst
Ram was remembering Ravi


Causatives

unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleepmsgfut
Atif will sleep

causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleepcauspst
'The maid caused the child to sleep'

causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleepcauspst
'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  - The experiencer argument takes dative case
    raam ko bukhaar hai        Ram dat fever bepres
    raam ko caand dikhaa       Ram dat moon see-unaccpst
    raam ko dukh hai           Ram dat sorrow bepres

• Participatory verbs
  - The second argument of the participatory verbs takes the se postposition
    raam ravi se carcaa karega      Ram Ravi to discussion domsgfut
    siitaa raam se shaadi karegii   Sita Ram with marriage dofsgfut
    ravi mohan se milegaa           Ravi Mohan with meetmsgfut


References

• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.


Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks


Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  - Compounds
  - Punctuations

For example:

usa  ladake ne  kelaa  khaayaa  thaa
that boy    erg banana eat-perf past


Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision:
    • Create three tokens
    • Mark the hyphen as JOIN

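The tokenization decision for compounds (three tokens, with the hyphen marked JOIN) can be sketched as follows. This is a minimal illustration, not the treebank's tokenizer: the function name `tokenize`, the regular expression, and the use of an empty string for unmarked tokens are all assumptions made for the example.

```python
import re
from typing import List, Tuple

# Group 1: a word (letters, digits, underscore). Group 2: one punctuation mark.
TOKEN_RE = re.compile(r"(\w+)|([^\w\s])")


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into (token, mark) pairs.

    Every punctuation mark becomes its own token. A hyphen that sits
    directly between two words (no spaces) is marked "JOIN"; all other
    tokens get an empty mark.
    """
    matches = list(TOKEN_RE.finditer(text))
    tokens = []
    for i, m in enumerate(matches):
        tok = m.group()
        is_join = (
            tok == "-"
            and 0 < i < len(matches) - 1
            and matches[i - 1].group(1) is not None   # previous token is a word
            and matches[i + 1].group(1) is not None   # next token is a word
            and matches[i - 1].end() == m.start()     # no space before the hyphen
            and matches[i + 1].start() == m.end()     # no space after the hyphen
        )
        tokens.append((tok, "JOIN" if is_join else ""))
    return tokens


if __name__ == "__main__":
    for token in tokenize("bAlikA-vixyAlaya"):
        print(token)
```

Keeping the hyphen as a separate, marked token lets the morphological analyser treat each member of the compound on its own while still recording that the members belong together.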

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), suffix

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
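The composite af attribute can be unpacked into named fields. The sketch below assumes the value is a comma-separated string whose fields follow the order listed on the slide (root, category, gender, number, person, case, TAM or vibhakti, suffix); the function name `parse_af`, the field names, and the sample strings are illustrative assumptions, not part of any treebank tool.

```python
from typing import Dict

# Field order as listed on the slide; the key names are chosen for this example.
AF_FIELDS = [
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
]


def parse_af(af: str) -> Dict[str, str]:
    """Split a comma-separated af value into named fields.

    Trailing fields that are absent are filled with empty strings.
    """
    parts = [p.strip() for p in af.split(",")]
    if len(parts) > len(AF_FIELDS):
        raise ValueError(f"too many fields in af value: {af!r}")
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))


if __name__ == "__main__":
    print(parse_af("laDakaa,n,m,sg,3,o,,"))
    print(parse_af("ne,psp"))
```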

POS Tagging

• ILMT POS tagsets adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save the effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))

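The way chunking changes token addresses can be sketched as follows. This is a minimal illustration assuming dotted addresses of the form chunk.token (1, 1.1, 1.2, ...) and tab-separated columns; the function name `to_ssf_lines` and the exact column layout are assumptions for the example, and the feature-structure column is omitted.

```python
from typing import List, Tuple

Token = Tuple[str, str]            # (token, POS tag)
Chunk = Tuple[str, List[Token]]    # (chunk label, tokens in the chunk)


def to_ssf_lines(chunks: List[Chunk]) -> List[str]:
    """Render chunks as SSF-like lines with nested addresses.

    Each chunk opens with "((" and its label, lists its tokens with
    addresses chunk.token, and closes with "))".
    """
    lines = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{label}")
        for j, (tok, pos) in enumerate(tokens, start=1):
            lines.append(f"{i}.{j}\t{tok}\t{pos}")
        lines.append("\t))")
    return lines


SENTENCE = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]

if __name__ == "__main__":
    print("\n".join(to_ssf_lines(SENTENCE)))
```

Before chunking, kelA has address 4; after chunking it becomes 2.1, which is the restructuring the slide refers to.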


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k

• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

K1 (Karta): the doer of the action (the locus of activity)
K2 (Karma): locus of result

Sabina opened the lock with this key

opened
  k1: sabina
  k2: lock
    the
  k3: key
    this

K3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

The lock becomes the karta.


Levels of Analysis

L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks, bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1)
khaa
  bhaaii
  phala

(t2)
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb

• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root, branching into activity and result]


karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta, karma sometimes correspond to agent, theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock
  - Inserting and turning a key (action 1)
  - Key pressing and turning the lever (action 2)
  - Latch moving and lock opening (action 3)


Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1: karta (doer); k2: karma (affected); k3: karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on
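The shift of the karta label with the speaker's choice can be sketched as a small labelling function. This is an illustrative simplification, not the annotation procedure: the role names (`agent`, `patient`, `instrument`), the function name `assign_karakas`, and the rule that an expressed agent must be the karta are assumptions drawn from the examples above.

```python
from typing import Dict

# Default karaka label for each underlying role (names chosen for this example).
BASE_LABEL = {"agent": "k1", "patient": "k2", "instrument": "k3"}


def assign_karakas(participants: Dict[str, str], focus: str) -> Dict[str, str]:
    """Label expressed participants with karaka relations.

    participants maps a role to the word expressing it; focus is the role
    the speaker presents as the doer (karta). The focused participant gets
    k1 and every other participant keeps its default label.
    """
    if focus not in participants:
        raise KeyError(f"focused role {focus!r} is not expressed")
    if "agent" in participants and focus != "agent":
        raise ValueError("an expressed agent is the karta in these examples")
    return {
        word: ("k1" if role == focus else BASE_LABEL[role])
        for role, word in participants.items()
    }


if __name__ == "__main__":
    # Sabina opened the lock with the key
    print(assign_karakas({"agent": "Sabina", "patient": "lock", "instrument": "key"}, "agent"))
    # The key opened the lock
    print(assign_karakas({"patient": "lock", "instrument": "key"}, "instrument"))
    # The lock opened with this key
    print(assign_karakas({"patient": "lock", "instrument": "key"}, "patient"))
```

The `focus` argument stands in for vivaksha: the same set of participants yields different labellings depending on which sub-action the speaker foregrounds.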


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)


Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.


An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  - meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa   <fs af= root=badZaa cat=adj gend=m>
  - bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta   <fs af= root=bahuta cat=adj gend=any>
  - phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai      <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
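The bracketed chunk notation used in this example can be read back into a simple data structure. A minimal sketch, assuming every chunk is written as ((word_TAG ...))_LABEL with whitespace between tokens; the function name `parse_chunks` and the regular expression are illustrative, not part of the treebank tools.

```python
import re
from typing import List, Tuple

# Matches one chunk: "((" body "))" "_" label
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text: str) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Parse "((w1_T1 w2_T2))_LABEL ..." into (label, [(word, tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        # Split each item on its last underscore so the tag is separated cleanly.
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, tokens))
    return chunks


EXAMPLE = (
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)

if __name__ == "__main__":
    for label, tokens in parse_chunks(EXAMPLE):
        print(label, tokens)
```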

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit' and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks


Dependency Scheme (Contd)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
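The inter-chunk dependency tree for this sentence can be stored as a list of labelled edges between chunk heads and printed as an indented tree. This is a minimal sketch; the edge format (dependent, head, relation) and the function name `render` are assumptions made for the example.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]   # (dependent, head, relation)

# Edges between chunk heads for: meraa badZaa bhaaii bahuta phala khaataa hai
EDGES: List[Edge] = [
    ("bhaaii", "khaataa", "k1"),
    ("phala", "khaataa", "k2"),
    ("meraa", "bhaaii", "r6"),
]


def render(head: str, edges: List[Edge], depth: int = 0, rel: str = "root") -> List[str]:
    """Return the subtree under head as indented 'relation: word' lines."""
    lines = [f"{'  ' * depth}{rel}: {head}"]
    for dep, h, r in edges:
        if h == head:
            lines.extend(render(dep, edges, depth + 1, r))
    return lines


if __name__ == "__main__":
    print("\n".join(render("khaataa", EDGES)))
```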

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod_relc: Relative clause
• rad: Address


Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'

• Issue
  - Possibility-I: Go by syntactic analysis
    • khilvaa 'cause to eat' is the verb root
    • maaz ne has karta vibhakti, so mark as k1
    • aayaa se has karana vibhakti, so mark as k3
    • bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)


Causative Constructions (Contd …)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    • prayojaka karta 'causer' (pk1): the causer in a causative construction
    • prayojya karta 'causee' (jk1): the causee in a causative construction
    • madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
  - Do we mark the above dependency roles?
  - If we mark these relations then the root will be khaa 'eat'

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
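The relabelling under Possibility-II (pk1, mk1, jk1 in place of k1, k3, k4) can be sketched as a simple mapping over labelled arguments. This is only an illustration: the function name `relabel_causative` and the `keep` parameter are hypothetical, and deciding whether a se-phrase is a mediator (aayaa se) or a true instrument (cammaca se) is left to the caller rather than inferred.

```python
from typing import Iterable, List, Tuple

# Surface-based label -> causative label, as stated for Possibility-II.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(
    args: List[Tuple[str, str]], keep: Iterable[str] = ()
) -> List[Tuple[str, str]]:
    """Replace k1/k3/k4 with pk1/mk1/jk1 for the arguments of a causative verb.

    args is a list of (phrase, label) pairs. Phrases listed in keep, such as
    a true instrument, retain their original label.
    """
    keep = set(keep)
    return [
        (phrase, label if phrase in keep else CAUSATIVE_RELABEL.get(label, label))
        for phrase, label in args
    ]


if __name__ == "__main__":
    possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
    print(relabel_causative(possibility_1))
```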

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'

• Issue
  - Possibility-I
    • Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    • The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    • jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'


Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref


Relative Clause Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref


(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'


anubhava karta - k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]


(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd…)

• Ex-1


Relation samanadhikaran - rs (Contd): Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'

• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause


Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II


Conditionals (Contd)

• Possibility-I is not possible because agar-to is the head of the tree, which is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically


Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'


Participles (vmod) (Contd)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (contd)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha
    k2: patra


(7) Ellipsis

• How to show dependencies when the head is missing
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing, which becomes the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi


Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  "The children have grown up; they don't listen to anyone"
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of


(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'

• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'


Coordinate Conjunction (ccof)

Subordinate Conjunction


(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• It means a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• It captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation


Conjunct Verbs (Contd)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
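Because the noun and the verb of a conjunct verb are chunked separately, a consumer of the treebank has to follow the pof link to recover the full predicate. A minimal sketch, assuming edges are stored as (dependent, head, relation) triples; the function name `conjunct_predicate` is hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]   # (dependent, head, relation)

# Edges for: maine usase ek prashna kiyaa
EDGES: List[Edge] = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),
    ("ek", "prashna", "nmod"),
]


def conjunct_predicate(verb: str, edges: List[Edge]) -> str:
    """Join a verb with any noun or adjective linked to it by pof."""
    parts = [dep for dep, head, rel in edges if head == verb and rel == "pof"]
    return " ".join(parts + [verb])


if __name__ == "__main__":
    print(conjunct_predicate("kiyaa", EDGES))
```

The modifier ek stays attached to prashna in the tree, so the adjective-noun relation is preserved while the predicate can still be read off as a unit.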


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder


Contents

1  Motivation
2  Introducing PropBank
3  Frame file definition
4  Hindi PropBank
5  Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
– [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
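The filtering step above can be sketched in a few lines of Python. This is only an illustration: the role annotations are written by hand to mirror the two candidate sentences, and the dictionary layout and the function name `answer_who` are hypothetical, not part of PropBank or of any SRL tool.

```python
from typing import Dict, List, Optional

# Hand-written role annotations for the two candidate sentences on the slide.
# The layout is a hypothetical illustration, not a PropBank file format.
CANDIDATES: List[Dict[str, str]] = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate: str, theme: str,
               candidates: List[Dict[str, str]]) -> Optional[str]:
    """Return the agent of the first candidate whose predicate and theme match."""
    for cand in candidates:
        if cand["predicate"] == predicate and cand["theme"] == theme:
            return cand["agent"]
    return None  # no candidate has the requested theme


print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Word matching alone cannot separate the two candidates, since both contain the question's words; requiring the question's theme to match the candidate's theme is what rules out the syringe sentence.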

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01  Make sound of bell
  Arg0   Causer of ringing
  Arg1   Thing rung
  Arg2   Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01  Make sound of bell
  Arg0   Causer of ringing
  Arg1   Thing rung
  Arg2   Ring for

ring.02  To surround
  Arg1   Surrounding entity
  Arg2   Surrounded entity

An example

• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)

An example

• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
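To make the roleset idea concrete, here is a minimal Python sketch of a frame file for ring with its two rolesets, using the role descriptions from the slides. The dictionary layout and the helper `explain_labels` are assumptions made for illustration; real PropBank frame files are stored in their own file format, which is not shown in this tutorial.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory view of the frame file for "ring" (two rolesets).
FRAME_FILE: Dict[str, Dict] = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def explain_labels(roleset_id: str,
                   labelled: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Attach the roleset-specific description to each (span, label) pair."""
    roles = FRAME_FILE[roleset_id]["roles"]
    result = []
    for span, label in labelled:
        if label not in roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        result.append((span, label, roles[label]))
    return result


for row in explain_labels("ring.01", [("John", "Arg0"), ("the bell", "Arg1")]):
    print(row)
for row in explain_labels("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")]):
    print(row)
```

The same label (Arg1) means 'Thing rung' under ring.01 and 'Surrounding entity' under ring.02, which is why the roleset has to be chosen before the arguments are labelled.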

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
– Creation of frame files
– Empty argument insertion
– Semantic role labelling

Frame files for Hindi

• Two types of frame files:
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
– pro: dropped, null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent/experiencer     ARG0-MNS  Induced causer
ARG1  Theme/patient         ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs:
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
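As a small illustration of the labelling rule above, the following Python sketch assigns the numbered argument of an intransitive verb from its class. The verb list contains only the three examples named on the slide; in practice the class would come from the verb's frame file after applying the diagnostic tests, and the table and function name here are hypothetical.

```python
# Verb classes for the three intransitive examples on the slide.
# A real system would read this from the frame files, not from a hard-coded table.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb: str) -> str:
    """Return the numbered argument taken by the single argument of the verb."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```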

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add –aa
– (so, sulaa): 'sleep', 'make someone sleep'
• Add –vaa
– (sulaa, sulvaa): 'make someone sleep', 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Causatives

[Figures: PropBank annotation of causative-1 and causative-2]

Causatives: classes

[Figure not recoverable from the slide]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Phrase structure tree for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

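Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP, V; the NP darvaazaa carries index 1, shared with a CASE element marked 1 in the lower position]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP-P, NP, V; the NP cuuhe carries index 1, shared with a CASE element marked 1; us kamre mein is adjoined]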

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Phrase structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Phrase structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP-P, NP, V; caaMd carries index 1 with a CASE element marked 1; mujhko carries index 2 with an SCR element marked 2]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Phrase structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP-P, NP, V, V-Aux, CP, C; the CP headed by ki carries index 1 with an EXTR element marked 1; Sita carries index 1 with a CASE element marked 1 inside the embedded clause]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Phrase structure tree for the sentence above; node labels: VP-Pred, VP, NP-P, NP, V, V-Aux, CP, C (COMP), PRO; SCR elements marked 1 and 3]

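Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Phrase structure tree for the sentence above; node labels: VP-Pred, VP, NP-P, NP, N, V, V', V-Aux; the noun yaad carries index 1 with a CASE element marked 1]

Causative

[Figure not recoverable from the slide]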

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          'My sister who stays in Delhi is coming tomorrow'

Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me'

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me'

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me'

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
– The experiencer argument takes dative case
  raam ko bukhaar hai     Ram dat fever be.pres
  raam ko caand dikhaa    Ram dat moon see-unacc.pst
  raam ko dukh hai        Ram dat sorrow be.pres
• Participatory verbs
– The second argument of participatory verbs takes the se postposition
  raam ravi se carcaa karega     Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii  Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa          Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues:
– Compounds
– Punctuations

For example:

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     .

Tokenization Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
– Compounds internally contain a punctuation
– Are productive
– Morphological analysis of the members of the compounds
– The issue: whether to create a single token
– Decision: create three tokens; mark the hyphen as JOIN
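The tokenization decisions above (every punctuation mark becomes its own token; a hyphenated compound becomes three tokens with the hyphen marked JOIN) can be sketched as follows. This is a minimal illustration, not the treebank's tokenizer: the regular expression, the function name `tokenize` and the third column used to carry the JOIN mark are all assumptions, since the slides do not say where the mark is stored.

```python
import re
from typing import List, Tuple


def tokenize(sentence: str) -> List[Tuple[int, str, str]]:
    """Split into word and punctuation tokens and number them from 1.

    Returns (ADDR, TOKEN, MARK) rows; MARK is "JOIN" for a hyphen, else "".
    """
    # \w+ grabs a run of word characters; [^\w\s] grabs one punctuation mark.
    pieces = re.findall(r"\w+|[^\w\s]", sentence)
    return [(addr, piece, "JOIN" if piece == "-" else "")
            for addr, piece in enumerate(pieces, start=1)]


for addr, token, mark in tokenize("bAlikA-vixyAlaya"):
    print(addr, token, mark)
```

Keeping the two members of the compound as separate tokens lets the morphological analyser treat each member on its own, while the JOIN mark preserves the information that they formed one written unit.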

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7      .        <fs as=&STOP,punc>
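A composite af value can be unpacked into named fields with a few lines of Python. The sketch below assumes that the fields are comma-separated and appear in the order listed above (root, category, gender, number, person, case, tam or vibhakti, suffix); the field names follow the attribute names used elsewhere in these slides, and `parse_af` is a hypothetical helper, not part of any SSF toolkit.

```python
from typing import Dict

# Assumed field order of the composite "af" attribute, as listed on the slide.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix")


def parse_af(af: str) -> Dict[str, str]:
    """Split a comma-separated af value into named fields.

    Missing trailing fields are returned as empty strings.
    """
    values = af.split(",")
    values += [""] * (len(AF_FIELDS) - len(values))  # pad short values
    return dict(zip(AF_FIELDS, values))


print(parse_af("laDakaa,n,m,sg,3,o"))
```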

POS Tagging

• ILMT POS tagsets adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7     .       SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1    .       SYM   <fs as=&STOP,punc>
       ))
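The renumbering that chunking causes can be sketched in Python: each chunk receives a top-level address, and the tokens inside it receive nested addresses of the form chunk.token. The input layout and the function name `ssf_rows` are assumptions made for this illustration, and the feature-structure column is left out for brevity.

```python
from typing import List, Tuple

# A chunk is (chunk label, [(token, POS tag), ...]); this layout is hypothetical.
Chunk = Tuple[str, List[Tuple[str, str]]]


def ssf_rows(chunks: List[Chunk]) -> List[Tuple[str, str, str]]:
    """Produce (ADDR_, TKN_, CAT_) rows with nested addresses for chunked text."""
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))          # chunk opening row
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))   # token row inside the chunk
        rows.append(("", "))", ""))                 # chunk closing row
    return rows


SENTENCE: List[Chunk] = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
    ("BLK", [(".", "SYM")]),
]

for addr, token, cat in ssf_rows(SENTENCE):
    print(f"{addr:<5}{token:<8}{cat}")
```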


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT, Hyderabad

<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
– Causatives
– Co-ordination
– Unaccusatives
– Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti

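Sabina opened the lock

[Dependency tree: opened is the root; Sabina attaches as k1 and lock as k2; the attaches to lock]

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened is the root; Sabina (k1), lock (k2), key (k3); the attaches to lock and this to key]

k3 (karaNa): instrument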

Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened is the root; Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); the attaches to lock, this to key and my to home]

k7t (kaladhikaraNa): time
k7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened is the root; yesterday (k7t), lock (k1), key (k3); the attaches to lock and this to key]

lock becomes the karta

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example Contd

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) [Chunk-level tree: khaa is the root, with bhaaii and phala as dependents]
(t2) [Fully expanded tree: khaa is the root; bhaaii has dependents meraa and baDZaa; phala has dependent bahuta]

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
– Sabina opened the lock with the key
– The key opened the lock
– The lock opened

Semantics of the verb

• A verbal root denotes:
– The activity
– The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root branching into activity and result]

10

kartaLkarma

  Theboyopenedthelock$  k1ndashkarta$  k2ndashkarma

  kartakarmasome3mescorrespondtoagenttheme$  NotalwaysThedooropened$  Thedooriskarta$  Thesentencehasnoexplicitkarma

(open(

boy( lock(

k1 k2

SubLac3onsLOpeningoflock

Openingoflock

Inser3ngandkeypressing latchmovingturningakeyandturningandlockopening(ac3on1) thelever(ac3on3)

(ac3on2)

Sub-actions: Opening of lock

[Dependency trees: (i) open with boy (k1), lock (k2), key (k3); (ii) open with key (k1), lock (k2); (iii) open with lock (k1)]

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: 'Sabina opened the lock'
• However, the speaker may decide not to express the role of the agent. Hence: 'The key opened the lock'
– The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• 'The lock opened'
– The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
– Which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
– Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
– 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions etc.

An Example

• Example
– meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
– meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
– badZaa   <fs af= root=badZaa cat=adj gend=m>
– bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
– bahuta   <fs af= root=bahuta cat=adj gend=any>
– phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
– khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
– hai      <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd)

• POS Tagging
– meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
– ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

• Dependency Relation

[Figure not recoverable from the slide]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
– each node is a chunk, and
– each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks:
– meraa 'my'
– bhaaii 'brother'
– phala 'fruit' and
– khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd)

[Dependency tree: khaataa is the root; bhaaii attaches as k1 and phala as k2; meraa attaches to bhaaii as r6]
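The chunk-level tree described above can be held as a list of labelled edges and printed as an indented outline. The sketch below is only one possible encoding: the (dependent, label, head) triple layout and the function names are assumptions for illustration, and each node stands for the head of a chunk.

```python
from typing import List, Tuple

# (dependent, relation label, head) triples for the example sentence.
Edge = Tuple[str, str, str]
EDGES: List[Edge] = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def render(head: str, edges: List[Edge], depth: int = 0) -> List[str]:
    """Return indented lines for all dependents of head, depth first."""
    lines = []
    for dependent, label, parent in edges:
        if parent == head:
            lines.append("  " * (depth + 1) + f"{dependent} ({label})")
            lines.extend(render(dependent, edges, depth + 1))
    return lines


def render_tree(root: str, edges: List[Edge]) -> List[str]:
    """Return the whole tree as lines, starting with the root."""
    return [root] + render(root, edges)


print("\n".join(render_tree("khaataa", EDGES)))
```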

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas, and
– Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
– karta: subject/agent/doer
– karma: object/patient
– karana: instrument
– sampradaan: beneficiary
– apaadaan: source
– adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod_relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types

[Figure not recoverable from the slide]

20

Some Hindi Constructions

(1)   Causative Constructions   maaz ne aayaa se bacce ko khaaanaa khilvaayaa lsquomotherrsquo lsquoErgrsquo lsquomaidrsquo lsquobyrsquo lsquochildrsquo lsquoAccrsquo lsquofoodrsquo lsquoeat-Causrsquo lsquoMother caused the maid to feed the childrsquo

  Issue

$ Possibility-I Go by syntactic analysis

amp khilvaa lsquocause to eatrsquo is the verb root amp maaz ne has karta vibhakti so mark as k1 amp aayaa se has karana vibhakti so mark as k3 amp bacce ko has sampradan vibhakti so mark as k4

Causative Constructions (Contd hellip)

Causative Constructions (Contd …)

• Possibility-II
– The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
– The Paninian framework provides the relations:
  ◦ prayojaka karta 'causer' (pk1): the causer in a causative construction
  ◦ prayojya karta 'causee' (jk1): the causee in a causative construction
  ◦ madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
– Do we mark the above dependency roles?
– If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
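The relabelling from the Possibility-I labels to the Possibility-II labels can be written as a simple mapping. The sketch below is an assumption-laden illustration, not the annotation tool: it applies only to the causer, mediator-causer and causee of a causative verb, so a se-marked participant that is a true instrument (such as cammaca 'spoon' in the first example) must keep k3 and should not be passed through this mapping.

```python
from typing import List, Tuple

# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map k1/k3/k4 to pk1/mk1/jk1; all other labels are left unchanged."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in args]


# Possibility-I analysis of: maaz ne aayaa se bacce ko khaanaa khilvaayaa
POSSIBILITY_I = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]

print(relabel_causative(POSSIBILITY_I))
```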

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
– Possibility-I
  ◦ Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  ◦ The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  ◦ jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause: Possibility-I

[Figure not recoverable from the slide]

Relative Clauses (nmod__relc)

• Possibility-II
– The verb khadZaa hai 'is standing' is the root of the relative clause
– The modifier of vaha 'he' in the main clause is the entire relative clause
– Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Possibility-II

[Figure not recoverable from the slide]

Relative Clauses (Contd …)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2: raam ne (agent) caaMd dekhaa (base verb)
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3: raam ko (experiencer) caaMd dikhaa (derived intransitive verb)
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta – k4a (Contd …)

[Figures: dependency trees for Ex-2 and Ex-3, not recoverable from the slide]

27

(4) Relation samanadhikaran- rs

  Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $  Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo   In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma   In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object

yaha lsquothisrsquo so it attahes to yaha as rs

Relation samanadhikaran- rs (Contdhellip)

  Ex-1

101212

28

Relation samanadhikaran- rs (Contd) ndash Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
– Possibility-I: Abstract node
– Possibility-II: One clause depends on the other clause

Possibility - I

[Dependency tree: the abstract node agar-to is the root; agar and to attach to it as paired-ccof; naa hotii attaches to agar as ccof and aatii attaches to to as ccof]

Possibility - II

[Figure not recoverable from the slide]

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

  In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)

  The arguments occurs only once in the sentence but is semantically related to both the verbs   The shared argument syntactically always attaches with the main verb   For the other verb this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
      'he' 'daily' 'letter' 'having written' 'tear' 'is'
      'Every day, having written a letter, he tears it up'

Participles (vmod) (Contd.)

• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: pawra
  vmod: likhakar
    k1: vaha (shared)
    k2: pawra (shared)

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
      'you' 'whatever' 'will say' 'that' 'I' 'will believe'
      'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing.
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency.

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
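The tree above can be written down as a list of labelled edges. The following is a minimal sketch, not the treebank's own storage format: the tuple layout and the helper names `children` and `render` are assumptions made for illustration. It shows that once NULL__NP is inserted, the relative clause has an ordinary parent node to attach to.

```python
# Each edge is (head, relation, dependent); words and labels follow the slide.
# The tuple layout and helper names are hypothetical, chosen for this sketch.
EDGES = [
    ("maan luungii", "k1", "mai"),
    ("maan luungii", "k2", "NULL__NP"),      # inserted null element standing in for vo
    ("NULL__NP", "nmod__relc", "kahoge"),    # relative clause attaches to the null node
    ("kahoge", "k1", "tum"),
    ("kahoge", "k2", "jo bhi"),
]

def children(head, edges):
    """Return (relation, dependent) pairs governed by `head`, in input order."""
    return [(rel, dep) for h, rel, dep in edges if h == head]

def render(head, edges, depth=0):
    """Return the subtree under `head` as indented text lines."""
    lines = [head] if depth == 0 else []
    for rel, dep in children(head, edges):
        lines.append("  " * (depth + 1) + f"{rel}: {dep}")
        lines.extend(render(dep, edges, depth + 1))
    return lines

if __name__ == "__main__":
    print("\n".join(render("maan luungii", EDGES)))
```

Without the NULL__NP edge, kahoge would have no head to attach to, which is the problem the null element solves.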

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
      'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
      'The children have grown up; they don't listen to anyone'
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential):

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• In coordination, the conjunction is the head and takes the coordinated elements as its children.
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
        'Ram' 'Erg' 'food' 'ate' 'and' 'Sita' 'Erg' 'apple' 'ate'
        'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
        'Ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
        'Ram said that he will come tomorrow'

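Coordinate Conjunction (ccof)

[Figure: dependency tree for the coordinate conjunction example]

Subordinate Conjunction

[Figure: dependency tree for the subordinate conjunction example]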

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
      'I-erg' 'him-inst' 'one' 'question' 'did'
      'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' inside the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation).
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
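The chunking decision above can be illustrated with a small data structure. This is only a sketch under assumptions: the `Chunk` class and the edge tuples are hypothetical names for illustration, while the labels (k1, k2, pof, nmod) are the ones used on the slides for this example.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    tag: str      # chunk label, e.g. NP or VG
    tokens: list  # words inside the chunk
    head: str     # head word of the chunk

# [ek prashna]NP and [kiyaa]VG are separate chunks, as decided on the slide.
chunks = [
    Chunk("NP", ["maine"], "maine"),
    Chunk("NP", ["usase"], "usase"),
    Chunk("NP", ["ek", "prashna"], "prashna"),
    Chunk("VG", ["kiyaa"], "kiyaa"),
]

# Inter-chunk edges: (dependent head, relation, governing head).
inter_chunk = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),   # noun is 'part of' the conjunct verb
]

# Intra-chunk edge: ek modifies prashna inside the NP chunk.
intra_chunk = [("ek", "nmod", "prashna")]

def conjunct_verbs(edges):
    """Return 'noun verb' strings for every pair linked by a pof edge."""
    return [f"{dep} {head}" for dep, rel, head in edges if rel == "pof"]

if __name__ == "__main__":
    print(conjunct_verbs(inter_chunk))
```

Because prashna heads its own NP chunk, the modifier ek has a noun to attach to, and the pof edge still lets a consumer recover prashna kiyaa as one semantic unit.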

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
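The filtering step described above can be sketched as follows. This is a toy illustration, not a real SRL system: the role labels are written by hand, and the dictionary layout and the function name `answer` are assumptions made for this example.

```python
# Hand-labelled semantic roles for the question and the two candidate sentences.
# The dictionary layout and the function name are hypothetical.
QUESTION = {"predicate": "create", "theme": "the first effective polio vaccine"}

CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer(question, candidates):
    """Return agents of candidates whose predicate and theme match the question."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == question["predicate"]
        and c["theme"].lower() == question["theme"].lower()
    ]

if __name__ == "__main__":
    print(answer(QUESTION, CANDIDATES))
```

Both candidate sentences contain the words of the question, so word matching alone cannot separate them; comparing the theme of create is what removes the syringe sentence.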

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02: To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity

An example

• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)

An example

• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)
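A frame file can be thought of as a lookup table from roleset IDs to role descriptions. The sketch below is an in-memory illustration only: real PropBank frame files are stored in their own file format, and the `FRAMES` dictionary and `describe` helper are hypothetical names. The roleset IDs are written ring.01 and ring.02, assuming the usual dotted PropBank form.

```python
# Rolesets for the polysemous predicate "ring", copied from the slide.
# The dictionary layout and helper name are assumptions for illustration.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing",
                  "ARG1": "Thing rung",
                  "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity",
                  "ARG2": "Surrounded entity"},
    },
}

def describe(roleset_id, labelled_args):
    """Map each labelled argument text to its role description in the roleset."""
    roles = FRAMES[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}

if __name__ == "__main__":
    print(describe("ring.01", [("ARG0", "John"), ("ARG1", "the bell")]))
    print(describe("ring.02", [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")]))
```

The same label (ARG1) means 'Thing rung' under ring.01 and 'Surrounding entity' under ring.02, which is why the roleset must be chosen before the numbered arguments can be interpreted.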

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered Arguments           Numbered Arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent/experiencer      ARG0-MNS  Induced causer
ARG1  Theme/patient          ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Intransitive: Unergative

Unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

Unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Causatives

[Figures: PropBank annotation of Causative-1 and Causative-2]

Causatives classes

[Figure: causative classes]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree: Atif kitab paRhegaa (आतिफ़ किताब पढ़ेगा)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree: Atif-ne kitab paRhii (आतिफ़ ने किताब पढ़ी)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree: Atif soyegaa (आतिफ़ सोएगा)]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree: darvaazaa khulegaa (दरवाज़ा खुलेगा), with the NP coindexed with a CASE trace]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct

[Tree: us kamre mein cuuhe hain (उस कमरे में चूहे हैं)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree: Ram-ne Mohan-ko kitaab dii (राम ने मोहन को किताब दी)]

Putting it All Together: Dative Subjects

[Tree: kal raat baadaloM mein mujhko caaMd dikhaa (कल रात बादलों में मुझको चाँद दिखा), with CASE and SCR traces]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree: Ram jaantaa hai ki Sita der se aayegi (राम जानता है कि सीता देर से आएगी), with the CP extraposed (EXTR trace)]

Relative Clause

[Tree: relative clause structure for the sentence below, with PRO, COMP and SCR traces]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Tree: complex predicate structure for the sentence below, with the noun yaad inside V' and a CASE trace]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

[Tree: causative structure]

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Relative Clause

rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          'My sister who stays in Delhi is coming tomorrow'

Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me'

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me'

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me'

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'

Causatives

Unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case

    raam ko bukhaar hai        raam ko caand dikhaa             raam ko dukh hai
    Ram dat fever be.pres      Ram dat moon see-unacc.pst       Ram dat sorrow be.pres

• Participatory verbs
  – The second argument of participatory verbs takes the se postposition

    raam ravi se carcaa karega              siitaa raam se shaadi karegii
    Ram Ravi to discussion do.m.sg.fut      Sita Ram with marriage do.f.sg.fut

    ravi mohan se milegaa
    Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations
• For example:

  usa    ladake   ne    kelaa    khaayaa    thaa
  that   boy      erg   banana   eat-perf   past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens; mark the hyphen as JOIN

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense aspect modality) / vibhakti (case marker), suffix

ADDR_  TKN_      OTHR
1      usa       <fs af='vaha,pr'>
2      laDake    <fs af='laDakaa,n,m,sg,3,o'>
3      ne        <fs af='ne,psp'>
4      kelaa     <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6      thaa      <fs af='kelaa,v'>
7                <fs as=&STOP,punc>
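Reading the af attribute amounts to splitting one string into the fields listed above. The sketch below rests on two assumptions that should be checked against the SSF documentation: that the fields are comma-separated (the extracted slide text has lost its punctuation), and that there are eight positions in the order given on the slide. The field names and the function `parse_af` are hypothetical.

```python
# Field order follows the slide: root, category, gender, number, person, case,
# tam/vibhakti, suffix. The names themselves are made up for this sketch.
AF_FIELDS = ["root", "cat", "gender", "number", "person", "case",
             "tam_or_vibhakti", "suffix"]

def parse_af(value):
    """Split a comma-separated af value into named fields.

    Missing trailing fields are filled with empty strings, so shortened
    values such as 'ne,psp' can still be read.
    """
    parts = value.split(",")
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))

if __name__ == "__main__":
    # Assumed full form of the analysis for laDake, with two empty trailing fields.
    print(parse_af("laDakaa,n,m,sg,3,o,,"))
    print(parse_af("ne,psp"))
```

Keeping the analysis as a dictionary lets later stages ask for one feature (for example the case of a noun) without depending on its position in the string.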

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR  TKN      CAT    OTHR
1     usa      PRP    <fs af='vaha,pron,…'>
2     laDake   NN     <fs af='laDakA,noun,…'>
3     ne       PSP    <fs af='ne,psp,…'>
4     kelA     NN     <fs af='kelA,noun,…'>
5     khAyA    VM     <fs af='KA,verb,…'>
6     thA      VAUX   <fs af='kelA,verb,…'>
7              SYM    <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_     CAT_   OTHR
1      ((       NP
1.1    usa      PRP    <fs af='vaha,pron,…'>
1.2    laDake   NN     <fs af='laDakA,noun,…'>
1.3    ne       PSP    <fs af='ne,psp,…'>
       ))
2      ((       NP
2.1    kelA     NN     <fs af='kelA,noun,…'>
       ))
3      ((       VG
3.1    khAyA    VM     <fs af='KA,verb,…'>
3.2    thA      VAUX   <fs af='kelA,verb,…'>
       ))
4      ((       BLK
4.1             SYM    <fs as=&STOP,punc>
       ))

Rambow


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT, Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti

5

Sabinaopenedthelock

opened

k1 k2

Sabina lock

the

K1 (Karta) the doer of the action (the locus of activity) K2 (Karma) locus of result

Sabina opened the lock with this key

opened

sabina lock key

the this

k1 k2

k3

K3 (karaNa) instrument

101212

6

Yesterday Sabina opened the lock with this key at my home

opened

Yesterday Sabina lock key home

the this my

k7t k1 k2 k3

k7p

K7t (deshadhikaraNa) time K7p (kaladhikaraNa) place

Yesterday the lock opened with this key

opened

yesterday lock key

the this

k7t k1

k3

lock becomes the karta

101212

7

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks, bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

8

ForExample

meraabaDzaabhaaiibahutaphalakhaataahai=gtmeraa_PRPbaDzaa_JJbhaaii_NN

bahuta_QFphala_NNkhaataa_VMhai_VAUX=gt((meraa_PRPbaDzaa_JJbhaaii_NN))_NP

((bahuta_QFphala_NN))_NP

((khaataa_VMhai_VAUX))_VG

ExampleContd

((meraa_PRPbaDzaa_JJbhaaii_NN))_NP

((bahuta_QFphala_NN))_NP

((khaataa_VMhai_VAUX))_VG(t1)khaa (t2)khaabhaaiiphalabhaaiiphalameraabaDZaabahuta

101212

9

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
    Sabina opened the lock with the key
    The key opened the lock
    The lock opened

Semantics of the verb

• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root branching into activity and result]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions - Opening of lock

Opening of lock:
  (action 1) Inserting and turning a key
  (action 2) Key pressing and turning the lever
  (action 3) Latch moving and lock opening

Sub-actions - Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly:
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions etc.

An Example

• Example:
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m>
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any>
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
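The bracketed notation used on this slide is regular enough to read back mechanically. The following is a small sketch that parses the slide's display notation only (it is not a reader for the treebank's SSF files); the function name `parse_chunks` and the output layout are assumptions for illustration.

```python
import re

# Matches one chunk of the form ((word_TAG word_TAG))_CHUNKTAG.
CHUNK_RE = re.compile(r"\(\(\s*(.*?)\s*\)\)_(\w+)")

def parse_chunks(text):
    """Return a list of chunks, each with its tag and (word, POS) pairs."""
    chunks = []
    for body, tag in CHUNK_RE.findall(text):
        # Split each token on its last underscore: 'bhaaii_NN' -> ('bhaaii', 'NN').
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append({"tag": tag, "tokens": tokens})
    return chunks

if __name__ == "__main__":
    text = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
    for chunk in parse_chunks(text):
        print(chunk["tag"], chunk["tokens"])
```

Each parsed chunk then becomes one node of the dependency tree, with relations marked between chunk heads.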

An Example (Contd.)

• Dependency Relation

[Figure: dependency tree for the example sentence]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations link the verb to the participants directly involved in the action it denotes
• Relations other than karakas denote purpose, reason etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of

Dependency Relation Types
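The three relation types listed above can be kept in a small lookup table. This is a hedged sketch: `RELATION_TYPES` and `relation_type` are hypothetical names, and the table holds only labels that the slides spell out (k1 to k4 as used in the causative example, plus the non-karaka and non-dependency labels); it is not the full tagset.

```python
# Labels taken from the slides; the grouping into three types follows
# "Relations in Dependency Scheme". The inventory is deliberately partial.
RELATION_TYPES = {
    "karaka": {
        "k1": "karta",
        "k2": "karma",
        "k3": "karana",
        "k4": "sampradaan",
    },
    "non-karaka": {
        "r6": "genitive",
        "rt": "purpose",
        "rh": "reason",
        "nmod__relc": "relative clause",
        "rad": "address",
    },
    "non-dependency": {
        "ccof": "conjunction",
        "pof": "complex predicate",
        "fragof": "fragment of",
    },
}


def relation_type(label):
    """Return (type, gloss) for a relation label, or raise KeyError."""
    for rel_type, labels in RELATION_TYPES.items():
        if label in labels:
            return rel_type, labels[label]
    raise KeyError(f"unknown relation label: {label}")


print(relation_type("k3"))
print(relation_type("pof"))
```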

Some Hindi Constructions

(1) Causative Constructions

• Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
      'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
      'Mother caused the maid to feed the child.'
• Issue
  – Possibility-I: Go by syntactic analysis
    – khilvaa 'cause to eat' is the verb root
    – maaz ne has karta vibhakti, so mark as k1
    – aayaa se has karana vibhakti, so mark as k3
    – bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'.
  – The Paninian framework provides the relations:
    – prayojaka karta 'causer' (pk1): the causer in a causative construction
    – prayojya karta 'causee' (jk1): the causee in a causative construction
    – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'.

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
      'Mother fed the child with the spoon.'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
      'Mother made the maid feed the child.'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
• For causatives, our current decision: follow Possibility-II.

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
      'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
      'The boy who is standing there is my brother.'
• Issue
  – Possibility-I
    – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II.
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause.
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation.
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref.

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
        'I-Dat' 'unhappy' 'is'
        'I am unhappy.'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta.
• Here dukh 'unhappy' is the karta.
• Here mujhko 'to me' is a subtype of sampradan.
• This sampradan is different from the beneficiary sampradan (k4).
• We call it anubhava karta, represented by k4a.

anubhava karta – k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
        'ram' 'Erg' 'moon' 'saw'
        'Ram saw the moon.'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
        'ram-Dat' 'moon' 'appeared'
        'The moon was visible to Ram.'

anubhava karta – k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
        'Ram said that he will come tomorrow.'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
        'Ram said that he will come tomorrow.'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma.
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs.

Relation samanadhikaran – rs (Contd…)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
      'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
      'Had she not been sick, she would have definitely come to the party.'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
  paired-ccof → agar
    ccof → naa hotii
  paired-ccof → to
    ccof → aatii

Possibility-II

[Figure: dependency tree]

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause.
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause.

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is semantically realized but not syntactically.

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
      'he' 'daily' 'letter' 'having written' 'tear' 'is'
      'Having written letters every day, he tears them up.'

Participles (vmod) (Contd)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'.

Participles (vmod) (Contd)

PaadZataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakara
    k1 → vaha (shared)
    k2 → patra (shared)

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
      'you' 'whatever' 'will say' 'that' 'I' 'will believe'
      'I will believe whatever you say.'
• In the above example, vo 'that', which would be the parent node for the relative clause 'tum jo bhi kahoge', is missing.
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency.

Ellipsis (Contd)

maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
    nmod__relc → kahoge
      k1 → tum
      k2 → jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
      'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
      'The children have grown up; they don't listen to anyone.'
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential):

NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjuncts as heads.
• In coordination, the conjunct is the head and takes the coordinated elements as its children.
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child.

Conjunction (ccof) (Contd…)

• Coordinate conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
        'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
        'Ram ate food and Sita ate an apple.'
• Subordinate conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
        'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
        'Ram said that he will come tomorrow.'

Coordinate Conjunction (ccof)

[Figure: dependency tree]

Subordinate Conjunction

[Figure: dependency tree]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
      'I-erg' 'him-inst' 'one' 'question' 'did'
      'I asked him a question.'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation).
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb.
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb.

Conjunct Verbs (Contd)

kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
    nmod → ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
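The filtering step can be sketched as follows. This is an illustration only, assuming the two candidate sentences have already been role-labelled by some SRL system; the dictionary layout and the `answer_who` function are hypothetical.

```python
# Hand-written role labels for the two candidate sentences from the slides.
candidates = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, labelled):
    """Return the agents of `predicate` whose theme matches the question's theme."""
    return [
        c["agent"]
        for c in labelled
        if c["predicate"] == predicate and c["theme"].lower() == theme.lower()
    ]


# "Who created the first effective polio vaccine?"
answers = answer_who("create", "the first effective polio vaccine", candidates)
print(answers)
```

Both sentences contain the words "created" and "the first effective polio vaccine", so word matching alone cannot separate them; only the second sentence has that phrase as the theme of create.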

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John]ARG0 rings [the bell]ARG1 (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2 (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
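The two rolesets for ring can be held in a small dictionary, which makes the verb-by-verb nature of the roles explicit. This is a sketch only: real PropBank frame files are stored in their own file format, and the `FRAMES` layout and `describe` helper here are assumptions made for illustration.

```python
# Rolesets for "ring", transcribed from the slide.
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {
            "ARG0": "causer of ringing",
            "ARG1": "thing rung",
            "ARG2": "ring for",
        },
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {
            "ARG1": "surrounding entity",
            "ARG2": "surrounded entity",
        },
    },
}


def describe(roleset, labelled_args):
    """Pair each (text, label) argument with the role description of the roleset."""
    roles = FRAMES[roleset]["roles"]
    described = []
    for text, label in labelled_args:
        if label not in roles:
            raise KeyError(f"{label} is not defined for {roleset}")
        described.append((text, label, roles[label]))
    return described


bell = describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
lake = describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
print(bell)
print(lake)
```

Note that ARG1 means "thing rung" in ring.01 but "surrounding entity" in ring.02, which is why a numbered label is only interpretable together with its roleset.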

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA   Causer                 ARGA-MNS   Indirect causer
ARG0   Agent/experiencer      ARG0-MNS   Induced causer
ARG1   Theme/patient          ARG0-GOL   Causee with a 'recipient' role
ARG2   Recipient              ARG2-ATR   Attribute
ARG3   Instrument             ARG2-GOL   Goal
                              ARG2-SOU   Source
                              ARG2-LOC   Location
                              ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
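The labelling rule for the single argument of an intransitive verb can be sketched like this. The verb class lookup is a stand-in for the diagnostic tests mentioned above; `VERB_CLASS` and `single_argument_label` are illustrative names, and the table contains only the three verbs named on the slide.

```python
# Verb classes as given on the slide (WX-style romanization kept as-is).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def single_argument_label(verb):
    """Return the PropBank label for the sole argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in ("Kula", "Puta", "haMsa"):
    print(verb, single_argument_label(verb))
```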

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Causatives

[Figures: annotated trees for causative-1 and causative-2]

Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust(n) do(v)', 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    – Language-universal principles
    – Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with VP-Pred, VP, NP and V nodes]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: phrase structure tree for आतिफ़ सोएगा (Atif soyegaa)]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa), with NP darvaazaa co-indexed with a CASE trace]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Figure: phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain), with NP cuuhe co-indexed with a CASE trace]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: phrase structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii)]

Putting it All Together: Dative Subjects

[Figure: phrase structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa), with CASE and SCR traces]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: phrase structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi), with a CP headed by C ki and an EXTR trace]

Relative Clause

[Figure: phrase structure tree for the sentence below, with CP, PRO and SCR traces]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Figure: phrase structure tree for the sentence below, with V-Aux nodes for rahaa and thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me'

rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me'

rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me'

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  – The experiencer argument takes dative case
      raam ko bukhaar hai       Ram dat fever be.pres
      raam ko caand dikhaa      Ram dat moon see-unacc.pst
      raam ko dukh hai          Ram dat sorrow be.pres
• Participatory verbs
  – The second argument of participatory verbs takes the se postposition
      raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
      siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
      ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuation

For example:

usa    ladake   ne    kelaa    khaayaa    thaa
that   boy      erg   banana   eat-perf   past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)
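A two-column ADDR/TOKEN listing like this one is easy to read back into a program. The sketch below assumes whitespace-separated columns and a header row; `parse_ssf_tokens` is a hypothetical helper, not a function of any SSF library, and the punctuation row is left out of the sample because the slide text does not preserve the token.

```python
def parse_ssf_tokens(text):
    """Parse 'ADDR TOKEN' rows into a list of (address, token) pairs."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # skip the header row and blank lines
        address = int(parts[0])
        token = parts[1] if len(parts) > 1 else ""
        rows.append((address, token))
    return rows


SAMPLE = """
ADDR TOKEN
1 usa
2 laDake
3 ne
4 kelA
5 khAyA
6 thA
"""

tokens = parse_ssf_tokens(SAMPLE)
print(tokens)
```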

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision:
    – Create three tokens
    – Mark the hyphen as JOIN
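The decision for hyphenated compounds (three tokens, with the hyphen marked as JOIN) can be sketched as below. How the JOIN mark is actually stored in the treebank files is not shown on the slides, so the `(token, mark)` pair used here is an assumption, as is the function name.

```python
def tokenize_compound(word):
    """Split a hyphenated compound into three tokens; mark the hyphen as JOIN.

    Only the first hyphen is treated as the compound boundary; words
    without a hyphen are returned unchanged as a single token.
    """
    left, hyphen, right = word.partition("-")
    if not hyphen:
        return [(word, None)]
    return [(left, None), ("-", "JOIN"), (right, None)]


print(tokenize_compound("bAlikA-vixyAlaya"))
print(tokenize_compound("kelA"))
```

Keeping the members as separate tokens lets each one receive its own morphological analysis, which is the motivation given above.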

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), suffix

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOP,punc>

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
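The composite af attribute can be unpacked into named fields. This is a sketch under stated assumptions: the field names follow the order given on the morph analysis slide, the sample value (including its two empty trailing fields) is hypothetical because the commas in the slide text were lost in extraction, and `parse_af` is not a function of any existing toolkit.

```python
# Field order as listed on the slide; the Python names are illustrative.
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)


def parse_af(value):
    """Split a comma-separated af value into a dict keyed by AF_FIELDS."""
    parts = value.split(",")
    if len(parts) != len(AF_FIELDS):
        raise ValueError(
            f"expected {len(AF_FIELDS)} fields, got {len(parts)}: {value!r}"
        )
    return dict(zip(AF_FIELDS, parts))


# Hypothetical full value for 'laDake' (oblique form of laDakaa 'boy').
analysis = parse_af("laDakaa,n,m,sg,3,o,,")
print(analysis)
```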

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
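The renumbering that chunking causes (flat addresses 1, 2, 3 become 1.1, 1.2, 1.3 under chunk 1, and so on) can be sketched as follows. The `to_ssf` function and the tuple layout are assumptions made for illustration; the feature-structure column is omitted.

```python
def to_ssf(chunks):
    """Turn [(chunk_label, [(token, pos), ...]), ...] into SSF-style rows.

    Each row is (address, token, category). Chunk-opening rows get the
    chunk number as address; tokens get 'chunk.position'.
    """
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))
        rows.append(("", "))", ""))
    return rows


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]

rows = to_ssf(chunks)
for address, token, category in rows:
    print(f"{address:<5} {token:<8} {category}")
```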


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT, Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1 → Sabina
  k2 → lock
    the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1 → Sabina
  k2 → lock
    the
  k3 → key
    this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t → Yesterday
  k1 → Sabina
  k2 → lock
    the
  k3 → key
    this
  k7p → home
    my

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t → yesterday
  k1 → lock
    the
  k3 → key
    this

lock becomes the karta

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example Contd.

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) khaa
       bhaaii
       phala

(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta
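The bracketed chunk notation in this example can be read back into a structure from which chunk heads are picked, as a step towards the head-to-head tree (t1). The sketch below is illustrative: the regular expression, the `parse_chunks` function, and in particular the head rule (the NN token heads an NP, the VM token heads a VG) are assumptions, not the treebank's actual head-finding procedure.

```python
import re

# Matches one chunk written as ((tok_TAG tok_TAG ...))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

# Assumed head rule for this toy example only.
HEAD_TAGS = {"NP": "NN", "VG": "VM"}


def parse_chunks(text):
    """Parse bracketed chunks into dicts with label, tokens and head word."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        head = next(word for word, tag in tokens if tag == HEAD_TAGS[label])
        chunks.append({"label": label, "tokens": tokens, "head": head})
    return chunks


text = (
    "((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP "
    "((khaataa_VM hai_VAUX))_VG"
)

chunks = parse_chunks(text)
heads = [chunk["head"] for chunk in chunks]
print(heads)
```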

KarakaRela3ons Directpar3cipantsinanac3onevent Syntac3coLseman3c Thekartaandkarmaofaverbaredeterminedbytheverbsseman3cs Verbdenotesanac3onevent Anyac3onisabundleofsubLac3onsSabinaopenedthelockwiththekeyThekeyopenedthelockThelockopened

Seman3csoftheverb

 Averbalrootdenotes$  Theac3vity$  Theresult

  Locusofac3vitykarta  Locusofresultkarma

Verbal(Root(

acvity( result(


karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

[Dependency tree: open -> boy (k1), open -> lock (k2)]

Sub-actions - Opening of lock

[Diagram: "Opening of lock" decomposes into three sub-actions: inserting and turning a key (action 1); key pressing and turning the lever (action 2); latch moving and lock opening (action 3)]


Sub-actions - Opening of lock

[Dependency trees: (i) open -> boy (k1), lock (k2), key (k3); (ii) open -> key (k1), lock (k2); (iii) open -> lock (k1)]

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  - The karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri
• The lock opened
  - The karma is raised to the role of karta (doer): karma-kartri
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  - Which sub-action the speaker wants to focus on


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)


Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena in Hindi such as complex verbs, causatives, relative clauses, conjunctions, etc.


An Example

• Example
  meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  - meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa   <fs af= root=badZaa cat=adj gend=m>
  - bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta   <fs af= root=bahuta cat=adj gend=any>
  - phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai      <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
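The bracketed chunk notation above is regular enough to read mechanically. The sketch below assumes only the `((word_TAG ...))_LABEL` shape shown on the slide. `parse_chunks` and `head_of` are hypothetical helper names, and the head rule (main verb `VM` for a verb group, otherwise the last token) is an assumed heuristic for illustration, not the treebank's documented procedure.

```python
import re

# Matches one chunk such as "((bahuta_QF phala_NN))_NP".
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def head_of(label, tokens):
    """Assumed heuristic: the VM token for a VG chunk, else the last token."""
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]


chunked = (
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
chunks = parse_chunks(chunked)
heads = [head_of(label, tokens) for label, tokens in chunks]
print(heads)
```

Under this heuristic the chunk heads come out as meraa, bhaaii, phala and khaataa, which are the words between which the inter-chunk relations are marked.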

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit' and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks


Dependency Scheme (Contd)

[Dependency tree: khaataa -> bhaaii (k1), khaataa -> phala (k2); bhaaii -> meraa (r6)]

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address


Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types
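The three relation types can be kept in a small lookup table. This is a sketch, assuming only the labels that appear on these slides; the dictionary layout and the `relation_type` helper are illustrative, and the full tagset contains more labels than are listed here.

```python
# Labels taken from the slides; the complete scheme has more.
RELATION_TYPES = {
    "karaka": {
        "k1": "karta",
        "k2": "karma",
        "k3": "karana",
        "k4": "sampradaan",
        "k7t": "location in time",
        "k7p": "location in place",
    },
    "other": {
        "r6": "genitive",
        "rt": "purpose",
        "rh": "reason",
        "nmod__relc": "relative clause",
        "rad": "address",
    },
    "non-dependency": {
        "ccof": "conjunction",
        "pof": "complex predicate",
        "fragof": "fragment of",
    },
}


def relation_type(label):
    """Return which of the three relation types a label belongs to."""
    for type_name, labels in RELATION_TYPES.items():
        if label in labels:
            return type_name
    raise KeyError(f"label not in this illustrative table: {label}")


print(relation_type("k2"), relation_type("r6"), relation_type("pof"))
```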


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa 'cause to eat' is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd ...)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd ...)

• Possibility-II
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd ...)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
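The difference between the two possibilities for the maid example can be written as a relabelling of the Possibility-I analysis. This is a minimal sketch under stated assumptions: `relabel_causative` is a hypothetical helper, and the `mediators` argument stands in for the annotator's judgement about which se-marked participant is a mediator-causer. A se-marked instrument, like cammaca 'spoon' in the slides, keeps k3.

```python
def relabel_causative(args, mediators=frozenset()):
    """Map Possibility-I labels (k1, k3, k4) to Possibility-II labels.

    `args` is a list of (word, label) pairs; `mediators` names the
    words the annotator treats as mediator-causers (an assumption of
    this sketch, since vibhakti alone does not decide it).
    """
    relabelled = []
    for word, label in args:
        if label == "k1":
            label = "pk1"  # prayojaka karta, the causer
        elif label == "k4":
            label = "jk1"  # prayojya karta, the causee
        elif label == "k3" and word in mediators:
            label = "mk1"  # madhyastha karta, the mediator-causer
        relabelled.append((word, label))
    return relabelled


# maaz ne aayaa se bacce ko khaanaa khilvaayaa
possibility_1 = [("maaz", "k1"), ("aayaa", "k3"), ("bacce", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1, mediators={"aayaa"})
print(possibility_2)
```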

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause Alternative-II

Relative Clauses (Contd ...)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta - k4a (Contd ...)

[Dependency tree figures for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd ...)

[Dependency tree figures for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility - I

[Dependency tree: abstract node agar-to -> agar (paired-ccof), agar-to -> to (paired-ccof); agar -> naa hotii (ccof); to -> aatii (ccof)]

Possibility - II

Conditionals (Contd)

• Possibility-I is ruled out because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a (main) verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written letters every day, he tears them up'

Participles (vmod) (Contd)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd)

[Dependency tree: PaadZataa hai -> vaha (k1), rojZa (k7t), patra (k2), likhakar (vmod); the shared arguments vaha (k1) and patra (k2) are also shown under likhakar]

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

[Dependency tree: maan luungii -> mai (k1), NULL__NP (vo) (k2); NULL__NP -> kahoge (nmod__relc); kahoge -> tum (k1), jo bhi (k2)]

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) -> badZe_ho_gaye (ccof), nahii_sunate (ccof)]

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd ...)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

[Dependency tree: kiyaa -> maine (k1), usase (k2), prashna (pof); prashna -> ek (nmod)]


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
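The filtering step can be sketched in a few lines. This assumes the two candidate sentences have already been labelled with agent and theme as on the slide. The dictionary layout and the `normalise` and `who` helpers are illustrative, and exact string matching after dropping a leading article is a deliberate simplification.

```python
def normalise(phrase):
    """Lower-case a phrase and drop a leading 'the' (a crude assumption)."""
    words = phrase.lower().split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


# Role-labelled candidates, following the bracketing on the slide.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "The first effective polio vaccine"},
]


def who(predicate, theme, labelled):
    """Return the agents of `predicate` whose theme matches `theme`."""
    wanted = normalise(theme)
    return [
        s["agent"]
        for s in labelled
        if s["predicate"] == predicate and normalise(s["theme"]) == wanted
    ]


print(who("create", "the first effective polio vaccine", candidates))
```

Both candidates contain the question's words, so word matching cannot separate them. Only the second has the vaccine as the theme of create.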

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example

• [John ARG0] rings [the bell ARG1]  (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]  (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
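A frame file can be pictured as a mapping from roleset IDs to role descriptions. The sketch below encodes only the two rolesets from the slides, written here as ring.01 and ring.02. The dictionary layout and the `describe` helper are assumptions made for illustration, not the actual PropBank frame file format.

```python
# Two rolesets for the polysemous predicate "ring", as on the slides.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "Arg0": "Causer of ringing",
            "Arg1": "Thing rung",
            "Arg2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "Arg1": "Surrounding entity",
            "Arg2": "Surrounded entity",
        },
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument text to its verb-specific role description."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}


print(describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
print(describe("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same label, Arg1, means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why roles are defined per roleset rather than globally.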

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments:
  ARGA      Causer
  ARG0      Agent, experiencer
  ARG1      Theme, patient
  ARG2      Recipient
  ARG3      Instrument

Numbered arguments with function tags:
  ARGA-MNS  Indirect causer
  ARG0-MNS  Induced causer
  ARG0-GOL  Causee with a 'recipient' role
  ARG2-ATR  Attribute
  ARG2-GOL  Goal
  ARG2-SOU  Source
  ARG2-LOC  Location
  ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

  ARGM-TMP  Temporal
  ARGM-MNR  Manner
  ARGM-LOC  Location
  ARGM-PRP  Purpose
  ARGM-CAU  Cause
  ARGM-DIS  Discourse
  ARGM-ADV  Adverb
  ARGM-MNS  Means
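Labels in this tagset combine a base argument with an optional function tag. A small sketch of taking such a label apart, assuming the hyphenated spelling (for example ARG2-GOL); `split_label` is a hypothetical helper and `MODIFIER_TAGS` simply restates the modifier list.

```python
MODIFIER_TAGS = {
    "TMP": "Temporal", "MNR": "Manner", "LOC": "Location", "PRP": "Purpose",
    "CAU": "Cause", "DIS": "Discourse", "ADV": "Adverb", "MNS": "Means",
}


def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); a bare label gets None as tag."""
    base, _, tag = label.partition("-")
    return base, (tag or None)


def is_modifier(label):
    """True for ARGM-* labels whose tag is in the modifier list above."""
    base, tag = split_label(label)
    return base == "ARGM" and tag in MODIFIER_TAGS


print(split_label("ARG2-GOL"), split_label("ARG1"), is_modifier("ARGM-TMP"))
```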

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Annotation figures for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so, sulaa): sleep, make someone sleep
• Add -vaa
  - (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Causatives

[Annotation figures for Causative-1 and Causative-2]

Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do.prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

Complex predicate

compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[PS tree figure. Recoverable node labels: VP-Pred, VP, NP, V; leaves: Atif, kitab, paRhegaa]

आतिफ़ किताब पढ़ेगा

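Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[PS tree figure. Recoverable node labels: VP-Pred, VP, NP-P, NP, V; leaves: Atif-ne, kitab, paRhii]

आतिफ़ ने किताब पढ़ी

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[PS tree figure. Recoverable node labels: VP-Pred, VP, NP, V; leaves: Atif, soyegaa]

आतिफ़ सोएगा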

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[PS tree figure. Recoverable node labels: VP-Pred, VP, NP1, NP, V; leaves: darvaazaa, CASE1, khulegaa]

दरवाज़ा खुलेगा

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

उस कमरे में चूहे हैं

[PS tree figure. Recoverable node labels: VP-Pred, VP, NP1, NP, NP-P, V; leaves: us kamre mein, cuuhe, CASE1, hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[PS tree figure. Recoverable node labels: VP-Pred, VP, NP-P, NP, V; leaves: Ram-ne, Mohan-ko, kitaab, dii]

राम ने मोहन को किताब दी

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[PS tree figure. Recoverable node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, V; leaves: kal raat, baadaloM mein, mujhko, SCR2, caaMd, CASE1, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[PS tree figure. Recoverable node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, V, V-Aux; leaves: Ram, EXTR1, jaantaa, hai, ki, Sita, CASE1, der se, aayegi]

Relative Clause

[PS tree figure. Recoverable node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, V, V-Aux; leaves: jo kitaab, tumne, SCR1, PRO, dii, thii, COMP, vah, maine, SCR3, paRhii]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[PS tree figure. Recoverable node labels: VP-Pred, VP, VP1, NP, NP-P, V', V, V-Aux, N; leaves: Raam, Ravi-ko, yaad, CASE1, kar, rahaa, thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do.prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Causatives

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

• Experiencer verbs
  - The experiencer argument takes dative case
    raam ko bukhaar hai      Ram dat fever be.pres
    raam ko caand dikhaa     Ram dat moon see-unacc.pst
    raam ko dukh hai         Ram dat sorrow be.pres
• Participatory verbs
  - The second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega      Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii   Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa           Ravi Mohan with meet.m.sg.fut



Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  - Compounds
  - Punctuations

For example:
  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: bAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision
    • Create three tokens
    • Mark the hyphen as JOIN

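The tokenization decision for compounds (three tokens, with the hyphen marked as JOIN) can be sketched as below. This is an illustration under stated assumptions: the slides do not say how the JOIN mark is stored, so the `(token, mark)` pairs, the `TOKEN_RE` pattern and the `tokenize` function are hypothetical.

```python
import re

# A word, or any single non-space, non-word character (punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return (token, mark) pairs; mark is 'JOIN' for a compound-internal hyphen."""
    matches = list(TOKEN_RE.finditer(text))
    tokens = []
    for i, match in enumerate(matches):
        piece = match.group()
        is_join = False
        if piece == "-" and 0 < i < len(matches) - 1:
            prev, nxt = matches[i - 1], matches[i + 1]
            # The hyphen must touch a word on both sides, with no spaces.
            is_join = (
                prev.end() == match.start()
                and match.end() == nxt.start()
                and prev.group()[-1].isalnum()
                and nxt.group()[0].isalnum()
            )
        tokens.append((piece, "JOIN" if is_join else ""))
    return tokens


print(tokenize("bAI-bahana"))
print(tokenize("kelA khAyA thA."))
```

Keeping the two members as separate tokens lets each receive its own morphological analysis, which is the motivation given above for not creating a single token.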
Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), suffix

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOPpunc>

POS Tagging

• ILMT POS tagsets adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,...'>
2     laDake  NN    <fs af='laDakA,noun,...'>
3     ne      PSP   <fs af='ne,psp,...'>
4     kelA    NN    <fs af='kelA,noun,...'>
5     khAyA   VM    <fs af='KA,verb,...'>
6     thA     VAUX  <fs af='kelA,verb,...'>
7             SYM   <fs as=&STOPpunc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,...'>
1.2    laDake  NN    <fs af='laDakA,noun,...'>
1.3    ne      PSP   <fs af='ne,psp,...'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,...'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,...'>
3.2    thA     VAUX  <fs af='kelA,verb,...'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOPpunc>
       ))

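The renumbering that chunking causes can be made concrete with a short sketch. It assumes dotted addresses of the form chunk.token, reading the extracted "11", "12" as 1.1, 1.2, and it omits the feature-structure column. `to_ssf_rows` is a hypothetical helper, not a tool from the treebank pipeline.

```python
def to_ssf_rows(chunks):
    """Turn [(chunk_label, [(token, pos_tag), ...]), ...] into SSF-like rows.

    Each row is (ADDR, TKN, CAT). Chunks are numbered 1, 2, ...;
    tokens inside chunk i are numbered i.1, i.2, ...
    """
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))
        for j, (token, tag) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, tag))
        rows.append(("", "))", ""))
    return rows


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
rows = to_ssf_rows(chunks)
for row in rows:
    print("\t".join(row))
```

Before chunking, kelA has address 4 in the flat token list. After chunking it becomes 2.1, which is the change in ADDR_ mentioned above.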

Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (Contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

Here the lock becomes the karta.

Levels of Analysis

L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

(t1) Chunk-head tree:

khaa
  bhaaii
  phala

(t2) Fully expanded tree:

khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb

• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root, branching into activity and result]

karta-karma

• The boy opened the lock
  - k1 - karta
  - k2 - karma
• karta and karma sometimes correspond to agent and theme
  - Not always: "The door opened"
  - The door is karta
  - The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock consists of:
  - Inserting and turning a key (action 1)
  - Key pressing and turning the lever (action 2)
  - Latch moving and lock opening (action 3)

Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 - karta (doer)
k2 - karma (affected)
k3 - karana (instrument)

Thus:

• The action of opening normally requires an agentive participant. So: "Sabina opened the lock".
• However, the speaker may decide not to express the role of the agent. Hence:
  - "The key opened the lock": the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  - "The lock opened": the karma is raised to the role of karta (doer: karma-kartri)
• Thus the karta, or the other karaka roles, can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' is assigned karta in (b) and karana in (a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  - meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa   <fs af= root=badZaa cat=adj gend=m>
  - bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta   <fs af= root=bahuta cat=adj gend=any>
  - phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai      <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
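The bracketed chunk notation is regular enough to read mechanically. The sketch below parses it and picks a head per chunk; the head rule (the VM token in a VG chunk, otherwise the last token) is an assumption made for illustration, not the treebank's documented head-finding procedure, and `parse_chunks` and `head_of` are hypothetical helper names.

```python
import re

# Matches one chunk of the form ((word_TAG word_TAG))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (label, [(word, tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def head_of(label, tokens):
    """Assumed head rule: main verb (VM) in a VG chunk, else the last token."""
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]


SENTENCE = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")

heads = [head_of(label, tokens) for label, tokens in parse_chunks(SENTENCE)]
print(heads)
```

Under this rule the heads come out as meraa, bhaaii, phala, and khaataa, which are the words between which the inter-chunk relations are marked.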

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit', and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
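The tree above can be stored as a list of labeled head-to-head edges. The following is a minimal sketch under that assumption; the `EDGES` layout and the `children_of` and `render` helpers are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# (dependent chunk head, relation label, parent chunk head)
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def children_of(edges):
    """Index the edges by parent so the tree can be walked top-down."""
    kids = defaultdict(list)
    for dependent, relation, parent in edges:
        kids[parent].append((relation, dependent))
    return kids


def render(node, kids, depth=0, relation=None):
    """Return the indented lines of the subtree rooted at `node`."""
    label = node if relation is None else f"{relation}: {node}"
    lines = ["  " * depth + label]
    for child_relation, child in kids.get(node, []):
        lines.extend(render(child, kids, depth + 1, child_relation))
    return lines


print("\n".join(render("khaataa", children_of(EDGES))))
```

Storing only inter-chunk edges matches the annotation practice described here: chunk-internal words such as badZaa and bahuta do not appear as nodes.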

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark the participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  - karta - subject/agent/doer
  - karma - object/patient
  - karana - instrument
  - sampradaan - beneficiary
  - apaadaan - source
  - adhikarana - location in place/time/other

Relations other than karakas

• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod_relc - Relative clause
• rad - Address

Relations which do not fall under dependency relation

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
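The three relation types can be kept in a small lookup table. This is a sketch that includes only labels named in these slides; the glosses are paraphrased from the slides, and the table layout and the `relation_type` helper are assumptions for illustration.

```python
# Labels and glosses as named in the slides; the grouping into three
# dictionaries mirrors the three relation types of the scheme.
KARAKA = {
    "k1": "karta",
    "k2": "karma",
    "k3": "karana",
    "k4": "sampradaan",
    "k7t": "adhikarana (time)",
    "k7p": "adhikarana (place)",
}
NON_KARAKA = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}


def relation_type(label):
    """Return (type name, gloss) for a relation label; raise KeyError if unknown."""
    tables = (
        ("karaka", KARAKA),
        ("non-karaka", NON_KARAKA),
        ("non-dependency", NON_DEPENDENCY),
    )
    for name, table in tables:
        if label in table:
            return name, table[label]
    raise KeyError(label)


print(relation_type("k1"))
print(relation_type("pof"))
```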

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  - Possibility-I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark it as k1
    · aayaa se has karana vibhakti, so mark it as k3
    · bacce ko has sampradan vibhakti, so mark it as k4

Causative Constructions (Contd.)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations:
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator-causer' (mk1): the mediator-causer in a causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
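The relabeling from the syntactic analysis (Possibility-I) to the causative roles (Possibility-II) can be sketched as a simple mapping. This is an illustration only: the `relabel_causative` helper is hypothetical, and it does not decide whether a se-phrase is a mediator or a true instrument (as with cammaca 'spoon', which stays k3); that judgment is assumed to have been made before the mapping is applied.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide:
# pk1, mk1, jk1 are marked instead of k1, k3, k4 respectively.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map the Possibility-I labels of a causer/mediator/causee frame
    to the Possibility-II causative roles; other labels are unchanged."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in args]


possibility_one = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_one))
```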

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Alternative-II]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4 - beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta - k4a (Contd.)

• Ex-2 [tree diagram]
• Ex-3 [tree diagram]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd.): Ex-1 [tree diagram]

Relation samanadhikaran - rs (Contd.): Ex-2 [tree diagram]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II [tree diagram]

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'

Participles (vmod) (Contd.)

• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)

Paadzataa hai
  k1: vaha
  k7t: rojZa
  k2: pawra
  vmod: likhakar
    k1: vaha
    k2: pawra

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  "The children have grown up; they don't listen to anyone"
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figure: Coordinate Conjunction (ccof)]

[Figure: Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek

Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
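The filtering step can be sketched as follows. The sketch assumes the two candidate sentences have already been labeled with agent and theme roles (hand-written here, not produced by a real labeler); the dictionary layout and the `answer_who` helper are hypothetical.

```python
# Hand-labeled candidates, following the role annotations on the slide.
CANDIDATES = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, candidates):
    """Return the agents of candidates whose predicate and theme both match."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]


print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Both candidates contain all the question's words, so word matching alone cannot separate them; comparing the theme of create is what rules out the syringe sentence.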

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
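A frame file with two rolesets can be mirrored in a small data structure. This is a sketch only: real PropBank frame files are stored in their own file format, and the `FRAMES` dictionary and the `describe` helper are hypothetical names used for illustration.

```python
# Rolesets for the polysemous predicate "ring", as given in the example.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "ARG0": "Causer of ringing",
            "ARG1": "Thing rung",
            "ARG2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "ARG1": "Surrounding entity",
            "ARG2": "Surrounded entity",
        },
    },
}


def describe(roleset, labeled_args):
    """Attach the roleset-specific role description to each labeled argument."""
    roles = FRAMES[roleset]["roles"]
    return [(text, label, roles[label]) for text, label in labeled_args]


print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label, ARG1, resolves to "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is the point of defining roles per roleset rather than globally.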

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments           Numbered Arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent/experiencer      ARG0-MNS  Induced causer
ARG1  Theme/patient          ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif.erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif.dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. the animacy test and the cognate object test, among others

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl.erg open.pst
         'The door opened'

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl.dat open.inf compel.fut
         'The door will have to open'

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif.erg sleep.m.sg.pst
         'Atif slept'

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif.dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan.dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram.erg Mohan.dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so, sulaa): sleep, make someone sleep
• Add -vaa
  - (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) are used for causees
• ARGA-MNS is used for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid.erg Atif.acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother.erg maid.by Atif.acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Causatives

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust(n) do(v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi.gen wait do.prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

Complex predicate

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se laDZa baithaa
              Ram Ravi.inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by:
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[VP-Pred [NP Atif] [VP [NP kitab] [V paRhegaa]]]

आतिफ़ किताब पढ़ेगा
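The bracketed tree for the basic transitive clause can be checked for the binary-branching assumption with a few lines of code. This is a sketch under stated assumptions: the nested `(label, children)` tuple encoding and the `is_binary` and `words` helpers are hypothetical, and the tree is a reading of the slide's diagram rather than treebank output.

```python
# Each node is (label, [children]); a child is either another node or a word.
TREE = ("VP-Pred", [
    ("NP", ["Atif"]),
    ("VP", [
        ("NP", ["kitab"]),
        ("V", ["paRhegaa"]),
    ]),
])


def is_binary(node):
    """True if no node in the tree has more than two children."""
    _, children = node
    if len(children) > 2:
        return False
    return all(is_binary(c) for c in children if isinstance(c, tuple))


def words(node):
    """Return the words of the tree in left-to-right (surface) order."""
    _, children = node
    out = []
    for child in children:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out


print(is_binary(TREE), words(TREE))
```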

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for: Atif ne kitab paRhii (आतिफ़ ने किताब पढ़ी). Node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[VP-Pred [NP Atif] [VP [V soyegaa]]]

आतिफ़ सोएगा

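Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram for: darvaazaa khulegaa (दरवाज़ा खुलेगा). The NP darvaazaa (index 1) sits in the higher position under VP-Pred; a coindexed CASE1 trace fills the lower NP position beside V khulegaa]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram: the NP cuuhe (index 1) is raised, leaving a CASE1 trace beside V hain; the NP-P us kamre mein is adjoined to VP]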

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी

[Tree diagram: NP-P Ram -ne, NP-P Mohan -ko adjoined to VP-Pred, NP kitaab, V dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram: NP caaMd (index 1) with CASE1 trace, V dikhaa, adjuncts kal raat and baadaloM mein, and the dative NP mujhko (index 2) with SCR2 trace]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram. Node labels include: VP-Pred, VP, V-Aux (hai), V (jaantaa), NP (Ram), CP1, C (ki), EXTR1, NP1 (Sita), CASE1, NP-P (der se), V (aayegi)]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tum ne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Tree diagram. Node labels include: VP-Pred, VP, V-Aux (thii), V (dii), V (paRhii), NP-P (tumne), NP-P (maine), NP1 (jo kitaab), NP (vah), CP, C, COMP, PRO, SCR1, SCR3]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do.prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Tree diagram. Node labels include: VP-Pred, VP, V-Aux (rahaa), V-Aux (thaa), V (kar), V', NP (Raam), NP-P (Ravi -ko), NP1, N (yaad), CASE1]

Causative

Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

Automatic. Issues:
• Compounds
• Punctuation

For example:

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)

Tokenization Issues

• Punctuation: all punctuation marks are tokenized.
• Compounds, e.g. BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - They are productive
  - Morphological analysis applies to the members of the compound
  - The issue: whether to create a single token
  - Decision: create three tokens and mark the hyphen as JOIN

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, TAM (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaa,v,e'>
7              <fs as=&STOP,punc>
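As a minimal sketch of how the composite af attribute can be unpacked, the function below splits a comma-separated value into the eight fields named above. The key names, the helper name parse_af, and the sample value (including its trailing 0 placeholders) are assumptions made for illustration; they are not taken from the treebank tooling.

```python
# Field order follows the description of `af` above; the key names are illustrative.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")

def parse_af(af):
    """Split a comma-separated af value into a dict keyed by field name."""
    values = [v.strip() for v in af.split(",")]
    # Pad short values so that every field is present in the result.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))

# Hypothetical af value for the token laDake.
features = parse_af("laDakaa,n,m,sg,3,o,0,0")
print(features["root"], features["cat"], features["case"])
```

Keeping the fields in a dictionary lets later steps (POS tagging, chunking) look up a feature by name instead of by position.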

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e., the value of ADDR will change)

ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs as=&STOP,punc>
      ))
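The chunked representation can be read back into a simple data structure. The sketch below assumes whitespace-separated rows of the form address, token, tag, with "((" opening a chunk and "))" closing it; the function name read_chunks and the row format are illustrative assumptions, and feature structures are ignored.

```python
def read_chunks(rows):
    """Group SSF-style rows into chunks of (token, POS tag) pairs."""
    chunks, current = [], None
    for row in rows:
        cols = row.split()
        if not cols:
            continue
        if len(cols) >= 3 and cols[1] == "((":
            # Opening row: address, "((", chunk tag.
            current = {"tag": cols[2], "tokens": []}
        elif cols[0] == "))":
            # Closing row: store the finished chunk.
            if current is not None:
                chunks.append(current)
            current = None
        elif current is not None and len(cols) >= 3:
            # Token row: address, token, POS tag (any features are ignored).
            current["tokens"].append((cols[1], cols[2]))
    return chunks

rows = [
    "1 (( NP", "1.1 usa PRP", "1.2 laDake NN", "1.3 ne PSP", "))",
    "2 (( NP", "2.1 kelA NN", "))",
    "3 (( VG", "3.1 khAyA VM", "3.2 thA VAUX", "))",
]
chunks = read_chunks(rows)
for chunk in chunks:
    print(chunk["tag"], [token for token, _ in chunk["tokens"]])
```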


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages have:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky, 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified element (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations include reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
        the

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
        the
  k3: key
        this

k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
        the
  k3: key
        this
  k7p: home
        my

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
        the
  k3: key
        this

Here lock becomes the karta.

Levels of Analysis

L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

(t1) khaa
       bhaaii
       phala

(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key.
  The key opened the lock.
  The lock opened.

Semantics of the verb

• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root, branching into activity and result]

karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)

Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri.
  - The lock opened. The karma is raised to the role of karta (doer): karma-kartri.
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e., which sub-action the speaker wants to focus on.

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly:
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' is assigned karta in (b) and karana in (a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• The annotation scheme follows the Paninian framework
• We show how the scheme handles phenomena in Hindi such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example
  meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS Tagging
  meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here, modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
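The tree above can be stored as labeled edges between chunk heads. The following sketch is one possible encoding, not the treebank's own format: the triple layout and the helper names find_root and render are assumptions chosen for illustration.

```python
from collections import defaultdict

# (dependent, head, relation) triples for the chunk heads of the example sentence.
edges = [
    ("bhaaii", "khaataa", "k1"),
    ("phala", "khaataa", "k2"),
    ("meraa", "bhaaii", "r6"),
]

def find_root(edges):
    """Return the single node that is a head but never a dependent."""
    heads = {head for _, head, _ in edges}
    dependents = {dep for dep, _, _ in edges}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots.pop()

def render(edges):
    """Return the tree as indented lines, one node per line."""
    children = defaultdict(list)
    for dep, head, rel in edges:
        children[head].append((dep, rel))
    lines = []

    def walk(node, rel, depth):
        label = node if rel is None else f"{rel}: {node}"
        lines.append("  " * depth + label)
        for dep, child_rel in children[node]:
            walk(dep, child_rel, depth + 1)

    walk(find_root(edges), None, 0)
    return lines

print("\n".join(render(edges)))
```

The root is the verb chunk head, which matches the Paninian view that every sentence has a primary modified element.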

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
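The three groups of relations can be kept in a small lookup table. This is only a sketch: the grouping follows the slides, but the dictionary layout and the function name classify are illustrative, and the labels k5 (apaadaan) and k7 (adhikarana) are assumptions, since only k1 to k4 and the subtypes k7t and k7p are named explicitly in this tutorial.

```python
RELATIONS = {
    "karaka": {
        "k1": "karta", "k2": "karma", "k3": "karana",
        "k4": "sampradaan",
        "k5": "apaadaan",    # assumed label, not given on the slides
        "k7": "adhikarana",  # assumed label; the slides show the subtypes k7t and k7p
    },
    "non-karaka": {
        "r6": "genitive", "rt": "purpose", "rh": "reason",
        "nmod__relc": "relative clause", "rad": "address",
    },
    "non-dependency": {
        "ccof": "conjunction", "pof": "complex predicate", "fragof": "fragment of",
    },
}

def classify(label):
    """Return (group, gloss) for a relation label, or None if it is not listed."""
    for group, labels in RELATIONS.items():
        if label in labels:
            return group, labels[label]
    return None

print(classify("k2"))
print(classify("pof"))
```

Subtypes such as k7t or k4a are not covered by this table and would need their own entries.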

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa 'cause to eat' is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations:
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator-causer' (mk1): the mediator-causer in a causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
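The relabeling under Possibility-II can be sketched as a simple mapping from the syntactic labels to the causative ones. The function name relabel_causative and the (phrase, label) pair format are assumptions for illustration. Deciding whether a se-phrase is a mediator (aayaa se) or an ordinary instrument (cammaca se) is an annotation decision that this sketch does not model.

```python
# Mapping stated on the slide: pk1, mk1, jk1 replace k1, k3, k4 respectively.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(arguments):
    """Replace k1/k3/k4 by pk1/mk1/jk1; all other labels are kept unchanged."""
    return [(phrase, CAUSATIVE_LABELS.get(label, label)) for phrase, label in arguments]

# Possibility-I labels for: maaz ne aayaa se bacce ko khaanaa khilavaayaa
syntactic = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
causative = relabel_causative(syntactic)
print(causative)
```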


(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    - jo ladZakaa 'the boy' depends on vaha 'he'
    - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - The relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• The ko vibhakti in mujhko 'to me' shows that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta - k4a (Contd.)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e., karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd.)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd.)

• Possibility-I is rejected because the head of the tree, agar-to, is an abstract node, i.e., it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written letters, he tears them up every day'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'

[Figures: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson]agent created the [first disposable syringe]theme for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine]theme was created in 1952 by [Jonas Salk]agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
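The filtering step can be sketched as follows, assuming the two candidate sentences have already been labelled with an agent and a theme. The dictionary layout and the function name answer_who are illustrative assumptions; no actual semantic role labeller is involved.

```python
# Role-labelled candidate sentences from the slides above.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Return the agents of sentences whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]

# "Who created the first effective polio vaccine?"
answers = answer_who("create", "the first effective polio vaccine", candidates)
print(answers)
```

Both candidates contain the words of the question, so word matching alone cannot separate them; the theme label does.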

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example

• [John]ARG0 rings [the bell]ARG1 (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2 (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
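A frame file with two rolesets can be pictured as a nested lookup table. The sketch below uses a plain dictionary rather than the actual PropBank frame file format; the identifiers ring.01 and ring.02, the dictionary layout, and the function name describe are assumptions for illustration.

```python
# Rolesets for the polysemous predicate "ring", as listed above.
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"ARG0": "causer of ringing", "ARG1": "thing rung", "ARG2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"ARG1": "surrounding entity", "ARG2": "surrounded entity"},
    },
}

def describe(roleset_id, labelled_args):
    """Map each labelled phrase to the verb-specific description of its role."""
    roles = FRAMES[roleset_id]["roles"]
    return {phrase: roles[label] for phrase, label in labelled_args}

bell = describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
lake = describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
print(bell)
print(lake)
```

The same label, ARG1, receives a different description in each roleset, which is the point of defining roles verb by verb.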

Hindi PropBank

• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments          Numbered arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
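Labels in this tagset combine a core part (A, a digit, or M) with an optional three-letter function tag. A minimal sketch of splitting such labels is given below; the regular expression and the function name parse_label are assumptions for illustration, written for labels spelled with a hyphen such as ARG2-GOL.

```python
import re

# ARG + core (A, a digit, or M) + optional hyphen and three-letter function tag.
LABEL = re.compile(r"^ARG(?P<core>[A0-9M])(?:-(?P<tag>[A-Z]{3}))?$")

def parse_label(label):
    """Split a label such as 'ARG2-GOL' into its core part and function tag."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a recognised label: {label}")
    return match.group("core"), match.group("tag")

for label in ("ARGA", "ARG0-GOL", "ARG2-LOC", "ARGM-TMP"):
    print(label, parse_label(label))
```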

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Last night the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Last night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative:
• Add -aa
  - (so, sulaa): 'sleep', 'make someone sleep'
• Add -vaa
  - (sulaa, sulvaa): 'make someone sleep', 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    - Language-universal principles
    - Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram; node labels: VP-Pred, VP, NP, NP Atif, kitab, V paRhegaa]

आतिफ़ किताब पढ़ेगा

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram; node labels: VP-Pred, VP, NP-P, NP Atif-ne, kitab, V paRhii]

आतिफ़ ने किताब पढ़ी

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram; node labels: VP-Pred, VP, NP Atif, V soyegaa]

आतिफ़ सोएगा

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree diagram; node labels: VP-Pred, VP, NP1, NP darvaazaa, CASE1, V khulegaa]

दरवाज़ा खुलेगा

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram; node labels: VP-Pred, VP, NP1, NP cuuhe, CASE1, V hain, NP-P us kamre mein]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram; node labels: VP-Pred, VP, NP-P Ram-ne, NP kitaab, V dii, NP-P Mohan-ko]

राम ने मोहन को किताब दी

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram; node labels: VP-Pred, VP, NP1, NP caaMd, CASE1, V dikhaa, NP-P baadaloM mein, NP kal raat, NP-P2 mujhko, SCR2]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone

• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)

• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram; node labels: VP-Pred, VP, NP Ram, V jaantaa, V-Aux hai, EXTR1, CP1, C ki, NP1 Sita, CASE1, NP-P der se, V aayegi]

Relative Clause

[Tree diagram; node labels: VP-Pred, VP, CP, C COMP, NP1 jo kitaab, NP-P tumne, SCR1, PRO, V dii, V-Aux thii, NP vah, NP-P maine, SCR3, V paRhii]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Tree diagram; node labels: VP-Pred, VP, V', NP Raam, NP-P1 Ravi-ko, NP, CASE1, N yaad, V kar, V-Aux rahaa, V-Aux thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative


Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision
    · Create three tokens
    · Mark the hyphen as JOIN
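The compound decision (three tokens, with the hyphen marked as JOIN) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is romanized and whitespace-separated, and the `tokenize` function and its (token, mark) output are hypothetical, not part of the treebank tooling.

```python
import re

# Runs of letters form one token; any other non-space character is its own token.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^\sA-Za-z]")

def tokenize(sentence):
    """Return (token, mark) pairs; a hyphen inside a compound is marked "JOIN"."""
    tokens = []
    for word in sentence.split():
        parts = TOKEN_RE.findall(word)
        for i, part in enumerate(parts):
            # A hyphen with material on both sides joins the members of a compound.
            inner_hyphen = part == "-" and 0 < i < len(parts) - 1
            tokens.append((part, "JOIN" if inner_hyphen else None))
    return tokens

print(tokenize("BAI-bahana"))
# [('BAI', None), ('-', 'JOIN'), ('bahana', None)]
```

Keeping the hyphen as a separate, marked token lets each member of the compound receive its own morphological analysis while still recording that the three tokens belong together.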

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), suffix

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
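To make the composite af attribute concrete, here is a minimal Python sketch that splits an af value into named fields. The field order follows the definition of af given on this slide; the comma separator, the `parse_af` name, the field names and the padding of missing trailing fields are assumptions made for illustration.

```python
# Field order as listed for the af attribute; the names themselves are illustrative.
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix"]

def parse_af(af):
    """Split a comma-separated af value into a dict of named fields.

    Missing trailing fields are padded with empty strings (an assumption).
    """
    values = af.split(",")
    if len(values) > len(AF_FIELDS):
        raise ValueError(f"too many af fields: {af!r}")
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))

analysis = parse_af("laDakaa,n,m,sg,3,o")
print(analysis["root"], analysis["cat"], analysis["case"])  # laDakaa n o
```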

POS Tagging

• ILMT POS tagsets adopted
• Total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e., the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
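The renumbering that chunking causes can be illustrated with a short Python sketch. Given chunks as (tag, tokens) pairs, it emits rows in the layout of the chunking table: chunk-level addresses 1, 2, … and token addresses 1.1, 1.2, …. The input structure and the `chunked_rows` name are hypothetical; real SSF files carry more fields.

```python
def chunked_rows(chunks):
    """Return (addr, token, cat) rows for a list of (chunk_tag, [(token, pos), ...])."""
    rows = []
    for c, (chunk_tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(c), "((", chunk_tag))        # chunk opens, gets address c
        for t, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{c}.{t}", token, pos))     # token address is chunk.position
        rows.append(("", "))", ""))                   # chunk closes
    return rows

sentence = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
for addr, token, cat in chunked_rows(sentence):
    print(f"{addr:<5}{token:<8}{cat}")
```

Before chunking, khAyA has address 5; after chunking it is 3.1, which is what "the value of ADDR_ will change" refers to.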


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad

<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar? Indian languages have:

• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai ('child' 'fruit' 'eat_hab' 'pres')
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

[Dependency tree: opened → k1 Sabina, k2 lock; lock → the]

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened → k1 Sabina, k2 lock, k3 key; lock → the; key → this]

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened → k7t Yesterday, k1 Sabina, k2 lock, k3 key, k7p home; lock → the; key → this; home → my]

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened → k7t yesterday, k1 lock, k3 key; lock → the; key → this]

lock becomes the karta

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example Contd.

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

[Trees: (t1) khaa with dependents bhaaii and phala; (t2) the expanded tree, with meraa and baDZaa under bhaaii and bahuta under phala]
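The bracketed chunk notation used in this example is regular enough to read mechanically. The following is a minimal Python sketch, assuming the `((token_TAG ...))_CHUNK` layout with tokens separated by spaces; `parse_chunks` is a hypothetical helper, not a treebank tool.

```python
import re

# One chunk: "((" body "))" "_" chunk tag; the body is matched non-greedily.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Return [(chunk_tag, [(token, pos), ...]), ...] from bracketed chunk text."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        # Each item looks like "bhaaii_NN"; split on the last underscore.
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks

text = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP "
        "((khaataa_VM hai_VAUX))_VG")
for chunk_tag, tokens in parse_chunks(text):
    print(chunk_tag, [tok for tok, _ in tokens])
```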

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root → activity, result]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Dependency tree: open → k1 boy, k2 lock]

Sub-actions: Opening of lock

[Diagram: Opening of lock consists of inserting and turning a key (action 1), key pressing and turning the lever (action 2), latch moving and lock opening (action 3)]

Sub-actions: Opening of lock

[Dependency trees: open → k1 boy, k2 lock, k3 key; open → k1 key, k2 lock; open → k1 lock]

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri
  – The lock opened: the karma is raised to the role of karta (doer): karma-kartri
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b), karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

[Dependency tree: khaataa → k1 bhaaii, k2 phala; bhaaii → r6 meraa]
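The tree for this example can be written down as a small data structure. The sketch below is illustrative only: the `Node` class and its methods are hypothetical names, and the nodes stand for chunk heads, with each edge carrying the relation label shown in the tree (k1, k2, r6).

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A chunk head with its labeled dependents (illustrative structure)."""
    head: str
    children: list = field(default_factory=list)  # list of (relation, Node)

    def attach(self, relation, child):
        """Add `child` as a dependent of this node under `relation`."""
        self.children.append((relation, child))
        return child

    def edges(self):
        """Yield (parent, relation, child) triples in depth-first order."""
        for relation, child in self.children:
            yield (self.head, relation, child.head)
            yield from child.edges()

root = Node("khaataa")
brother = root.attach("k1", Node("bhaaii"))  # k1 edge from the tree
root.attach("k2", Node("phala"))             # k2 edge from the tree
brother.attach("r6", Node("meraa"))          # r6 edge from the tree

for parent, relation, child in root.edges():
    print(f"{child} --{relation}--> {parent}")
```

Words such as badZaa and bahuta do not appear as nodes here because they sit inside the chunks; only chunk heads take part in the inter-chunk relations.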

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
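The three groups of relations can be collected into a lookup table, which is handy when validating or summarizing annotations. This is a sketch only: the group names and the `classify` helper are illustrative, and the labels k5 (apaadaan) and k7 (adhikarana) are assumed from the general numbering of the scheme, since the slides here show only k1 to k4 and the k7t/k7p subtypes explicitly.

```python
RELATION_GROUPS = {
    "karaka": {
        "k1": "karta", "k2": "karma", "k3": "karana",
        "k4": "sampradaan",
        "k5": "apaadaan",    # label assumed, not shown on the slides
        "k7": "adhikarana",  # label assumed; k7t/k7p are its time/place subtypes
    },
    "non-karaka": {
        "r6": "genitive", "rt": "purpose", "rh": "reason",
        "nmod__relc": "relative clause", "rad": "address",
    },
    "non-dependency": {
        "ccof": "conjunction", "pof": "complex predicate", "fragof": "fragment of",
    },
}

def classify(label):
    """Return (group, gloss) for a relation label, or None if it is unknown."""
    for group, labels in RELATION_GROUPS.items():
        if label in labels:
            return group, labels[label]
    return None

print(classify("k2"))   # ('karaka', 'karma')
print(classify("pof"))  # ('non-dependency', 'complex predicate')
```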

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations then the root will be khaa 'eat'

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II

Causative Constructions (Contd.)
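The difference between the two possibilities amounts to a relabeling of the causal participants. The sketch below shows that mapping in Python under explicit assumptions: `relabel_causative` and its arguments are hypothetical, and the caller must say which phrases are causal participants, because the surface label alone does not decide this (cammaca se 'with the spoon' stays k3 even with a causative verb).

```python
# Possibility-I labels mapped to the causative labels of Possibility-II.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args, causal_participants):
    """Relabel (phrase, label) pairs, but only for phrases named as causal participants."""
    relabeled = []
    for phrase, label in args:
        if phrase in causal_participants:
            label = CAUSATIVE_RELABEL.get(label, label)
        relabeled.append((phrase, label))
    return relabeled

surface = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(surface, {"maaz ne", "aayaa se", "bacce ko"}))
# [('maaz ne', 'pk1'), ('aayaa se', 'mk1'), ('bacce ko', 'jk1'), ('khaanaa', 'k2')]
```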

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Alternative-II

Relative Clauses (Contd.)

• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta – k4a (Contd.)

[Dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd.)

[Dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

[Dependency tree: abstract node agar-to → paired-ccof agar, paired-ccof to; agar → ccof naa hotii; to → ccof aatii]

Possibility-II

[Dependency tree]

Conditionals (Contd.)

• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written letters every day, he tears them up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)

[Dependency tree: PaadZataa hai → k1 vaha, k7t rojZa, k2 patra, vmod likhakar; likhakar → k1 vaha, k2 patra (shared arguments)]

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

[Dependency tree: maan luungii → k1 mai, k2 NULL__NP (vo); NULL__NP → nmod__relc kahoge; kahoge → k1 tum, k2 jo bhi]

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL__CCP (aur) → ccof badZe_ho_gaye, ccof nahii_sunate]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

[Dependency tree]

Subordinate Conjunction

[Dependency tree]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• It means a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd.)

[Dependency tree: kiyaa → k1 maine, k2 usase, pof prashna; prashna → nmod ek]
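The chunking decision for conjunct verbs can be sketched as follows. The function name and the two head-position rules are assumptions for illustration only (noun as the last token of the NP, main verb as the first token of the VG); the treebank's actual head rules are not given on these slides.

```python
def split_conjunct_verb(np_tokens, verb_tokens):
    """Keep the noun and its modifiers in an NP chunk and link the noun to the verb with pof."""
    chunks = [("NP", list(np_tokens)), ("VG", list(verb_tokens))]
    noun_head = np_tokens[-1]   # assumption: the noun is the last token of the NP
    verb_head = verb_tokens[0]  # assumption: the main verb is the first token of the VG
    relations = [(noun_head, "pof", verb_head)]
    return chunks, relations

chunks, relations = split_conjunct_verb(["ek", "prashna"], ["kiyaa"])
print(chunks)     # [('NP', ['ek', 'prashna']), ('VG', ['kiyaa'])]
print(relations)  # [('prashna', 'pof', 'kiyaa')]
```

Because ek and prashna end up in the same NP chunk, the modifier relation between them is chunk-internal and needs no inter-chunk label.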


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
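The filtering step can be made concrete with a toy Python sketch. The role annotations below are written by hand from the two candidate sentences; the dictionary layout and the `answer_who` function are hypothetical and stand in for the output of a real semantic role labeller.

```python
# Hand-written role annotations for the two candidate sentences.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "The first effective polio vaccine"},
]

def normalize(phrase):
    """Lowercase and drop a leading article so themes can be compared."""
    phrase = phrase.lower().strip()
    return phrase[4:] if phrase.startswith("the ") else phrase

def answer_who(predicate, theme, candidates):
    """Return the agents of `predicate` whose theme matches the question's theme."""
    wanted = normalize(theme)
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and normalize(c["theme"]) == wanted]

print(answer_who("create", "the first effective polio vaccine", candidates))
# ['Jonas Salk']
```

Both sentences contain the words of the question, so word matching cannot separate them; only the theme role distinguishes the syringe sentence from the vaccine sentence.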

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
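The two rolesets for ring can be held in a small lookup structure, which shows how the same numbered label means different things under different rolesets. This is a sketch: the dictionary layout and the `describe` helper are illustrative stand-ins and do not reproduce the file format of real PropBank frame files.

```python
# Rolesets for "ring" as given in the example.
FRAMES = {
    "ring.01": {"gloss": "Make sound of bell",
                "roles": {"ARG0": "Causer of ringing",
                          "ARG1": "Thing rung",
                          "ARG2": "Ring for"}},
    "ring.02": {"gloss": "To surround",
                "roles": {"ARG1": "Surrounding entity",
                          "ARG2": "Surrounded entity"}},
}

def describe(roleset_id, labeled_args):
    """Pair each labeled argument with its roleset-specific description."""
    roles = FRAMES[roleset_id]["roles"]
    unknown = [label for label in labeled_args if label not in roles]
    if unknown:
        raise KeyError(f"{roleset_id} has no role(s) {unknown}")
    return {text: roles[label] for label, text in labeled_args.items()}

print(describe("ring.01", {"ARG0": "John", "ARG1": "the bell"}))
print(describe("ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"}))
```

ARG1 is the thing rung under ring.01 but the surrounding entity under ring.02, which is why the roleset ID has to be chosen before the arguments are labelled.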

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments          Numbered arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
'The door will have to open'

Intransitive: Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night, the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night, I saw the moon behind the clouds'

[Figures: annotated examples unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causa3ves

bull  Hindihastwowaysofformingthecausa3vebull  Addndashaa

ndash  (sosulaa)sleepmakesomeonesleepbull  Addndashvaa

ndash  (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep

bull  WeintroducethelabelARGAtoanalyzecausersbull  SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees

bull  ARGALMNSforintermediatecausers


Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

[Figures: annotated trees for causative-1 and causative-2]


Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa


Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'


Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'


Phrase Structure Representation

Owen Rambow
CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition


Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा. Node labels: VP-Pred, VP, NP, V. Terminals: Atif, kitab, paRhegaa]


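Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी. Node labels: VP-Pred, VP, NP-P, NP, V. Terminals: Atif ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा. Node labels: VP-Pred, VP, NP, V. Terminals: Atif, soyegaa]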


Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा. Node labels: VP-Pred, VP, NP1, NP, CASE1, V. Terminals: darvaazaa, khulegaa]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं. Node labels: VP-Pred, VP, NP1, NP, NP-P, CASE1, V. Terminals: us kamre mein, cuuhe, hain]


Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी. Node labels: VP-Pred, VP, NP-P, NP, V. Terminals: Ram ne, Mohan ko, kitaab, dii]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा. Node labels: VP-Pred, VP, NP, NP1, NP2, NP-P, CASE1, SCR2, V. Terminals: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere


Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी. Node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, EXTR1, CASE1, V, V-Aux. Terminals: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

[Figure: PS tree for the relative clause example. Node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux. Terminals: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'


Complex Predicate

[Figure: PS tree for the complex predicate example. Node labels: VP-Pred, VP, NP, NP-P, V', V, V-Aux, N, CASE1. Terminals: Raam, Ravi ko, yaad, kar, rahaa, thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative


Outline

• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:

usa    ladake   ne    kelaa    khaayaa    thaa
that   boy      erg   banana   eat-perf   past


Tokenization

Represented in SSF:

ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7      (punctuation)

Tokenization Issues

• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision
    · Create three tokens
    · Mark the hyphen as JOIN
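The tokenization decision can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the treebank's tokenizer: the function name `tokenize`, the `(address, token, tag)` row layout and the regular expression are all illustrative choices.

```python
import re


def tokenize(sentence: str):
    """Split on whitespace, then separate punctuation; hyphens inside compounds are tagged JOIN."""
    rows = []
    for word in sentence.split():
        # Runs of letters/digits stay together; every punctuation mark becomes its own token.
        for piece in re.findall(r"[^\W_]+|[^\w\s]", word):
            tag = "JOIN" if piece == "-" else ""
            rows.append((len(rows) + 1, piece, tag))
    return rows
```

With this sketch, `tokenize("BAI-bahana")` yields three rows (the two compound members and the hyphen tagged JOIN), which mirrors the "create three tokens" decision.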

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense-aspect-modality) / vibhakti (case marker), suffix

ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7                 <fs as=&STOP,punc>
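To make the composite attribute concrete, here is a hedged Python sketch that splits an `af` value into named fields. It assumes the value is a comma-separated string with exactly eight positions in the order listed on the slide; the field names in `AF_FIELDS`, the function `parse_af` and the sample string are illustrative, not part of the SSF specification.

```python
# Assumed field order, following the slide: root, category, gender, number,
# person, case, tam/vibhakti, suffix. The short names are hypothetical.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vib", "suffix")


def parse_af(af: str) -> dict:
    """Split a comma-separated af string into a field -> value dictionary."""
    values = af.split(",")
    if len(values) != len(AF_FIELDS):
        raise ValueError(f"expected {len(AF_FIELDS)} fields, got {len(values)}")
    return dict(zip(AF_FIELDS, values))


# Illustrative value for laDake; the last two positions are left empty.
EXAMPLE = parse_af("laDakaa,n,m,sg,3,o,,")
```

Keeping the fields positional matches the compact `af` notation, while the dictionary makes individual features such as number or case easy to query.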

POS Tagging

• ILMT POS tagset adopted
• Total 26 tags

ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7               SYM    <fs as=&STOP,punc>


Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1              SYM    <fs as=&STOP,punc>
        ))
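The renumbering that chunking introduces (token 2 becoming 1.2, and so on) can be sketched as follows. This is a minimal illustration assuming tab-separated output; `to_ssf` and the `CHUNKS` sample are hypothetical names, and the feature-structure column is omitted for brevity.

```python
# Sample input: (chunk label, [(token, POS tag), ...]) for the sentence on the slide.
CHUNKS = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]


def to_ssf(chunks):
    """Render chunks as SSF-style rows: chunk i gets address i, its tokens get i.1, i.2, ..."""
    lines = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{label}")          # chunk opening row
        for j, (word, pos) in enumerate(tokens, start=1):
            lines.append(f"{i}.{j}\t{word}\t{pos}")  # token row with nested address
        lines.append("\t))")                        # chunk closing row
    return lines


SSF_LINES = to_ssf(CHUNKS)
```

Printing `"\n".join(SSF_LINES)` gives a layout comparable to the table above, without the OTHR column.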


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma
IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012


Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)


Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala


Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti


Sabina opened the lock

[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]

k3 (karaNa): instrument


Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]

k7t (kaladhikaraNa): time
k7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]

Here the lock becomes the karta.


Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically


For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Figure: (t1) chunk-level tree over khaa, bhaaii, phala; (t2) fully expanded tree over khaa, bhaaii, phala, meraa, baDZaa, bahuta]


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Figure: verbal root, comprising activity and result]


karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Dependency tree: open → boy (k1), lock (k2)]

Sub-actions: Opening of lock

[Figure: opening of lock decomposed into three sub-actions: inserting and turning a key (action 1); key pressing and turning the lever (action 2); latch moving and lock opening (action 3)]


Sub-actions: Opening of lock

[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)


Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.


An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m>
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any>
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3>


An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks


Dependency Scheme (Contd.)

[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
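As a rough illustration of the tree above, the inter-chunk relations can be stored as labeled edges between chunk heads. The sketch below is an assumption about one convenient encoding, not the treebank's storage format; `EDGES`, `find_root` and `dependents_of` are hypothetical names.

```python
# Each edge is (dependent head, governing head, relation label).
EDGES = [
    ("bhaaii", "khaataa", "k1"),  # karta
    ("phala", "khaataa", "k2"),   # karma
    ("meraa", "bhaaii", "r6"),    # genitive
]


def find_root(edges):
    """Return the single head that is not itself a dependent of anything."""
    dependents = {dep for dep, _, _ in edges}
    heads = {head for _, head, _ in edges}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()


def dependents_of(head, edges):
    """List (dependent, relation) pairs attached to the given head, in input order."""
    return [(dep, rel) for dep, h, rel in edges if h == head]
```

An edge list keeps the head-to-head relations explicit and leaves chunk-internal words (badZaa, bahuta) out, in line with the scheme's decision to leave intra-chunk dependencies unspecified at this stage.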

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates


Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address


Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4


Causative Constructions (Contd.)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
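The substitution of pk1, mk1, jk1 for k1, k3, k4 can be written down as a small mapping. The Python sketch below is illustrative only: `CAUSATIVE_LABELS` and `relabel_causative` are hypothetical names, and it assumes that every k3 phrase passed in is a mediator such as aayaa se. A true instrument such as cammaca se keeps k3, so the caller would have to exclude it.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map surface-based labels to causative labels; other labels (e.g. k2) are unchanged.

    arguments: list of (phrase, label) pairs from the Possibility-I analysis.
    Assumes any k3 phrase is a mediator-causer, not an instrument.
    """
    return [(phrase, CAUSATIVE_LABELS.get(label, label)) for phrase, label in arguments]


POSSIBILITY_I = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
POSSIBILITY_II = relabel_causative(POSSIBILITY_I)
```

The mapping only changes labels; under Possibility-II the root of the analysis would additionally be the base verb khaa rather than khilvaa.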

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'


[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref


[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref


(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta – k4a (Contd.)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  – Possibility-I: abstract node
  – Possibility-II: one clause depends on the other clause

Possibility-I

[Figure: dependency tree with abstract root agar-to; agar and to attach to it by paired-ccof; naa hotii and aatii attach by ccof]

Possibility-II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically


Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakara (vmod); likhakara → vaha (k1), patra (k2), the shared arguments]


(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example, vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]


Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of


(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figures: dependency trees for the coordinate conjunction (ccof) and subordinate conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

[Dependency tree: kiyaa → maine (k1), usase (k2), prashna (pof); prashna → ek (nmod)]


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya
University of Colorado, Boulder


Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh


Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh


Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
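The filtering step can be made concrete with a toy Python sketch. It assumes the two candidate sentences have already been role-labelled as shown above; the `CANDIDATES` structure and `agents_for_theme` are hypothetical names, and exact string matching on the theme is a deliberate simplification.

```python
# Role-labelled candidates for the predicate "create", copied from the slide's bracketing.
CANDIDATES = [
    {"agent": "Becton Dickinson", "theme": "first disposable syringe"},
    {"agent": "Jonas Salk", "theme": "The first effective polio vaccine"},
]


def agents_for_theme(question_theme, candidates):
    """Keep only candidates whose theme matches the question's theme, and return their agents."""
    def normalise(text):
        # Ignore case and a leading article so "The first ..." matches "the first ...".
        words = text.lower().split()
        return " ".join(words[1:] if words and words[0] == "the" else words)

    wanted = normalise(question_theme)
    return [c["agent"] for c in candidates if normalise(c["theme"]) == wanted]


ANSWER = agents_for_theme("the first effective polio vaccine", CANDIDATES)
```

Word matching alone would accept both sentences; comparing themes removes the syringe sentence and leaves Jonas Salk as the only agent.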

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information


PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
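A frame file with several rolesets can be pictured as a nested lookup table. The Python sketch below is only an illustration of that idea: real PropBank frame files are XML documents, and the `FRAMES` dictionary and `describe_arguments` function here are hypothetical.

```python
# Toy stand-in for the "ring" frame file, with the two rolesets from the slide.
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"ARG0": "causer of ringing", "ARG1": "thing rung", "ARG2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"ARG1": "surrounding entity", "ARG2": "surrounded entity"},
    },
}


def describe_arguments(roleset, labelled):
    """Map each labelled span to the role description defined by the chosen roleset.

    labelled: dict of span text -> numbered argument, e.g. {"John": "ARG0"}.
    Raises KeyError if a label is not defined for that roleset.
    """
    roles = FRAMES[roleset]["roles"]
    return {span: roles[arg] for span, arg in labelled.items()}


BELL = describe_arguments("ring.01", {"John": "ARG0", "the bell": "ARG1"})
LAKE = describe_arguments("ring.02", {"Tall aspen trees": "ARG1", "the lake": "ARG2"})
```

The example shows why numbered arguments are verb-specific: ARG1 is the thing rung under ring.01 but the surrounding entity under ring.02.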

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually


PropBank Tagset: Numbered Arguments

Numbered arguments             Numbered arguments with function tags
ARGA   Causer                  ARGA-MNS   Indirect causer
ARG0   Agent, experiencer      ARG0-MNS   Induced causer
ARG1   Theme, patient          ARG0-GOL   Causee with a 'recipient' role
ARG2   Recipient               ARG2-ATR   Attribute
ARG3   Instrument              ARG2-GOL   Goal
                               ARG2-SOU   Source
                               ARG2-LOC   Location
                               ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
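The labelling rule for the single argument of an intransitive verb can be sketched as follows. The verb classes are the three examples from the slide; the table `VERB_CLASS` and the function name are hypothetical, and a real annotation effort would decide class membership with the diagnostic tests rather than a hard-coded table.

```python
# Example verbs from the slide; class membership here is a hard-coded assumption.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb: str) -> str:
    """Unaccusatives label their single argument Arg1, unergatives Arg0."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"
```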

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night, the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night, I saw the moon behind the clouds'

[Figures: role-labelled trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: role-labelled trees for causative-1 and causative-2]

Causatives: classes

[Figure: causative classes]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. "trust"
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा
[Tree diagram; nodes: VP-Pred, VP, NP, V; leaves: Atif, kitab, paRhegaa]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[Tree diagram; nodes: VP-Pred, VP, NP-P, NP, V; leaves: Atif ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ़ सोएगा
[Tree diagram; nodes: VP-Pred, VP, NP, V; leaves: Atif, soyegaa]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[Tree diagram; nodes: VP-Pred, VP, NP1, NP, V; trace: CASE1; leaves: darvaazaa, khulegaa]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and location is an adjunct

उस कमरे में चूहे हैं
[Tree diagram; nodes: VP-Pred, VP, NP1, NP, NP-P, V; trace: CASE1; leaves: us kamre mein, cuuhe, hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी
[Tree diagram; nodes: VP-Pred, VP, NP-P, NP, V; leaves: Ram ne, Mohan ko, kitaab, dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
[Tree diagram; nodes: VP-Pred, VP, NP, NP1, NP2, NP-P, V; traces: CASE1, SCR2; leaves: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]

• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
[Tree diagram; nodes: VP-Pred, VP, NP, NP1, NP-P, CP1, C, V, V-Aux; traces: EXTR1, CASE1; leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Tree diagram; nodes: VP-Pred, VP, NP, NP1, NP-P, CP, C, V, V-Aux; traces and empty elements: SCR1, SCR3, PRO, COMP; leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Tree diagram; nodes: VP-Pred, VP, VP1, NP, NP-P, N, V, V', V-Aux; trace: CASE1; leaves: Raam, Ravi ko, yaad, kar, rahaa, thaa]

Causative

[Tree diagram not recoverable]

Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues

• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens; mark the hyphen as JOIN
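The compound decision (three tokens, with the hyphen marked as JOIN) can be illustrated with a small tokenizer. This is a sketch under simplifying assumptions: the function `tokenize` is hypothetical, input is romanized (WX-style) text, and a hyphen counts as a compound joiner only when it sits directly between two word tokens.

```python
import re

WORD = re.compile(r"\w+")


def tokenize(text):
    """Split text into (token, tag) pairs; tag is 'JOIN' for compound hyphens."""
    pieces = re.findall(r"\w+|[^\w\s]", text)
    tokens = []
    for i, piece in enumerate(pieces):
        is_join = (
            piece == "-"
            and 0 < i < len(pieces) - 1
            and WORD.fullmatch(pieces[i - 1]) is not None
            and WORD.fullmatch(pieces[i + 1]) is not None
        )
        tokens.append((piece, "JOIN" if is_join else None))
    return tokens
```

For example, `tokenize("BAI-bahana")` yields three tokens, with the middle one tagged JOIN, while other punctuation marks become untagged tokens of their own.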

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense, aspect, modality), vibhakti (case marker), suffix

ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaa,v,e'>
7              <fs as=&STOP,punc>

POS Tagging

• ILMT POS tagset adopted
• Total: 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)

ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs as=&STOP,punc>
      ))
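The chunked SSF layout can be read back into a list of chunks with a few lines of code. This is only a sketch: `parse_ssf` is a hypothetical helper, the feature-structure column is left out of the sample for brevity, and fields are assumed to be whitespace-separated.

```python
SAMPLE = """
1    ((      NP
1.1  usa     PRP
1.2  laDake  NN
1.3  ne      PSP
     ))
2    ((      NP
2.1  kelA    NN
     ))
3    ((      VG
3.1  khAyA   VM
3.2  thA     VAUX
     ))
"""


def parse_ssf(text):
    """Return a list of (chunk_tag, [(token, pos), ...]) from chunked SSF text."""
    chunks, current = [], None
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":                      # chunk close
            chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":  # chunk open: ADDR (( TAG
            current = (fields[2], [])
        else:                                      # token row: ADDR TKN CAT
            _addr, token, pos = fields[:3]
            current[1].append((token, pos))
    return chunks


chunks = parse_ssf(SAMPLE)
```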


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky, 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

  opened
    k1: Sabina
    k2: lock
      the

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

  opened
    k1: Sabina
    k2: lock
      the
    k3: key
      this

k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

  opened
    k7t: Yesterday
    k1: Sabina
    k2: lock
      the
    k3: key
      this
    k7p: home
      my

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

  opened
    k7t: yesterday
    k1: lock
      the
    k3: key
      this

Here the lock becomes the karta.

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai

=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd)

(t1) Chunk-level tree:
  khaa
    bhaaii
    phala

(t2) Fully expanded tree:
  khaa
    bhaaii
      meraa
      baDZaa
    phala
      bahuta

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal root → activity, result]

karta-karma

• The boy opened the lock
  – k1: karta
  – k2: karma
• karta, karma sometimes correspond to agent, theme
  – Not always: "The door opened"
  – The door is karta
  – The sentence has no explicit karma

  open
    k1: boy
    k2: lock

Sub-actions: Opening of lock

Opening of lock
  – Inserting and turning a key (action 1)
  – Key pressing and turning the lever (action 2)
  – Latch moving and lock opening (action 3)

Sub-actions: Opening of lock

  open
    k1: boy
    k2: lock
    k3: key

  open
    k1: key
    k2: lock

  open
    k1: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: "Sabina opened the lock."
• However, the speaker may decide not to express the role of the agent. Hence:
  – "The key opened the lock." The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
  – "The lock opened." The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m>
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any>
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
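The bracketed chunk notation used on this slide is regular enough to parse with one expression. The following sketch assumes the `((token_POS ...))_LABEL` form shown above; `parse_chunks` is a hypothetical helper, not part of the treebank tools.

```python
import re

# One chunk: "((" + space-separated token_POS items + "))_" + chunk label.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return [(chunk_label, [(token, pos), ...]), ...] in sentence order."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, tokens))
    return chunks


example = (
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
parsed = parse_chunks(example)
```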

An Example (Contd)

• Dependency Relation
  [Figure: dependency tree for the example sentence]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd)

  khaataa
    k1: bhaii
      r6: meraa
    k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta: subject/agent/doer
  – karma: object/patient
  – karana: instrument
  – sampradaan: beneficiary
  – apaadaan: source
  – adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types

[Figure: dependency relation types]

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4
  – Possibility-II
    · The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
    · The Paninian framework provides the relations
      prayojaka karta 'causer' (pk1): the causer in a causative construction
      prayojya karta 'causee' (jk1): the causee in a causative construction
      madhyastha karta 'mediator causer' (mk1): the mediator-causer in a causative construction
    · Do we mark the above dependency roles?
    · If we mark these relations, then the root will be khaa 'eat'
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
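The difference between the two possibilities for the mediated causative example amounts to a relabelling of three relations. A minimal sketch follows, with the hypothetical names `CAUSATIVE_RELABEL` and `relabel_causative`; it covers only the three participants of that example (causer, mediator, causee) and is not meant for sentences where `se` marks a true instrument, such as the spoon example.

```python
# Mapping stated on the slide: pk1, mk1, jk1 instead of k1, k3, k4.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(deps):
    """Turn a Possibility-I analysis into the Possibility-II labels."""
    return [(phrase, CAUSATIVE_RELABEL.get(rel, rel)) for phrase, rel in deps]


# Possibility-I labels for: maaz ne aayaa se bacce ko khaanaa khilvaayaa
possibility_one = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_two = relabel_causative(possibility_one)
```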

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
    [Figure: Relative Clause, Possibility-I]
  – Possibility-II
    · The verb khadZaa hai 'is standing' is the root of the relative clause
    · The modifier of vaha 'he' in the main clause is the entire relative clause
    · Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
    [Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]

27

(4) Relation samanadhikaran- rs

  Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $  Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo   In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma   In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object

yaha lsquothisrsquo so it attahes to yaha as rs

Relation samanadhikaran- rs (Contdhellip)

  Ex-1

101212

28

Relation samanadhikaran- rs (Contd) ndash Ex-2

(5) Conditionals   Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo

lsquoHad she been not sick she would have definitely come to the partyrsquo

  Issue

$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause

101212

29

Possibility - I

agar-to paired-ccof paired-ccof

agar to ccof ccof

naa hotii aatii

Possibility - II

101212

30

Conditionals (Contd)   Possibility-I is not possible because agar-to is the head of the tree

which is an abstract node ie it is not a lexical node   For conditionals our current decision Follow Possibility-II   In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo

clause   Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo

clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written a letter every day, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

  PaadZataa hai
    k1: vaha
    k7t: rojZa
    k2: patra
    vmod: likhakar
      k1: vaha (shared)
      k2: patra (shared)

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing, which becomes the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd)

  maan luungii
    k1: mai
    k2: NULL__NP (vo)
      nmod__relc: kahoge
        k1: tum
        k2: jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

  NULL__CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figures: dependency trees for the coordinate and subordinate conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now be part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

  kiyaa
    k1: maine
    k2: usase
    pof: prashna
      nmod: ek
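The conjunct verb tree can be written down as a list of labelled edges, from which the noun-verb unit is recovered by following the pof relation. This is an illustrative sketch only: the `(child, relation, head)` edge triples and the helper `conjunct_verbs` are assumptions made for the example, not the treebank's storage format.

```python
# Edges as (child, relation, head), read off the tree for
# "maine usase ek prashna kiyaa".
EDGES = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]


def conjunct_verbs(edges):
    """Return each noun-verb sequence linked by a pof relation."""
    return [f"{child} {head}" for child, rel, head in edges if rel == "pof"]


def modifiers_of(word, edges):
    """Return the direct dependents of a word with their relations."""
    return [(child, rel) for child, rel, head in edges if head == word]
```

Because ek attaches to prashna rather than to kiyaa, `modifiers_of("prashna", EDGES)` returns the adjective, while `conjunct_verbs(EDGES)` still recovers prashna kiyaa as one semantic unit.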


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02
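As a rough sketch of how the two rolesets for ring could be looked up in code, the dictionary below mirrors the slide. The layout and the `describe` helper are assumptions made for illustration; they are not the actual frame file format.

```python
# Hypothetical in-memory version of the frame file for "ring".
RING_FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "ARG0": "Causer of ringing",
            "ARG1": "Thing rung",
            "ARG2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "ARG1": "Surrounding entity",
            "ARG2": "Surrounded entity",
        },
    },
}


def describe(roleset_id, labeled_args):
    """Pair each labelled argument with the role description from its roleset."""
    roles = RING_FRAMES[roleset_id]["roles"]
    return {text: roles[label] for text, label in labeled_args}


bell = describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
lake = describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
print(bell)
print(lake)
```

The same label (ARG1) means different things under the two rolesets, which is why the roleset ID has to be chosen before the arguments are labelled.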

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments
ARGA      Causer
ARG0      Agent, experiencer
ARG1      Theme, patient
ARG2      Recipient
ARG3      Instrument

Numbered arguments with function tags
ARGA-MNS  Indirect causer
ARG0-MNS  Induced causer
ARG0-GOL  Causee with a 'recipient' role
ARG2-ATR  Attribute
ARG2-GOL  Goal
ARG2-SOU  Source
ARG2-LOC  Location
ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
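A label in this tagset is an argument name optionally followed by a hyphen and a three-letter function tag (for example ARG2-GOL or ARGM-TMP). A small validator for that shape might look as follows; the regular expression and the `parse_label` name are assumptions made for this sketch, and only the tags listed on the slides are accepted.

```python
import re

# Function tags listed on the tagset slides.
FUNCTION_TAGS = {
    "MNS", "GOL", "ATR", "SOU", "LOC", "DIR",  # seen on numbered arguments
    "TMP", "MNR", "PRP", "CAU", "DIS", "ADV",  # seen on ARGM
}

# ARG followed by A, 0-3 or M, then an optional -TAG suffix.
LABEL_RE = re.compile(r"^ARG([A0-3M])(?:-([A-Z]{3}))?$")


def parse_label(label):
    """Split a label such as 'ARG2-GOL' into (argument, function tag or None)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a PropBank label: {label!r}")
    arg, tag = match.groups()
    if tag is not None and tag not in FUNCTION_TAGS:
        raise ValueError(f"unknown function tag: {tag!r}")
    if arg == "M" and tag is None:
        raise ValueError("ARGM needs a function tag")
    return "ARG" + arg, tag


print(parse_label("ARG2-GOL"))
print(parse_label("ARGA"))
```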

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so -> sulaa) sleep -> make someone sleep
• Add -vaa
  - (sulaa -> sulvaa) make someone sleep -> cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: annotated trees for causative-1 and causative-2; causative classes]

Complex predicates

• These are cases such as bharosaa karnaa 'trust(n) do(v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा; node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी; node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा; node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा; node labels: VP-Pred, VP, NP1 (darvaazaa), CASE1, V (khulegaa)]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं; node labels: VP-Pred, VP, NP-P (us kamre mein), NP1 (cuuhe), CASE1, V (hain)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी; node labels: VP-Pred, VP, NP-P (Ram-ne), NP-P (Mohan-ko), NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा; node labels: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP-P2 (mujhko), SCR2, NP1 (caaMd), CASE1, V (dikhaa)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी; node labels include VP-Pred, VP, NP (Ram), NP1 (Sita), CASE1, V (jaantaa), V-Aux (hai), V (aayegi), CP1 with C (ki), EXTR1]

Relative Clause

[Tree diagram; node labels include VP-Pred, VP, NP-P (tumne), NP-P (maine), NP1 (jo kitaab), NP (vah), PRO, SCR1, SCR3, CP with C (COMP), V (dii), V-Aux (thii), V (paRhii)]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Tree diagram; node labels include VP-Pred, VP, NP (Raam), NP-P (Ravi-ko), N (yaad), CASE1, V (kar), V-Aux (rahaa), V-Aux (thaa)]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), suffix

ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7      .        <fs as=&STOP,punc>
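Reading an af value back into named fields is a simple split. The sketch below assumes the field order given above (root, category, gender, number, person, case, tam/vibhakti, suffix) and a comma separator; the names `AF_FIELDS` and `parse_af` are illustrative, and missing trailing fields are padded with empty strings.

```python
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam", "suffix")


def parse_af(af_value):
    """Map a comma-separated af string onto the eight named fields."""
    parts = af_value.split(",")
    if len(parts) > len(AF_FIELDS):
        raise ValueError(f"too many fields in af value: {af_value!r}")
    # Pad short values so every field name is always present.
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))


noun = parse_af("laDakaa,n,m,sg,3,o")
print(noun["root"], noun["cat"], noun["case"])
```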

POS Tagging

• ILMT POS tagsets adopted: total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,...'>
2     laDake  NN    <fs af='laDakA,noun,...'>
3     ne      PSP   <fs af='ne,psp,...'>
4     kelA    NN    <fs af='kelA,noun,...'>
5     khAyA   VM    <fs af='KA,verb,...'>
6     thA     VAUX  <fs af='kelA,verb,...'>
7     .       SYM   <fs as=&STOP,punc>

Chunking

• Chunking is introduced to save the effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,...'>
1.2    laDake  NN    <fs af='laDakA,noun,...'>
1.3    ne      PSP   <fs af='ne,psp,...'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,...'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,...'>
3.2    thA     VAUX  <fs af='kelA,verb,...'>
       ))
4      ((      BLK
4.1    .       SYM   <fs as=&STOP,punc>
       ))


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar? Indian languages have:

• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky, 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock (the)

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)

lock becomes the karta
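The shift of lock from karma to karta can be made concrete by writing the two labelled trees as edge lists. This is a minimal sketch: the (head, relation, dependent) tuple layout and the `karaka_of` helper are assumptions, and the unlabelled determiner edges are left out.

```python
# Each edge is (head, relation, dependent), following the labels on the slides.
WITH_AGENT = [
    ("opened", "k7t", "Yesterday"),
    ("opened", "k1", "Sabina"),
    ("opened", "k2", "lock"),
    ("opened", "k3", "key"),
    ("opened", "k7p", "home"),
]
WITHOUT_AGENT = [
    ("opened", "k7t", "yesterday"),
    ("opened", "k1", "lock"),
    ("opened", "k3", "key"),
]


def karaka_of(edges, word):
    """Return the relation label on the edge whose dependent is word, if any."""
    for _head, relation, dependent in edges:
        if dependent == word:
            return relation
    return None


print(karaka_of(WITH_AGENT, "lock"))
print(karaka_of(WITHOUT_AGENT, "lock"))
```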

Levels of Analysis

L1 - Semantic relations (karakas), e.g. raama: karta
L2 - Morphosyntactic (vibhakti), e.g. raama: prathamaa
L3 - Morphological representation (abstract vibhakti markers), e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

(t1) khaa -> bhaaii, phala
(t2) khaa -> bhaaii (-> meraa, baDZaa), phala (-> bahuta)
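The bracketed chunk notation is easy to read back into a data structure. The sketch below assumes the `word_TAG` and `(( ... ))_CHUNK` notation with spaces between tokens; the function names and the head rule (main verb VM if present, otherwise the last token) are assumptions for illustration and not the treebank's actual head computation.

```python
import re

# One chunk: "((" tokens "))_" chunk tag.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(bracketed):
    """Return a list of (chunk_tag, [(word, pos), ...]) pairs."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(bracketed):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks


def head_of(tokens):
    """Assumed heuristic: the main verb (VM) if present, else the last token."""
    for word, pos in tokens:
        if pos == "VM":
            return word
    return tokens[-1][0]


text = (
    "((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP "
    "((khaataa_VM hai_VAUX))_VG"
)
chunks = parse_chunks(text)
heads = [head_of(tokens) for _tag, tokens in chunks]
print(heads)
```

The resulting heads are the nodes between which inter-chunk relations are marked.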

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb

• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root -> activity, result]

karta-karma

• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

[Diagram: open -> boy (k1), lock (k2)]

Sub-actions: Opening of lock

Opening of lock
  - Inserting and turning a key (action 1)
  - Key pressing and turning the lever (action 2)
  - Latch moving and lock opening (action 3)

Sub-actions: Opening of lock

[Diagrams: open -> boy (k1), lock (k2), key (k3); open -> key (k1), lock (k2); open -> lock (k1)]

k1: karta (doer); k2: karma (affected); k3: karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta, or the other karaka roles, can shift depending on what the speaker wants to express (vivaksha)
• It depends on which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd.)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit' and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod_relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

[Figure: Dependency Relation Types]

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  - Possibility-I: Go by syntactic analysis
    • khilvaa 'cause to eat' is the verb root
    • maaz ne has karta vibhakti, so mark as k1
    • aayaa se has karana vibhakti, so mark as k3
    • bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    • prayojaka karta 'causer' (pk1): the causer in a causative construction
    • prayojya karta 'causee' (jk1): the causee in a causative construction
    • madhyastha karta 'mediator causer' (mk1): the mediator-causer in a causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
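The difference between the two possibilities for the maid example amounts to a relabelling of three relations. The sketch below encodes only that mapping; the names `CAUSATIVE_RELABEL` and `relabel_causative` are assumptions. It cannot decide whether a se-marked phrase is a mediator-causer or a true instrument, so a phrase like cammaca se 'with the spoon', which keeps k3, would have to be excluded by the annotator.

```python
# Mapping stated on the slide: k1 -> pk1, k3 -> mk1, k4 -> jk1.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Turn Possibility-I labels into Possibility-II labels for causer/causee phrases."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```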

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    • Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    • The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    • jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

  - Possibility-II
    • The verb khadZaa hai 'is standing' is the root of the relative clause
    • The modifier of vaha 'he' in the main clause is the entire relative clause
    • Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
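Possibility-II can be pictured as a small tree in which the coreference is stored as a feature on the node rather than as an edge. The `Node` class and its field names below are assumptions for this sketch, and the k1 label on jo ladZakaa is assumed since the slide only states that it attaches to the verb.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A chunk head in the dependency tree (field names are illustrative)."""
    form: str
    relation: str = ""
    coref: str = ""  # form of the coreferent node, if any
    children: list = field(default_factory=list)

    def attach(self, child):
        self.children.append(child)
        return child


vaha = Node("vaha")  # pronoun in the main clause
rel_verb = vaha.attach(Node("khadZaa hai", relation="nmod__relc"))
# The k1 label here is an assumption; coref links back to vaha.
rel_verb.attach(Node("jo ladZakaa", relation="k1", coref="vaha"))


def coref_links(node):
    """Collect (form, coreferent form) pairs from the subtree under node."""
    links = [(node.form, node.coref)] if node.coref else []
    for child in node.children:
        links.extend(coref_links(child))
    return links


print(coref_links(vaha))
```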

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd.)

• Possibility-I is not adopted because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a (main) verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written letters every day, he tears them up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha
    k2: patra

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example, vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  "The children have grown up; they don't listen to anyone"
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
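Inserting the null conjunction is mechanical once the clause heads are known. The helper below is a sketch under that assumption; `add_null_conjunction` and the (head, relation, dependent) edge layout are illustrative names, not part of the annotation tools.

```python
def add_null_conjunction(clause_heads, null_label="NULL_CCP"):
    """Join clause heads under an inserted null conjunction node.

    Returns (head, relation, dependent) edges, one ccof edge per clause head.
    """
    if len(clause_heads) < 2:
        raise ValueError("need at least two clauses to coordinate")
    return [(null_label, "ccof", head) for head in clause_heads]


edges = add_null_conjunction(["badZe_ho_gaye", "nahii_sunate"])
for edge in edges:
    print(edge)
```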

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figures: dependency trees for the coordinate and subordinate conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
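Because the noun and the verb sit in separate chunks, the conjunct verb has to be recovered from the pof edge. A minimal sketch, assuming edges are stored as (head, relation, dependent) tuples with the labels shown in the tree above; `conjunct_verbs` is an illustrative name:

```python
CHUNKS = [
    ("NP", ["maine"]),
    ("NP", ["usase"]),
    ("NP", ["ek", "prashna"]),  # ek modifies prashna inside the chunk
    ("VG", ["kiyaa"]),
]

# Inter-chunk edges between chunk heads, as labelled in the tree.
EDGES = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
]


def conjunct_verbs(edges):
    """Rebuild each noun-verb conjunct verb from its pof edge."""
    return [f"{dep} {head}" for head, relation, dep in edges if relation == "pof"]


print(conjunct_verbs(EDGES))
```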


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
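A frame file with several rolesets can be pictured as a small lookup structure. The sketch below is an assumption about how one might hold the two rolesets for "ring" in memory; it is not the PropBank file format, and the names `Roleset`, `RING_FRAME` and `explain` are hypothetical. The roleset identifiers are written here as ring.01 and ring.02.

```python
from dataclasses import dataclass


@dataclass
class Roleset:
    roleset_id: str  # e.g. "ring.01"
    gloss: str       # short description of the usage
    roles: dict      # numbered argument label -> role description


# The two rolesets shown on the slide for the polysemous predicate "ring".
RING_FRAME = [
    Roleset("ring.01", "Make sound of bell",
            {"ARG0": "Causer of ringing", "ARG1": "Thing rung", "ARG2": "Ring for"}),
    Roleset("ring.02", "To surround",
            {"ARG1": "Surrounding entity", "ARG2": "Surrounded entity"}),
]


def explain(frame, roleset_id, labelled_args):
    """Map each labelled argument to the role description of the chosen roleset."""
    roleset = next(r for r in frame if r.roleset_id == roleset_id)
    return {roleset.roles[label]: text for label, text in labelled_args.items()}


bell = explain(RING_FRAME, "ring.01", {"ARG0": "John", "ARG1": "the bell"})
lake = explain(RING_FRAME, "ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"})
print(bell)
print(lake)
```

The same label (ARG1) means "thing rung" under one roleset and "surrounding entity" under the other, which is why the roleset has to be chosen before the numbered arguments can be interpreted.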

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered arguments          Numbered arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

Modifier arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
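The tagset above has a regular shape: a base label (ARGA, ARG0 to ARG3, or ARGM) optionally followed by a hyphen and a function tag. A minimal Python sketch of splitting and describing such labels follows; the hyphen separator and the helper names `split_label` and `describe` are assumptions made for illustration, and the tables only contain what is listed on the slide.

```python
# Labels transcribed from the tagset slide.
NUMBERED = {
    "ARGA": "Causer",
    "ARG0": "Agent, experiencer",
    "ARG1": "Theme, patient",
    "ARG2": "Recipient",
    "ARG3": "Instrument",
}
WITH_FUNCTION_TAG = {
    "ARGA-MNS": "Indirect causer",
    "ARG0-MNS": "Induced causer",
    "ARG0-GOL": "Causee with a 'recipient' role",
    "ARG2-ATR": "Attribute",
    "ARG2-GOL": "Goal",
    "ARG2-SOU": "Source",
    "ARG2-LOC": "Location",
    "ARG2-DIR": "Direction",
}
MODIFIERS = {
    "ARGM-TMP": "Temporal", "ARGM-MNR": "Manner", "ARGM-LOC": "Location",
    "ARGM-PRP": "Purpose", "ARGM-CAU": "Cause", "ARGM-DIS": "Discourse",
    "ARGM-ADV": "Adverb", "ARGM-MNS": "Means",
}


def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); a bare label has tag None."""
    base, _, tag = label.partition("-")
    return base, (tag or None)


def describe(label):
    """Look the label up in the three tables; return None if it is not listed."""
    for table in (WITH_FUNCTION_TAG, MODIFIERS, NUMBERED):
        if label in table:
            return table[label]
    return None


print(split_label("ARG2-GOL"), describe("ARG2-GOL"))
print(split_label("ARG1"), describe("ARG1"))
```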

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  Atif will read the book

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  Atif read the book

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  Atif had to read the book

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  The door will open

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  The door opened

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  The door will have to open

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  Atif slept

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  Atif will have to sleep

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  Yesterday night the moon was seen behind the clouds

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  Yesterday night I saw the moon behind the clouds

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  Ram will give a book to Mohan

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  Ram gave a book to Mohan

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

Unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

[Table not recovered]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

compl-pred-2
  राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  Ram knows that Sita will arrive late

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा: nodes VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी: nodes VP-Pred, VP, NP (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा: nodes VP-Pred, VP, NP (Atif), V (soyegaa)]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा: nodes VP-Pred, VP, NP (darvaazaa, index 1), CASE trace (index 1), V (khulegaa)]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं: nodes VP-Pred, VP, NP (cuuhe, index 1), CASE trace (index 1), V (hain), adjoined NP (us kamre mein)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी: nodes VP-Pred (twice), VP, NP (Ram-ne), NP (Mohan-ko), NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा: nodes VP-Pred, VP, NP (caaMd, index 1), CASE trace (index 1), V (dikhaa), NP (baadaloM mein), NP (kal raat), NP (mujhko, index 2), SCR trace (index 2)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी: nodes VP-Pred, VP, NP (Ram), V (jaantaa), V-Aux (hai), EXTR trace (index 1), CP (index 1), C (ki), NP (Sita, index 1), CASE trace (index 1), NP (der se), V (aayegi)]

Relative Clause

[Tree diagram: nodes VP-Pred, VP, NP, CP, C, COMP, V, V-Aux, PRO, SCR traces; words jo kitaab, tumne, dii, thii, vah, maine, paRhii]

  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

Complex Predicate

[Tree diagram: nodes VP-Pred, VP, NP (Raam), NP (Ravi-ko), V’, N (yaad), CASE trace (index 1), V (kar), V-Aux (rahaa), V-Aux (thaa)]

  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  Ram was remembering Ravi

Causative

[Tree diagram not recovered]

Chunking

• Chunking is introduced to save the effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
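The chunked table on the Chunking slide follows a simple line discipline: a line whose second field is `((` opens a chunk, a line consisting of `))` closes it, and every other line is a token with its category. A minimal Python reader is sketched below. It assumes whitespace-separated columns and ignores the feature-structure column; the function name `parse_chunks` and the sample text (a cleaned-up copy of the slide's example without features) are illustrative, not part of any official toolkit.

```python
SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""


def parse_chunks(text):
    """Return a list of chunks, each {'tag': ..., 'tokens': [(word, pos), ...]}."""
    chunks, current = [], None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "))":                       # chunk close
            if current is not None:
                chunks.append(current)
                current = None
        elif len(parts) >= 3 and parts[1] == "((":  # chunk open: ADDR (( CAT
            current = {"tag": parts[2], "tokens": []}
        elif current is not None and len(parts) >= 3:
            current["tokens"].append((parts[1], parts[2]))  # ADDR TKN CAT
    return chunks


chunks = parse_chunks(SAMPLE)
for chunk in chunks:
    print(chunk["tag"], [word for word, _ in chunk["tokens"]])
```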


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

Here the lock becomes the karta

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) Chunk-level tree:
khaa
  bhaaii
  phala

(t2) Fully expanded tree:
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
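The bracketed notation in this example, `((word_TAG ...))_CHUNK`, is regular enough to read with a short Python sketch. The head rule used here (the VM token in a VG chunk, otherwise the last token) is an assumption that happens to fit this example; it is not stated in the slides, and the names `parse_bracketed` and `chunk_head` are hypothetical.

```python
import re

CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

TEXT = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP "
        "((khaataa_VM hai_VAUX))_VG")


def parse_bracketed(text):
    """Return [(chunk_tag, [(word, pos), ...]), ...] for ((w_T ...))_TAG input."""
    chunks = []
    for body, tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((tag, tokens))
    return chunks


def chunk_head(tag, tokens):
    """Assumed head rule: main verb (VM) of a verb group, else the last token."""
    if tag == "VG":
        for word, pos in tokens:
            if pos == "VM":
                return word
    return tokens[-1][0]


parsed = parse_bracketed(TEXT)
heads = [chunk_head(tag, tokens) for tag, tokens in parsed]
print(heads)
```

The three heads are the nodes of the chunk-level tree (t1); the remaining tokens are the chunk-internal material that is attached when the tree is expanded to (t2).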

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root branches into activity and result]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock
  (action 1) Inserting and turning a key
  (action 2) Key pressing and turning the lever
  (action 3) Latch moving and lock opening

Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
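The shift of karta described on these slides can be illustrated with a toy Python function for the verb "open". Everything here is a sketch under stated assumptions: the participant keys (`agent`, `instrument`, `theme`) and the function name `assign_karakas` are hypothetical, and the choice of karta is passed in explicitly because, according to the slides, it depends on the speaker's intention rather than on the participants alone.

```python
def assign_karakas(participants, karta):
    """Label participants of 'open' with k1/k2/k3, given which one is the karta.

    participants: dict with some of the keys 'agent', 'instrument', 'theme'
    karta: the key of the participant the speaker presents as the doer
    """
    labels = {participants[karta]: "k1"}
    # The lock is karma (k2) unless it has itself been raised to karta.
    if karta in ("agent", "instrument") and "theme" in participants:
        labels[participants["theme"]] = "k2"
    # The key is karana (k3) unless it has itself been raised to karta.
    if karta != "instrument" and "instrument" in participants:
        labels[participants["instrument"]] = "k3"
    return labels


# Sabina opened the lock with this key
full = assign_karakas({"agent": "Sabina", "theme": "lock", "instrument": "key"}, "agent")
# The key opened the lock
key_as_karta = assign_karakas({"theme": "lock", "instrument": "key"}, "instrument")
# The lock opened with this key
lock_as_karta = assign_karakas({"theme": "lock", "instrument": "key"}, "theme")
print(full, key_as_karta, lock_as_karta)
```

The last two calls take the same participants and differ only in the `karta` argument, which mirrors the point that the same event can be presented with different karaka assignments.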

Speaker’s Intention (vivakshaa)

• Every sentence reflects the speaker’s intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• The annotation scheme follows the Paninian framework
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd)

• Morph Analysis
  – meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa  <fs af= root=badZaa cat=adj gend=m>
  – bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta  <fs af= root=bahuta cat=adj gend=any>
  – phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai     <fs af= root=hai cat=v gend=any num=any pers=3>
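The feature structures in the morph analysis are flat `key=value` lists, so they can be read into a dictionary with one regular expression. The Python sketch below assumes exactly the layout shown on the slide (space-separated pairs inside `<fs ...>`, values without spaces); `parse_fs` is a hypothetical helper name, not a function from the treebank tools.

```python
import re

PAIR_RE = re.compile(r"(\w+)=(\S*)")


def parse_fs(fs):
    """Turn '<fs af= root=khaa cat=v ...>' into {'af': '', 'root': 'khaa', ...}."""
    inner = fs.strip()
    if inner.startswith("<fs"):
        inner = inner[3:]
    if inner.endswith(">"):
        inner = inner[:-1]
    return dict(PAIR_RE.findall(inner))


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
bhaaii = parse_fs("<fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>")
print(khaataa["root"], khaataa["TAM"])
print(bhaaii["cat"], bhaaii["case"])
```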

An Example (Contd)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

• Dependency Relation

[Figure: dependency tree for the example sentence]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’ and
  – khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks

Dependency Scheme (Contd)

khaataa
  k1: bhaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

[Figure: hierarchy of dependency relation types]

Some Hindi Constructions

(1) Causative Constructions

  maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’

• Issue
  – Possibility-I: Go by syntactic analysis
    ◦ khilvaa ‘cause to eat’ is the verb root
    ◦ maaz ne has karta vibhakti, so mark as k1
    ◦ aayaa se has karana vibhakti, so mark as k3
    ◦ bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations
    ◦ prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    ◦ prayojya karta ‘causee’ (jk1): the causee in a causative construction
    ◦ madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  – Possibility-I
    ◦ Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    ◦ The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    ◦ jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

[Figure: Relative Clause Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

[Figure: Relative Clause Possibility-II]

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II

[Figure: the agar clause attached as a dependent of the to clause]

Conditionals (Contd)

• Possibility-I is not adopted because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day, having written a letter, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)

(7) Ellipsis

• How to show dependencies when the head is missing
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’ is missing; it would be the parent node of the relative clause ‘tum jo bhi kahoge’
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don’t listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

[Figures: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies only the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
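The tree for the conjunct verb example can be stored as a list of labelled edges, which makes it easy to find the root and to recover noun-verb units from the pof relation. The following Python sketch is illustrative only; the edge-triple layout and the helper names `find_root` and `conjunct_verbs` are assumptions, not the treebank's storage format.

```python
# (dependent, relation, head) triples for: maine usase ek prashna kiyaa
EDGES = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]


def find_root(edges):
    """The root is the only head that never appears as a dependent."""
    dependents = {dep for dep, _, _ in edges}
    heads = {head for _, _, head in edges}
    roots = heads - dependents
    assert len(roots) == 1, "expected a single-rooted tree"
    return next(iter(roots))


def conjunct_verbs(edges):
    """Return 'noun verb' strings for every pof (Part OF) edge."""
    return [f"{dep} {head}" for dep, rel, head in edges if rel == "pof"]


root = find_root(EDGES)
units = conjunct_verbs(EDGES)
print(root, units)
```

Because ek attaches to prashna and not to kiyaa, the modifier stays with the noun while the pof edge still records that prashna kiyaa forms one predicate.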


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Framefiles

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John]ARG0 rings [the bell]ARG1 (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2 (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
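One way to picture a frame file is as a small lookup table from roleset IDs to role descriptions. The sketch below is an assumption about representation only (actual PropBank frame files are distributed in their own file format); the class `Roleset` and the function `describe` are hypothetical names, while the role descriptions are the ones given for ring.01 and ring.02.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roleset:
    """One sense of a predicate together with its numbered roles."""
    roleset_id: str
    gloss: str
    roles: dict


# The two rolesets of the polysemous predicate "ring".
RING_FRAMEFILE = {
    "ring.01": Roleset("ring.01", "Make sound of bell",
                       {"ARG0": "Causer of ringing",
                        "ARG1": "Thing rung",
                        "ARG2": "Ring for"}),
    "ring.02": Roleset("ring.02", "To surround",
                       {"ARG1": "Surrounding entity",
                        "ARG2": "Surrounded entity"}),
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to the role description of the chosen roleset."""
    roleset = RING_FRAMEFILE[roleset_id]
    return {text: roleset.roles[label] for text, label in labelled_args}


print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label (ARG1) means "Thing rung" in one roleset and "Surrounding entity" in the other, which is the point of defining roles per verb sense.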

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Framefiles for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped/null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered arguments:
  ARGA   Causer
  ARG0   Agent/experiencer
  ARG1   Theme/patient
  ARG2   Recipient
  ARG3   Instrument

Numbered arguments with function tags:
  ARGA-MNS   Indirect causer
  ARG0-MNS   Induced causer
  ARG0-GOL   Causee with a 'recipient' role
  ARG2-ATR   Attribute
  ARG2-GOL   Goal
  ARG2-SOU   Source
  ARG2-LOC   Location
  ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

  ARGM-TMP   Temporal
  ARGM-MNR   Manner
  ARGM-LOC   Location
  ARGM-PRP   Purpose
  ARGM-CAU   Cause
  ARGM-DIS   Discourse
  ARGM-ADV   Adverb
  ARGM-MNS   Means
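Labels in this tagset have a regular shape: a core (ARGA, ARG0 to ARG3, or ARGM) optionally followed by a hyphen and a three-letter function tag. A minimal sketch of splitting such labels is given below; it assumes the hyphenated spelling (e.g. ARG2-LOC), and `parse_label` is a hypothetical helper, not part of any released PropBank software.

```python
import re

# Glosses of the modifier tags, copied from the modifier-argument list.
MODIFIER_TAGS = {
    "TMP": "Temporal", "MNR": "Manner", "LOC": "Location", "PRP": "Purpose",
    "CAU": "Cause", "DIS": "Discourse", "ADV": "Adverb", "MNS": "Means",
}

# Core is ARGA, ARG0..ARG9 or ARGM; the function tag is optional.
LABEL_RE = re.compile(r"^ARG([A0-9M])(?:-([A-Z]{3}))?$")


def parse_label(label):
    """Split a label such as 'ARG2-LOC' into its core and function tag."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a PropBank-style label: {label!r}")
    core, tag = match.groups()
    kind = "modifier" if core == "M" else "numbered"
    gloss = MODIFIER_TAGS.get(tag) if kind == "modifier" else None
    return {"core": "ARG" + core, "kind": kind,
            "function_tag": tag, "gloss": gloss}


print(parse_label("ARG2-LOC"))
print(parse_label("ARGM-TMP"))
```

Note that the same tag can appear on both kinds of label (LOC on ARG2 and on ARGM), so the core has to be kept alongside the tag.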


Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
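The labelling rule for intransitives can be written down directly. The sketch below assumes that the verb class has already been decided by the diagnostic tests; the table `INTRANSITIVE_CLASS` only lists the three verbs named above, and `sole_argument_label` is a hypothetical helper.

```python
# Verb classes for the three example verbs named on the slide
# (assumption: in practice the class comes from the diagnostic tests).
INTRANSITIVE_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = INTRANSITIVE_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in ("Kula", "Puta", "haMsa"):
    print(verb, sole_argument_label(verb))
```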

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night, the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night, I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: PropBank annotation of causative-1 and causative-2]

Causatives: classes

[Figure: causative classes]

Complex predicates

• These are cases such as bharosaa karnaa 'trust(n) do(v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    - Language-universal principles
    - Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, CASE1, V]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, CASE1, V, NP-P]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram for kal raat baadaloM mein mujhko caaMd dikhaa; node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, CASE1, SCR2, V]

• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram for Ram jaantaa hai ki Sita der se aayegi; node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, EXTR1, CASE1, V, V-Aux]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Tree diagram for the relative clause example; node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, PRO, SCR1, NP-SCR3, V, V-Aux]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Tree diagram for the complex predicate example; node labels: VP-Pred, VP, VP1, NP, NP-P, V', N, CASE1, V, V-Aux]

Causative

[Tree diagram for the causative construction]


Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT, Hyderabad

<dipti@iiit.ac.in>

COLING-2012

Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
  – Some Hindi constructions
    - Causatives
    - Co-ordination
    - Unaccusatives
    - Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]

k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]

lock becomes the karta

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example Contd

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Trees: (t1) khaa → bhaaii, phala; (t2) fully expanded: khaa → bhaaii, phala; bhaaii → meraa, baDZaa; phala → bahuta]

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal Root, branching into activity and result]

karta-karma

• The boy opened the lock
  – k1: karta
  – k2: karma
• karta/karma sometimes correspond to agent/theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Dependency tree: open → boy (k1), lock (k2)]

Sub-actions: Opening of lock

Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)

Sub-actions: Opening of lock

[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on
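The shift of the karta role across the three 'open' sentences can be sketched as a small function. This is a deliberate simplification that covers only these examples, not a general rule of the annotation scheme: it assumes that the most agent-like participant the speaker chooses to express becomes karta. The function name `assign_karakas` and its parameters are hypothetical.

```python
def assign_karakas(agent=None, instrument=None, patient=None):
    """Assign k1/k2/k3 to the participants the speaker chose to express.

    Simplifying assumption: agent, else instrument, else patient becomes karta.
    """
    relations = {}
    if agent is not None:
        relations["k1"] = agent
        if patient is not None:
            relations["k2"] = patient
        if instrument is not None:
            relations["k3"] = instrument
    elif instrument is not None:
        relations["k1"] = instrument      # karaNa raised to karta
        if patient is not None:
            relations["k2"] = patient
    elif patient is not None:
        relations["k1"] = patient         # karma raised to karta
    return relations


# Sabina opened the lock with the key
print(assign_karakas(agent="Sabina", instrument="key", patient="lock"))
# The key opened the lock
print(assign_karakas(instrument="key", patient="lock"))
# The lock opened
print(assign_karakas(patient="lock"))
```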


Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• The annotation scheme follows the Paninian framework
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example:
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis:
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>
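The feature structures above are flat lists of attribute=value pairs, so they are easy to read into a dictionary. The sketch below assumes the angle-bracket form `<fs key=value ...>` with unquoted values, as the pairs appear in this transcript (the released treebank files may quote values differently); `parse_fs` is a hypothetical helper.

```python
def parse_fs(fs):
    """Parse '<fs key=value ...>' into a dict; empty values become ''."""
    body = fs.strip()
    if not (body.startswith("<fs") and body.endswith(">")):
        raise ValueError(f"not a feature structure: {fs!r}")
    features = {}
    # Drop the leading '<fs' and the trailing '>' and split on whitespace.
    for item in body[3:-1].split():
        key, _, value = item.partition("=")
        features[key] = value
    return features


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(khaataa["root"], khaataa["cat"], khaataa["TAM"])
```

With the analysis in this form, it is straightforward to check, for example, that the surface form khaataa is related to the root khaa.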


An Example (Contd)

• POS Tagging:
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking:
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
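The bracketed notation used for chunks can be read back into a list of (chunk tag, tokens) pairs with a short regular expression. This is a sketch for the notation as printed here only; the treebank itself is distributed in a different file format, and `parse_chunks` is a hypothetical helper.

```python
import re

# One chunk looks like ((word_TAG word_TAG))_CHUNKTAG
CHUNK_RE = re.compile(r"\(\(([^()]*)\)\)_(\w+)")


def parse_chunks(text):
    """Return [(chunk_tag, [(word, pos_tag), ...]), ...] for bracketed chunks."""
    chunks = []
    for words, chunk_tag in CHUNK_RE.findall(text):
        tokens = [tuple(token.rsplit("_", 1)) for token in words.split()]
        chunks.append((chunk_tag, tokens))
    return chunks


sentence = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
for chunk_tag, tokens in parse_chunks(sentence):
    print(chunk_tag, tokens)
```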

An Example (Contd)

  Dependency Relation


Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd)

[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
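The inter-chunk tree for this sentence is small enough to write down as a list of labeled edges between chunk heads. The triple layout (dependent, relation, head) and the helpers `root` and `children` are illustrative assumptions, not the treebank's storage format.

```python
# (dependent, relation, head) triples between chunk heads, from the tree above.
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def root(edges):
    """Return the single node that is a head but never a dependent."""
    dependents = {dep for dep, _, _ in edges}
    heads = {head for _, _, head in edges}
    (top,) = heads - dependents
    return top


def children(head, edges):
    """Return the (relation, dependent) pairs attached to `head`, sorted by relation."""
    return sorted((rel, dep) for dep, rel, h in edges if h == head)


print(root(EDGES))
print(children("khaataa", EDGES))
print(children("bhaaii", EDGES))
```

Chunk-internal words such as badZaa and bahuta do not appear as nodes, since only chunk heads take part in these relations.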

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  – karta: subject/agent/doer
  – karma: object/patient
  – karana: instrument
  – sampradaan: beneficiary
  – apaadaan: source
  – adhikarana: location in place/time/other

Relations other than karakas

• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types

[Figure: dependency relation types]

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue:
  – Possibility-I: Go by syntactic analysis
    - khilvaa 'cause to eat' is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4
  – Possibility-II:
    - The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
    - The Paninian framework provides the relations:
      prayojaka karta 'causer' (pk1): the causer in a causative construction
      prayojya karta 'causee' (jk1): the causee in a causative construction
      madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II:
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
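The difference between the two possibilities amounts to a relabeling of three relations. The sketch below applies that mapping to the maid example; it assumes the se-marked phrase is a mediator-causer (as with aayaa se), so it would not be appropriate for a true instrument such as cammaca se, which keeps k3. `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical names.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(relations):
    """Relabel (chunk, label) pairs of a mediated causative; other labels are kept.

    Assumption: the k3 phrase is a mediator-causer, not an instrument.
    """
    return [(chunk, CAUSATIVE_RELABEL.get(label, label))
            for chunk, label in relations]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(possibility_1))
```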

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue:
  – Possibility-I:
    - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

  – Possibility-II:
    - The verb khadZaa hai 'is standing' is the root of the relative clause
    - The modifier of vaha 'he' in the main clause is the entire relative clause
    - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd …)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd …)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue:
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

[Dependency tree: abstract node agar-to → agar (paired-ccof), to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]

Possibility - II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd)

• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakar (vmod); likhakar → vaha (k1), patra (k2), both shared with the main verb]

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd)

[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]

Non-dependency Relations

• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd …)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figures: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two

Conjunct Verbs (Contd)

[Dependency tree: kiyaa → maine (k1), usase (k2), prashna (pof); prashna → ek (nmod)]
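Because the noun and the verb of a conjunct verb are chunked separately, the complex predicate has to be recovered from the pof edge. A minimal sketch over the edges of the tree for maine usase ek prashna kiyaa follows; the edge layout (dependent, relation, head) and the function `conjunct_verbs` are illustrative assumptions.

```python
# (dependent, relation, head) triples for "maine usase ek prashna kiyaa".
EDGES = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]


def conjunct_verbs(edges):
    """Return each noun-verb sequence linked by a pof edge as one string."""
    return [f"{dep} {head}" for dep, rel, head in edges if rel == "pof"]


def modifiers(noun, edges):
    """Return the words that modify `noun` directly."""
    return [dep for dep, rel, head in edges if head == noun]


print(conjunct_verbs(EDGES))
print(modifiers("prashna", EDGES))
```

The second call shows that ek still attaches to prashna alone, which is the relation that would be lost if prashna kiyaa were a single verb chunk.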

101212

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Semantic information

•  Semantic information about verbs and participants is expressed through semantic roles
•  Agent, Experiencer, Theme, Result, etc.
•  However, it is difficult to have a standard set of thematic roles

Proposition Bank

•  Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
•  A PropBank is a large annotated corpus of predicate-argument information
•  A set of semantic roles is defined for each verb
•  A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

•  PropBank defines semantic roles on a verb-by-verb basis
•  This is defined in a verb lexicon consisting of frame files
•  Each predicate will have a set of roles associated with a distinct usage
•  A polysemous predicate can have several rolesets within its frame file

An example

•  John rings the bell

   ring.01  Make sound of bell
      Arg0  Causer of ringing
      Arg1  Thing rung
      Arg2  Ring for

An example

•  John rings the bell
•  Tall aspen trees ring the lake

   ring.01  Make sound of bell
      Arg0  Causer of ringing
      Arg1  Thing rung
      Arg2  Ring for

   ring.02  To surround
      Arg1  Surrounding entity
      Arg2  Surrounded entity

An example

•  [John]_ARG0 rings [the bell]_ARG1                  (ring.01)
•  [Tall aspen trees]_ARG1 ring [the lake]_ARG2       (ring.02)
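As a rough illustration of how a frame file ties role labels to a particular sense, the two rolesets for "ring" can be held in a small lookup table. The dictionary layout and the `describe` helper are assumptions made for this sketch; actual PropBank frame files are distributed in their own file format, which is not reproduced here.

```python
# Hypothetical in-memory view of the frame file for "ring" (two rolesets).
FRAME_RING = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Spell out what each labelled argument means under the given roleset."""
    roles = FRAME_RING[roleset_id]["roles"]
    lines = []
    for label, text in labelled_args:
        key = label.capitalize()  # "ARG0" -> "Arg0"
        if key not in roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        lines.append(f"{text}: {roles[key]}")
    return lines


print(describe("ring.01", [("ARG0", "John"), ("ARG1", "the bell")]))
print(describe("ring.02", [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")]))
```

The same label (ARG1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is why the roleset ID has to be chosen before the arguments are labelled.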

Hindi PropBank

•  Annotating Hindi PropBank involves three steps:
   - Creation of frame files
   - Empty argument insertion
   - Semantic role labelling

Frame files for Hindi

•  Two types of frame files:
   - Frames for simple verbs [385 frames, 703 predicates]
   - Frames for nominals in complex predicates [1875 1902 predicates]

Empty Arguments

•  PropBank inserts 4 types of empty arguments:
   - pro: dropped null arguments recoverable from discourse context
   - PRO: empty subjects of non-finite complement and adjunct clauses
   - RELPRO: gaps in participial relative clauses
   - GAP: elided arguments in co-ordinated clauses
•  PRO and RELPRO are inserted automatically
•  GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA   Causer                 ARGA-MNS   Indirect causer
ARG0   Agent, experiencer     ARG0-MNS   Induced causer
ARG1   Theme, patient         ARG0-GOL   Causee with a 'recipient' role
ARG2   Recipient              ARG2-ATR   Attribute
ARG3   Instrument             ARG2-GOL   Goal
                              ARG2-SOU   Source
                              ARG2-LOC   Location
                              ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means
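The tagset above has a regular shape: a base label, optionally followed by a hyphen and a function tag. A small lookup, written here as a sketch, makes that structure explicit. The `gloss` helper and the three tables are assumptions for illustration; only the labels and their descriptions come from the slides.

```python
NUMBERED = {
    "ARGA": "Causer",
    "ARG0": "Agent, experiencer",
    "ARG1": "Theme, patient",
    "ARG2": "Recipient",
    "ARG3": "Instrument",
}

FUNCTION_TAGGED = {
    ("ARGA", "MNS"): "Indirect causer",
    ("ARG0", "MNS"): "Induced causer",
    ("ARG0", "GOL"): "Causee with a 'recipient' role",
    ("ARG2", "ATR"): "Attribute",
    ("ARG2", "GOL"): "Goal",
    ("ARG2", "SOU"): "Source",
    ("ARG2", "LOC"): "Location",
    ("ARG2", "DIR"): "Direction",
}

MODIFIERS = {
    "TMP": "Temporal", "MNR": "Manner", "LOC": "Location", "PRP": "Purpose",
    "CAU": "Cause", "DIS": "Discourse", "ADV": "Adverb", "MNS": "Means",
}


def gloss(label):
    """Return the description of a label such as 'ARG1', 'ARG2-GOL' or 'ARGM-TMP'."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        return MODIFIERS[tag]
    if tag:
        return FUNCTION_TAGGED[(base, tag)]
    return NUMBERED[base]


print(gloss("ARG2-GOL"), "|", gloss("ARGM-TMP"), "|", gloss("ARG1"))
```

Note that the same function tag can mean different things on different bases: MNS is "Indirect causer" on ARGA, "Induced causer" on ARG0, and "Means" on ARGM.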

Linguistic phenomena

•  Simple transitive
•  Unaccusative and Unergative
•  Existential
•  Dative subject
•  Ditransitive
•  Causatives
•  Complex Predicates

Simple Transitive

trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitaab paRhegaa
          Atif book.f read.m.sg.fut
          'Atif will read the book'

trans-2   आतिफ़ ने किताब पढ़ी
          Atif ne kitaab paRhii
          Atif erg book.f read.f.sg.pst
          'Atif read the book'

trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif ko kitaab paRhnii paRii
          Atif dat book.f read.f.inf compel.f.pst
          'Atif had to read the book'

Unaccusative & Unergative

•  Distinction between intransitive verbs:
   - Unaccusatives, e.g. Kula (open), Puta (explode)
   - Unergatives, e.g. haMsa (laugh)
•  Single argument of unaccusatives takes Arg1; unergatives take Arg0
•  Diagnostic tests are used to distinguish unaccusatives from unergatives
   - E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1   दरवाज़ा खुलेगा
          darvaazaa khulegaa
          door.m.sg.d open.m.sg.fut
          'The door will open'

unacc-2   दरवाज़े ने खुला
          darvaaze ne khulaa
          door.m.sg.obl erg open.pst
          'The door opened'

unacc-3   दरवाज़े को खुलना पड़ेगा
          darvaaze ko khulnaa paRegaa
          door.m.sg.obl dat open.inf compel.fut
          'The door will have to open'
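The labelling rule for intransitives ("unaccusatives take Arg1, unergatives take Arg0") amounts to a lookup once each verb has been classified. The sketch below assumes the classification has already been decided by the diagnostic tests; the two verb lists contain only the verbs named on the slide, and the function name is hypothetical.

```python
# Verb classes as given on the slide (roots in the slide's own transliteration).
UNACCUSATIVE = {"Kula", "Puta"}   # open, explode
UNERGATIVE = {"haMsa"}            # laugh


def label_single_argument(verb_root):
    """Return the PropBank label for the sole argument of an intransitive verb."""
    if verb_root in UNACCUSATIVE:
        return "ARG1"
    if verb_root in UNERGATIVE:
        return "ARG0"
    raise KeyError(f"{verb_root!r} has not been classified yet")


print(label_single_argument("Kula"), label_single_argument("haMsa"))
```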

Intransitive: Unergative

unerg-1   आतिफ़ सोएगा
          Atif soyegaa
          Atif sleep.m.sg.fut
          'Atif will sleep'

unerg-2   आतिफ़ ने सोया
          Atif ne soyaa
          Atif erg sleep.m.sg.pst
          'Atif slept'

unerg-3   आतिफ़ को सोना पड़ेगा
          Atif ko sonaa paRegaa
          Atif dat sleep.inf compel.fut
          'Atif will have to sleep'

Existential

exist-1   उस कमरे में चूहे हैं
          us kamre meM cuuhe haiM
          that room in rats be.pres.pl
          'There are rats in that room'

•  We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4      कल रात बादलों में चाँद दिखा
             kal raat baadaloM meiM caaMd dikhaa
             yesterday night clouds in moon see(unacc).pst
             'Yesterday night, the moon was seen behind the clouds'

dat-subj-1   कल रात बादलों में मुझको चाँद दिखा
             kal raat baadaloM meiM mujhko caaMd dikhaa
             yesterday night clouds in me.dat moon see(unacc).pst
             'Yesterday night, I saw the moon behind the clouds'

[Figure: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1   राम मोहन को किताब देगा
            raam mohan ko kitaab degaa
            Ram Mohan dat book.f give.m.sg.fut
            'Ram will give a book to Mohan'

ditrans-2   राम ने मोहन को किताब दी
            raam ne mohan ko kitaab dii
            Ram erg Mohan dat book.f give.f.sg.pst
            'Ram gave a book to Mohan'

Causatives

•  Hindi has two ways of forming the causative:
   - Add -aa (so -> sulaa): sleep -> make someone sleep
   - Add -vaa (sulaa -> sulvaa): make someone sleep -> cause someone to fall asleep
•  We introduce the label ARGA to analyze causers
•  Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
•  ARGA-MNS for intermediate causers

Causatives

unerg-1       आतिफ़ सोएगा
              Atif soyegaa
              Atif sleep.m.sg.fut
              'Atif will sleep'

causative-1   आया ने आतिफ़ को सुलाया
              aayaa ne Atif ko sulaayaa
              maid erg Atif acc sleep.caus.pst
              'The maid caused the child to sleep'

causative-2   माँ ने आया से आतिफ़ को सुलवाया
              maaN ne aayaa se Atif ko sulvaayaa
              mother erg maid by Atif acc sleep.caus.pst
              'The mother made the maid cause the child to sleep'

[Figure: PropBank annotation of causative-1 and causative-2]

Causatives: classes

Complex predicates

•  These are cases such as bharosaa karnaa 'trust (n) do (v)' = trust
•  Such cases are handled using a noun frame for bharosaa:
   [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1   राम रवि की प्रतीक्षा कर रहा था
               raam ravi kii pratiikshaa kar rahaa thaa
               Ram Ravi gen wait do prog.m.sg be.m.sg.pst
               'Ram was waiting for Ravi'

compl-pred-2   राम रवि से लड़ बैठा
               raam ravi se ladZa baithaa
               Ram Ravi inst fight sit.perf
               'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1     राम जानता है कि सीता देर से आएगी
               raam jaantaa hai ki siitaa der se aayegii
               Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
               'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

•  Devised by Rajesh Bhatt, University of Massachusetts, Amherst
   - Assisted by Annahita Farudi and Owen Rambow
•  Developed in conjunction with DS and PB
•  Inspired by the Chomskyan tradition

Background for PS

•  Chomskyan program
   - Motivated by claims about language acquisition in children
   - Develop a theory of syntax such that the syntax of a language can be explained by
      * Language-universal principles
      * Language-specific parameters
•  PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

•  PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
•  These two levels are related by derivations
   - Words and constituents move and leave traces
•  Transformational grammar
•  Monostratal representation
•  Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

•  Phrase structure
•  Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
   - Nouns with postpositions
   - Verbs with auxiliaries and complementizers (ki)
•  Binary branching
   - Theoretical reasons
   - To be different from DS

Basic Transitive Clause (1)

•  There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); nodes: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

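Basic Transitive Clause (2)

•  The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii); nodes: VP-Pred, VP, NP-P (Atif -ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

•  PS makes a distinction between unergative and unaccusative
•  In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); nodes: VP-Pred, VP, NP (Atif), V (soyegaa)]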

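Intransitive Clause: Unaccusative

•  The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); nodes: VP-Pred, VP, NP1 (darvaazaa), co-indexed CASE1, V (khulegaa)]

Existentials

•  Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); nodes: VP-Pred, VP, NP-P (us kamre mein), NP1 (cuuhe), co-indexed CASE1, V (hain)]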

Ditransitive

•  The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii); nodes: VP-Pred, NP-P (Ram -ne), NP-P (Mohan -ko), VP, NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); nodes: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko), co-indexed SCR2, NP1 (caaMd), co-indexed CASE1, V (dikhaa)]

•  dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
•  Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
•  The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); nodes: VP-Pred, VP, NP (Ram), V (jaantaa), V-Aux (hai), co-indexed EXTR1, CP1, C (ki), NP1 (Sita), co-indexed CASE1, NP-P (der se), V (aayegi)]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Figure: PS tree for the sentence above; nodes: CP, C (COMP), VP-Pred, VP, NP-P (tumne), NP1 (jo kitaab), co-indexed SCR1, PRO, V (dii), V-Aux (thii), NP-P (maine), NP (vah), co-indexed SCR3, V (paRhii)]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Figure: PS tree for the sentence above; nodes: VP-Pred, VP, NP (Raam), NP-P (Ravi -ko), V', NP, N (yaad), co-indexed CASE1, V (kar), V-Aux (rahaa), V-Aux (thaa)]

Causative

Outline

•  Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
•  Some basic concepts
•  Some Hindi constructions
   - Causatives
   - Co-ordination
   - Unaccusatives
   - Relative clauses
•  Conclusions

Introduction

•  Treebank: one of the most important linguistic resources
•  Utility in various NLP tasks such as parsing, natural language understanding, etc.
•  Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank

The Corpus

•  News articles: 350k
•  Tourism articles: 25-30k
•  Conversational data: 25-20k
•  Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

•  Indian languages
   - Rich morphology
   - Relatively flexible word order
•  For example:
   a) baccaa phala khaataa hai
      'child' 'fruit' 'eat_hab' 'pres'
   b) phala baccaa khaataa hai
   c) phala khaataa hai baccaa
   d) baccaa khaataa hai phala

Panini's Grammar

•  Dated around 500 BC
•  Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
•  Based on spoken form <Kiparsky 1993>
•  Focuses on language as a means of communication

Panini's Grammar (contd)

•  Treats a sentence as a series of modifier-modified relations
•  Every sentence has a primary modified (generally a verb)
•  Relations between verbs and their participants are called 'karaka'
•  Other relations: reason, purpose, genitive, etc.
•  The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
   k1  Sabina
   k2  lock (the)

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
   k1  Sabina
   k2  lock (the)
   k3  key (this)

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
   k7t  Yesterday
   k1   Sabina
   k2   lock (the)
   k3   key (this)
   k7p  home (my)

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
   k7t  yesterday
   k1   lock (the)
   k3   key (this)

lock becomes the karta
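The lock examples can be written down as labelled dependency trees, which makes the shift of the karta easy to see. This is a minimal sketch: the `Node` class and the helper functions are made up for the illustration, and each noun phrase is treated as a single node ("the lock", "this key").

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)  # (label, Node) pairs

    def add(self, label, child):
        """Attach `child` under this node with a karaka label; returns self for chaining."""
        self.children.append((label, child))
        return self


def relations(root):
    """Flatten a tree into (head, label, dependent) triples, depth first."""
    triples = []
    for label, child in root.children:
        triples.append((root.word, label, child.word))
        triples.extend(relations(child))
    return triples


def karta(root):
    """Return the words attached to the root with the k1 (karta) label."""
    return [child.word for label, child in root.children if label == "k1"]


# "Yesterday Sabina opened the lock with this key at my home"
with_agent = (Node("opened")
              .add("k7t", Node("Yesterday"))
              .add("k1", Node("Sabina"))
              .add("k2", Node("the lock"))
              .add("k3", Node("this key"))
              .add("k7p", Node("my home")))

# "Yesterday the lock opened with this key"
without_agent = (Node("opened")
                 .add("k7t", Node("yesterday"))
                 .add("k1", Node("the lock"))
                 .add("k3", Node("this key")))

print(karta(with_agent), karta(without_agent))
print(relations(without_agent))
```

In the first tree the lock is the karma (k2); once the agent is left unexpressed, the same participant is attached as k1.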

Levels of Analysis

L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

•  Morph analysis
•  POS tagging
•  Identify minimal constituents (chunks/bags) and their heads
•  Mark the relations across chunks (head to head relation)
•  Chunk-internal dependencies are left unspecified
•  The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai

=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1)  khaa                 (t2)  khaa
         bhaaii                     bhaaii
         phala                         meraa
                                       baDZaa
                                    phala
                                       bahuta
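The step from chunks to the two trees (t1, t2) can be sketched as follows. The head rule used here (the NN token heads an NP chunk, the VM token heads a VG chunk) and the decision to attach every nominal chunk directly to the verb are simplifying assumptions for this one sentence, not the treebank's actual procedure. The sketch also keeps the inflected form khaataa where the slide shows the root khaa.

```python
# Chunked, POS-tagged sentence from the slide: (chunk label, [(word, tag), ...])
CHUNKS = [
    ("NP", [("meraa", "PRP"), ("baDzaa", "JJ"), ("bhaaii", "NN")]),
    ("NP", [("bahuta", "QF"), ("phala", "NN")]),
    ("VG", [("khaataa", "VM"), ("hai", "VAUX")]),
]

# Assumed head rule: which POS tag marks the head of each chunk type.
HEAD_TAG = {"NP": "NN", "VG": "VM"}


def head(chunk):
    """Return the head word of a chunk under the assumed head rule."""
    label, tokens = chunk
    for word, tag in tokens:
        if tag == HEAD_TAG[label]:
            return word
    raise ValueError(f"no head found in {label} chunk")


def chunk_tree(chunks):
    """t1: attach the head of every non-verbal chunk to the verb head."""
    verb = next(head(c) for c in chunks if c[0] == "VG")
    return {verb: [head(c) for c in chunks if c[0] != "VG"]}


def expand(chunks):
    """t2: hang the remaining words of each chunk under that chunk's head."""
    return {head(c): [w for w, _ in c[1] if w != head(c)] for c in chunks}


print(chunk_tree(CHUNKS))
print(expand(CHUNKS))
```

Keeping t1 and t2 separate mirrors the model on the slide: annotators mark only the head-to-head relations, and the chunk-internal attachments are filled in mechanically afterwards.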

Karaka Relations

•  Direct participants in an action/event
•  Syntactico-semantic
•  The karta and karma of a verb are determined by the verb's semantics
•  Verb denotes an action/event
•  Any action is a bundle of sub-actions:
   - Sabina opened the lock with the key
   - The key opened the lock
   - The lock opened

Semantics of the verb

•  A verbal root denotes:
   - The activity
   - The result
•  Locus of activity: karta
•  Locus of result: karma

[Figure: Verbal Root, branching into activity and result]

karta-karma

•  The boy opened the lock
   - k1: karta
   - k2: karma

   open
      k1  boy
      k2  lock

•  karta and karma sometimes correspond to agent and theme
   - Not always: "The door opened"
   - The door is the karta
   - The sentence has no explicit karma

Sub-actions: Opening of lock

Opening of lock
   (action 1)  Inserting and turning a key
   (action 2)  Key pressing and turning the lever
   (action 3)  Latch moving and lock opening

Sub-actions: Opening of lock

open                  open                  open
   k1  boy               k1  key               k1  lock
   k2  lock              k2  lock
   k3  key

k1 - karta (doer)
k2 - karma (affected)
k3 - karana (instrument)

Thus

•  The action of opening normally requires an agentive participant. So: Sabina opened the lock
•  However, the speaker may decide not to express the role of the agent. Hence:
   - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer, karaNa-kartri)
   - The lock opened: the karma is raised to the role of karta (doer, karma-kartri)
•  Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

•  Every sentence reflects the speaker's intention
   - Participants are assigned various relations accordingly
     (a) I opened the lock with this key
     (b) I am sure this key will open the lock
   - 'key' gets assigned karta (in b) or karana (in a) based on what the speaker wants to express
•  Syntax reflects vivaksha

The Scheme

•  Morph analysis
•  POS tagging
•  Chunking
•  Mark the syntactic relations (dependency relations) across chunks (head to head relation)

Overview

•  Objective
•  The Scheme
   - Morph Analysis
   - POS Tagging
   - Chunking
   - Dependency Relations
•  Dependency Scheme
•  Relations in Dependency Scheme
•  Some Hindi Constructions

Objective

•  To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
•  We are developing treebanks for Hindi/Urdu
•  Following the Paninian framework as the annotation scheme
•  We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

•  Example
   - meraa badZaa bhaaii bahuta phala khaataa hai
     'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
     'My elder brother eats lots of fruits'

An Example (Contd)

•  Morph Analysis
   - meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
   - badZaa   <fs af= root=badZaa cat=adj gend=m >
   - bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
   - bahuta   <fs af= root=bahuta cat=adj gend=any >
   - phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
   - khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
   - hai      <fs af= root=hai cat=v gend=any num=any pers=3 >
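The morph analyses above are flat attribute=value lists, so they can be read into a dictionary with a few lines of Python. The sketch assumes the angle-bracketed `<fs ...>` form with unquoted values, as the slide text appears once the brackets lost in extraction are restored. The real annotation files may quote or order the attributes differently, and `parse_fs` is a hypothetical helper.

```python
import re


def parse_fs(fs):
    """Parse '<fs af= root=khaa cat=v ...>' into a dict; empty values become ''."""
    body = fs.strip()
    if not (body.startswith("<fs") and body.endswith(">")):
        raise ValueError("expected a string of the form '<fs ...>'")
    body = body[3:-1]  # drop the leading '<fs' and the trailing '>'
    # Each attribute is name=value; the value runs up to the next whitespace.
    return dict(re.findall(r"(\w+)=(\S*)", body))


analysis = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(analysis["root"], analysis["cat"], analysis["TAM"])
```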

An Example (Contd)

•  POS Tagging
   - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
•  Chunking
   - ((meraa_PRP))_NP
     ((baDzaa_JJ bhaaii_NN))_NP
     ((bahuta_QF phala_NN))_NP
     ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

•  Dependency Relation

Dependency Scheme

•  The Paninian approach treats a sentence as a series of modifier-modified relations
•  Hence it provides a framework for dependency analysis
•  In our dependency tree
   - each node is a chunk, and
   - the edge represents the relation between the connected nodes, labeled with the karaka or other relations
•  A chunk represents a set of adjacent words which are in dependency relations with each other
•  All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

•  Here modifier-modified relations are marked between the heads of the chunks
   - meraa 'my'
   - bhaaii 'brother'
   - phala 'fruit' and
   - khaataa 'eats'
•  badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd)

khaataa
   k1  bhaaii
          r6  meraa
   k2  phala

Relations in Dependency Scheme

•  There are 3 types of relations in the Dependency Scheme:
   * Karaka relations
   * Relations other than karakas, and
   * Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
•  Karaka relations are participants directly involved in the action denoted by the verb
•  Relations other than karakas denote purpose, reason, etc.
•  Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

•  Only six
   - karta: subject/agent/doer
   - karma: object/patient
   - karana: instrument
   - sampradaan: beneficiary
   - apaadaan: source
   - adhikarana: location in place/time/other

Relations other than karakas

•  r6: Genitive
•  rt: Purpose
•  rh: Reason
•  nmod__relc: Relative clause
•  rad: Address

Relations which do not fall under dependency relation

•  ccof: Conjunction
•  pof: Complex Predicates
•  fragof: Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

•  maaz ne aayaa se bacce ko khaanaa khilvaayaa
   'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
   'Mother caused the maid to feed the child'
•  Issue
   - Possibility-I: Go by syntactic analysis
      * khilvaa 'cause to eat' is the verb root
      * maaz ne has karta vibhakti, so mark as k1
      * aayaa se has karana vibhakti, so mark as k3
      * bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd ...)

•  Possibility-II
   - The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
   - The Paninian framework provides the relations
      * prayojaka karta 'causer' (pk1): the causer in a causative construction
      * prayojya karta 'causee' (jk1): the causee in a causative construction
      * madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
   - Do we mark the above dependency roles?
   - If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd ...)

•  Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
       'Mother fed the child with the spoon'
•  Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
       'Mother made the maid feed the child'
•  As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
•  For causatives, our current decision: Follow Possibility-II
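The difference between the two possibilities for the maid example is a relabelling of three relations. The sketch below applies that relabelling to the Possibility-I labels. It does not decide whether a se-marked phrase is an instrument (k3, as with the spoon) or a mediator-causer (mk1, as with the maid); that choice is assumed to have been made by the annotator. The table and function names are hypothetical.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
POSSIBILITY_1_TO_2 = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def to_possibility_2(labelled):
    """Relabel (phrase, label) pairs; labels without a causative counterpart are kept."""
    return [(phrase, POSSIBILITY_1_TO_2.get(label, label)) for phrase, label in labelled]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
print(to_possibility_2(possibility_1))
```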

(2) Relative Clauses (nmod__relc)

•  Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
       'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
       'The boy who is standing there is my brother'
•  Issue
   - Possibility-I
      * Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
      * The dependency of jo ladZakaa 'the boy' is on vaha 'he'
      * jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

•  Possibility-II
   - The verb khadZaa hai 'is standing' is the root of the relative clause
   - The modifier of vaha 'he' in the main clause is the entire relative clause
   - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Possibility-II

Relative Clauses (Contd ...)

•  For relative clauses, our current decision: Follow Possibility-II
•  In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
•  The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
•  The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

•  Ex-1: mujhko dukh hai
         'I.Dat' 'unhappy' 'is'
         'I am unhappy'
•  Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
•  Here dukh 'unhappy' is the karta
•  Here mujhko 'to me' is a subtype of sampradan
•  This sampradan is different from the sampradan (k4, beneficiary)
•  We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd)

•  Ex-2 (base verb): raam ne (agent) caaMd dekhaa
         'ram' 'Erg' 'moon' 'saw'
         'Ram saw the moon'
•  Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
         'ram.Dat' 'moon' 'appeared'
         'The moon was visible to Ram'

anubhava karta - k4a (Contd ...)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

•  Ex-1: raam ne kahaa ki vo kal aayegaa
         'Ram said that he will come tomorrow'
•  Ex-2: raam ne yaha kahaa ki vo kal aayegaa
         'Ram said that he will come tomorrow'
•  In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
•  In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd ...)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

•  Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
       'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
       'Had she not been sick, she would have definitely come to the party'
•  Issue
   - Possibility-I: Abstract node
   - Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
   paired-ccof  agar
                   ccof  naa hotii
   paired-ccof  to
                   ccof  aatii

Possibility - II

Conditionals (Contd)

•  Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
•  For conditionals, our current decision: Follow Possibility-II
•  In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
•  Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

•  In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
•  The argument occurs only once in the sentence but is semantically related to both verbs
•  The shared argument syntactically always attaches to the main verb
•  For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

•  Ex: vaha rojZa patra likhakara PaadZataa hai
       'he' 'daily' 'letter' 'having written' 'tear' 'is'
       'Every day, having written a letter, he tears it up'
•  The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (contd)

PaadZataa hai
   k1    vaha
   k7t   rojZa
   k2    patra
   vmod  likhakar
            k1  vaha
            k2  patra

(7) Ellipsis

•  How do we show dependencies when the head is missing?
•  Ex: tum jo bhi kahoge (vo) mai maan luungii
       'you' 'whatever' 'will say' 'that' 'I' 'will believe'
       'I will believe whatever you say'
•  In the above example vo 'that' is missing, and it would be the parent node for the relative clause 'tum jo bhi kahoge'
•  We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

maan luungii
   k1  mai
   k2  NULL__NP (vo)
          nmod__relc  kahoge
                         k1  tum
                         k2  jo bhi

Ellipsis (Contd)

•  Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
       'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
       'The children have grown up; they don't listen to anyone'
•  No explicit conjunct
•  Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
   ccof  badZe_ho_gaye
   ccof  nahii_sunate
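The handling of a missing conjunction can be sketched as a small helper: if no conjunct word is present, a NULL node stands in as the head, and the clause heads attach to it with ccof. The function name and the triple format (head, label, dependent) are assumptions for the illustration. The node name NULL_CCP is taken from the slide as written.

```python
def coordinate(clause_heads, conjunction=None):
    """Attach clause heads to a conjunction node with ccof.

    When the sentence has no explicit conjunction, a NULL_CCP node is used as the head.
    """
    head = conjunction if conjunction is not None else "NULL_CCP"
    return [(head, "ccof", clause) for clause in clause_heads]


# No explicit 'aur' in the sentence, so a NULL node is inserted.
print(coordinate(["badZe_ho_gaye", "nahii_sunate"]))
# With an explicit conjunction the same structure is built around it.
print(coordinate(["khaayaa", "khaayaa"], conjunction="aur"))
```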

Non-dependency Relations

•  ccof: Conjunction
•  pof: Complex Predicates
•  fragof: Fragment of

(1) Conjunction (ccof)

•  The ccof relation does not reflect a dependency relation
•  It is used for coordinating as well as subordinating conjunctions
•  The dependency trees will show the conjuncts as heads
•  In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
•  A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd ...)

•  Coordinate Conjunction
   - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
         'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
         'Ram ate food and Sita ate an apple'
•  Subordinate Conjunction
   - Ex: raam ne kahaa ki vo kal aayegaa
         'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
         'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

•  Ex: maine usase ek prashna kiyaa
       'I-erg' 'him-inst' 'one' 'question' 'did'
       'I asked him a question'
•  The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
•  The annotation scheme should be able to account for this relation in the dependency tree
•  If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

•  To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
•  The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
•  This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
•  This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
•  Conjunct verbs are chunked separately, but semantically they constitute a single unit
•  Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
   k1   maine
   k2   usase
   pof  prashna
           nmod  ek
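The conjunct verb analysis can be sketched as a function that takes the two chunks and returns the relations involved. It assumes, for this illustration only, that the last word of the NP chunk is its noun head and that the first word of the VG chunk is the verb. The nmod label for ek follows the tree on the slide, and the function name is hypothetical.

```python
def annotate_conjunct_verb(np_chunk, vg_chunk):
    """Link the noun of a conjunct verb to its verb with pof; modifiers stay on the noun."""
    noun = np_chunk[-1]   # assumed head of the NP chunk
    verb = vg_chunk[0]    # assumed head of the VG chunk
    rels = [(verb, "pof", noun)]
    # Modifiers such as 'ek' attach to the noun, not to the noun-verb sequence.
    rels += [(noun, "nmod", modifier) for modifier in np_chunk[:-1]]
    return rels


print(annotate_conjunct_verb(["ek", "prashna"], ["kiyaa"]))
```

Had prashna kiyaa been a single verb chunk, there would be no node for ek to attach to. Splitting the chunks is what makes the second relation expressible.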

101212

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

An example

• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
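A frame file with two rolesets can be pictured as a small lookup table. The sketch below is an assumption made for illustration: the dictionary `FRAME_RING` and the helper `describe` are hypothetical names, and the layout is a simplification, not the frame file format used by the treebank.

```python
# Hypothetical in-memory stand-in for the "ring" frame file shown above.
FRAME_RING = {
    "ring.01": {"gloss": "Make sound of bell",
                "roles": {"ARG0": "Causer of ringing",
                          "ARG1": "Thing rung",
                          "ARG2": "Ring for"}},
    "ring.02": {"gloss": "To surround",
                "roles": {"ARG1": "Surrounding entity",
                          "ARG2": "Surrounded entity"}},
}

def describe(roleset_id, labelled_args):
    """Pair each labelled argument with the role description from its roleset."""
    roles = FRAME_RING[roleset_id]["roles"]
    return {text: roles[label] for text, label in labelled_args}

bell = describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
lake = describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
```

The same label (ARG1) means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset must be chosen before the arguments are labelled.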

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

ARGA   Causer
ARG0   Agent, experiencer
ARG1   Theme, patient
ARG2   Recipient
ARG3   Instrument

PropBank Tagset: Numbered Arguments with function tags

ARGA-MNS   Indirect causer
ARG0-MNS   Induced causer
ARG0-GOL   Causee with a 'recipient' role
ARG2-ATR   Attribute
ARG2-GOL   Goal
ARG2-SOU   Source
ARG2-LOC   Location
ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
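The labelling rule for the single argument can be written down directly. In this sketch the mini-lexicon `VERB_CLASS` is a hypothetical stand-in for the outcome of the diagnostic tests; only the three verbs named above are included.

```python
# Hypothetical mini-lexicon; class membership follows the examples above.
VERB_CLASS = {"Kula": "unaccusative", "Puta": "unaccusative", "haMsa": "unergative"}

def sole_argument_label(verb_root):
    """Arg1 for the single argument of an unaccusative, Arg0 for an unergative."""
    return "ARG1" if VERB_CLASS[verb_root] == "unaccusative" else "ARG0"

labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
```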

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: PropBank annotation of causative-1 and causative-2]

Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif -ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

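Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP, V; the NP darvaazaa carries index 1 and is linked to a CASE trace with the same index]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP-P, NP, V; the NP cuuhe carries index 1 and is linked to a CASE trace with the same index]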

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी (Ram -ne Mohan -ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP-P, NP, V; the NP caaMd carries index 1 and is linked to a CASE trace, and mujhko carries index 2 and is linked to an SCR trace]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, V-Aux, NP-P, NP, V, CP, C; the CP headed by ki carries index 1 and is linked to an EXTR trace, and the NP Sita is linked to a CASE trace]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Tree diagram; node labels: VP-Pred, VP, V-Aux, NP-P, NP, V, CP, C (COMP); the tree contains a PRO and indexed SCR traces]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Tree diagram; node labels: VP-Pred, VP, V-Aux, V', NP-P, NP, N, V; the noun yaad is linked to a CASE trace with index 1]

Causative

[Tree diagram]

Hindi Dependency Treebank

The Corpus

• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?

Indian languages:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai ('child' 'fruit' 'eat_hab' 'pres')
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): the locus of result

Sabina opened the lock with this key

[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]

lock becomes the karta

Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example Contd

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Tree diagrams: (t1) khaa → bhaaii, phala; (t2) fully expanded: khaa → bhaaii, phala; bhaaii → meraa, baDZaa; phala → bahuta]
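The chunk notation used in this example, with tokens written as `word_POS` inside `(( ... ))_TAG`, is regular enough to read mechanically. The sketch below assumes tokens are separated by spaces; the function names and the head rule (the last `NN` or `VM` token in a chunk) are simplifying assumptions made for this illustration, not the treebank's own head-finding procedure.

```python
import re

# Matches one chunk: ((tok_POS tok_POS ...))_TAG
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Return [(chunk_tag, [(word, pos), ...]), ...] for a chunked sentence."""
    chunks = []
    for body, tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((tag, tokens))
    return chunks

def head_of(tokens):
    """Assumed head rule for this sketch: the last NN or VM token of the chunk."""
    return [word for word, pos in tokens if pos in ("NN", "VM")][-1]

sentence = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
chunks = parse_chunks(sentence)
heads = [head_of(tokens) for _, tokens in chunks]
```

The three heads recovered here are the nodes between which the inter-chunk relations are then marked.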

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root → activity, result]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: 'The door opened'
  – The door is karta
  – The sentence has no explicit karma

[Dependency tree: open → boy (k1), lock (k2)]

Sub-actions - Opening of lock

[Diagram: Opening of lock, made up of three sub-actions: inserting and turning a key (action 1); key pressing and turning the lever (action 2); latch moving and lock opening (action 3)]

Sub-actions - Opening of lock

[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]

k1 – karta (doer); k2 – karma (affected); k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  – The karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri
• The lock opened
  – The karma is raised to the role of karta (doer): karma-kartri
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  – Which sub-action the speaker wants to focus on
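The shift of karta under vivaksha can be made concrete with a small sketch. Everything here is an assumption for illustration: the participant types (`agent`, `object`, `instrument`), the `DEFAULT_LABEL` table and the function `assign_karakas` are hypothetical, and the speaker's choice is passed in explicitly because it cannot be derived from the words alone.

```python
# Default karaka label for each participant type when it is not chosen as karta.
DEFAULT_LABEL = {"agent": "k1", "object": "k2", "instrument": "k3"}

def assign_karakas(expressed, karta):
    """Label the expressed participants; the one picked as `karta` gets k1."""
    return {word: ("k1" if role == karta else DEFAULT_LABEL[role])
            for role, word in expressed.items()}

# Sabina opened the lock with the key
full = assign_karakas({"agent": "Sabina", "object": "lock", "instrument": "key"}, "agent")
# The key opened the lock
key_as_karta = assign_karakas({"instrument": "key", "object": "lock"}, "instrument")
# The lock opened with this key
lock_as_karta = assign_karakas({"object": "lock", "instrument": "key"}, "object")
```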

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m>
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any>
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3>
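The feature structures above are flat attribute=value lists, so they map naturally onto a dictionary. The sketch below parses the simplified form shown on the slide; `parse_fs` is a hypothetical helper, and the real annotation files may quote or order the attributes differently.

```python
def parse_fs(fs):
    """Parse the attribute=value pairs of a simplified <fs ...> string into a dict."""
    items = fs.strip().lstrip("<").rstrip(">").split()
    feats = {}
    for item in items[1:]:              # skip the leading "fs"
        key, _, value = item.partition("=")
        if value:                       # ignore the bare "af=" marker
            feats[key] = value
    return feats

khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
badZaa = parse_fs("<fs af= root=badZaa cat=adj gend=m>")
```

For khaataa the surface form and the root differ (khaataa vs. khaa), which is the information the later causative analysis relies on.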

An Example (Contd)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

• Dependency Relation

[Figure: dependency tree for the example sentence]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd)

[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
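The tree above can be stored as a map from each chunk head to its parent and relation label. This is a minimal sketch: the dictionary layout, the `root` label for the top node and the helper `dependents` are assumptions made for the illustration, not the treebank's storage format.

```python
# Each chunk head maps to (parent head, relation label); the top node has parent None.
tree = {
    "khaataa": (None, "root"),   # "root" is a placeholder label for this sketch
    "bhaaii": ("khaataa", "k1"),
    "phala": ("khaataa", "k2"),
    "meraa": ("bhaaii", "r6"),
}

def dependents(head):
    """Return {relation: dependent} for the direct dependents of `head`."""
    return {rel: node for node, (parent, rel) in tree.items() if parent == head}

verb_args = dependents("khaataa")
noun_mods = dependents("bhaaii")
```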

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

[Figure: dependency relation types]

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
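The move from the syntactic labels of Possibility-I to the causative labels of Possibility-II amounts to a relabelling, sketched below. The table `CAUSATIVE_RELABEL` and the function `relabel_causative` are hypothetical names; the sketch also assumes the se-phrase is a mediator (aayaa se), since a true instrument such as cammaca se keeps k3 and cannot be told apart by the vibhakti alone.

```python
# Possibility-I label -> Possibility-II label, for a causative clause whose
# se-phrase is a mediator (an assumption; instruments keep k3).
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args):
    """Apply the Possibility-II labels to (phrase, label) pairs."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]

syntactic = [("maaz ne", "k1"), ("aayaa se", "k3"),
             ("bacce ko", "k4"), ("khaanaa", "k2")]
paninian = relabel_causative(syntactic)
```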

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

[Dependency tree: abstract node agar-to → agar (paired-ccof), to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]

Possibility - II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written letters every day, he tears them up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd)

[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakar (vmod); likhakar → vaha (k1), patra (k2), both shared with the main verb]

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing, which would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd)

[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]
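Inserting the null element can be sketched as a small tree operation. The representation (child mapped to parent and relation), the `root` placeholder label and the function `attach_headless_relative` are assumptions for this illustration only.

```python
def attach_headless_relative(tree, rel_root, main_verb, relation="k2"):
    """Insert a NULL__NP under `main_verb` and hang the relative clause on it."""
    tree["NULL__NP"] = (main_verb, relation)
    tree[rel_root] = ("NULL__NP", "nmod__relc")
    return tree

# child -> (parent, relation); the relative clause root "kahoge" is still unattached.
tree = {
    "maan luungii": (None, "root"),   # "root" is a placeholder label for this sketch
    "mai": ("maan luungii", "k1"),
    "tum": ("kahoge", "k1"),
    "jo bhi": ("kahoge", "k2"),
}
attach_headless_relative(tree, "kahoge", "maan luungii")
```

After the call, the relative clause has a parent even though vo is absent from the sentence.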

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figures: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

[Dependency tree: kiyaa → maine (k1), usase (k2), prashna (pof); prashna → ek (nmod)]
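The effect of the POF link can be shown with the relations from the tree above. The two dictionaries and the helper `conjunct_verb` are hypothetical and only illustrate how the noun and verb can be rejoined as one semantic unit while ek stays attached to prashna.

```python
# dependent -> (head, relation), taken from the example tree above.
inter_chunk = {
    "maine": ("kiyaa", "k1"),
    "usase": ("kiyaa", "k2"),
    "prashna": ("kiyaa", "pof"),
}
intra_chunk = {"ek": ("prashna", "nmod")}

def conjunct_verb(verb):
    """Join the verb with any noun or adjective linked to it by pof."""
    parts = [word for word, (head, rel) in inter_chunk.items()
             if head == verb and rel == "pof"]
    return " ".join(parts + [verb])

unit = conjunct_verb("kiyaa")
```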


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
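The labelling rule for intransitives can be written down in a few lines. This is a sketch only: the verb-class table below holds just the three verbs named on the slide, and in practice the class would come from the diagnostic tests, not from a hard-coded dictionary.

```python
# Verb classes for the three examples on the slide (roots as written there).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Arg1 for the single argument of an unaccusative, Arg0 for an unergative."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```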

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

[Figures: annotated structures for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

[Figures: annotated structures for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust(n) do(v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow
CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    - Language-universal principles
    - Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Tree diagram: VP-Pred analysis of आतिफ़ किताब पढ़ेगा, with NP Atif, NP kitab and V paRhegaa]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram: VP-Pred analysis of आतिफ़ ने किताब पढ़ी, with Atif plus postposition ne, NP kitab and V paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree diagram: VP-Pred analysis of आतिफ़ सोएगा, with NP Atif and V soyegaa]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram: VP-Pred analysis of दरवाज़ा खुलेगा, with NP1 darvaazaa in the higher position, a CASE1 trace in the lower position and V khulegaa]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

[Tree diagram: VP-Pred analysis of उस कमरे में चूहे हैं, with adjunct us kamre mein, NP1 cuuhe, a CASE1 trace and V hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram: VP-Pred analysis of राम ने मोहन को किताब दी, with Ram plus ne, Mohan plus ko adjoined to VP-Pred, NP kitaab and V dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

[Tree diagram: VP-Pred analysis with adjuncts kal raat and baadaloM mein, NP2 mujhko with an SCR2 trace, NP1 caaMd with a CASE1 trace, and V dikhaa]

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram: main clause with NP Ram, V jaantaa, V-Aux hai and an EXTR1 trace; extraposed CP1 headed by C ki, containing NP1 Sita, a CASE1 trace, der se and V aayegi]

Relative Clause

  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

[Tree diagram: relative CP (C COMP) containing NP1 jo kitaab, tumne, PRO, an SCR1 trace, V dii and V-Aux thii; main clause containing vah, maine, an SCR3 trace and V paRhii]

Complex Predicate

  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’

[Tree diagram: VP-Pred analysis with NP Raam, Ravi plus ko, N yaad with a CASE1 trace, V kar under V’, and V-Aux rahaa and thaa]

Causative

Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock

opened
  k1: Sabina
  k2: lock (the)

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)

k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)

The lock becomes the karta.
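The labelled trees above can be held in a very small data structure. The following is a sketch under assumptions: the class `Node` and the function `triples` are names invented here, and determiners and possessives are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A word plus its labelled dependents."""
    word: str
    children: list = field(default_factory=list)  # list of (relation, Node)

    def add(self, relation, word):
        child = Node(word)
        self.children.append((relation, child))
        return child


def triples(node):
    """Yield (head, relation, dependent) for every edge, depth first."""
    for relation, child in node.children:
        yield (node.word, relation, child.word)
        yield from triples(child)


# "Yesterday Sabina opened the lock with this key at my home"
root = Node("opened")
for relation, word in [("k7t", "Yesterday"), ("k1", "Sabina"), ("k2", "lock"),
                       ("k3", "key"), ("k7p", "home")]:
    root.add(relation, word)

for head, relation, dependent in triples(root):
    print(f"{head} --{relation}--> {dependent}")
```

For "Yesterday the lock opened with this key" the same structure would carry `k1` on "lock" and no `k2` edge at all.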


Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example Contd.

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) khaa, with dependents bhaaii and phala
(t2) khaa, with dependents bhaaii (meraa, baDZaa) and phala (bahuta)
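The bracketed chunk notation is regular enough to read with a short parser. The sketch below assumes the notation with spaces restored between tokens; the function names and the head rule (last token of an NP, first token of a VG) are assumptions made for illustration, not the treebank's actual head-finding procedure.

```python
import re

# One chunk looks like ((tok_TAG tok_TAG ...))_CHUNKTAG
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of chunks, each with its tag and (word, POS) tokens."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append({"tag": chunk_tag, "tokens": tokens})
    return chunks


def head(chunk):
    """Assumed heuristic: main verb first in a VG, head noun last in an NP."""
    index = 0 if chunk["tag"] == "VG" else -1
    return chunk["tokens"][index][0]


TEXT = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP "
        "((khaataa_VM hai_VAUX))_VG")

for chunk in parse_chunks(TEXT):
    print(chunk["tag"], head(chunk), chunk["tokens"])
```

The three heads found this way (bhaaii, phala, khaataa) are the nodes that carry the inter-chunk relations in tree (t1).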


Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb

• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root, branching into activity and result]

karta-karma

• The boy opened the lock
  - k1 – karta
  - k2 – karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

[Diagram: open, with k1 boy and k2 lock]

Sub-actions: Opening of lock

Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)

Sub-actions: Opening of lock

open: k1 boy, k2 lock, k3 key
open: k1 key, k2 lock
open: k1 lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• That is, on which sub-action the speaker wants to focus

Speaker’s Intention (vivakshaa)

• Every sentence reflects the speaker’s intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd.)

• Morph Analysis
  - meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa <fs af= root=badZaa cat=adj gend=m>
  - bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta <fs af= root=bahuta cat=adj gend=any>
  - phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai <fs af= root=hai cat=v gend=any num=any pers=3>
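The feature structures in the morph analysis are flat attribute=value lists, so they can be turned into dictionaries with a few lines of code. This is a sketch that assumes the angle brackets around `fs ...` (lost in extraction as "lt" and "gt") and treats the empty `af=` attribute as absent; `parse_fs` is a hypothetical helper, not part of any treebank tool.

```python
def parse_fs(fs):
    """Parse '<fs af= root=khaa cat=v ...>' into a dict, skipping empty attributes."""
    items = fs.strip().lstrip("<").rstrip(">").split()
    feats = {}
    for item in items[1:]:              # items[0] is the literal 'fs'
        key, _, value = item.partition("=")
        if value:
            feats[key] = value
    return feats


analysis = {
    "khaataa": "<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>",
    "bhaaii": "<fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>",
}

for word, fs in analysis.items():
    print(word, parse_fs(fs))
```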

An Example (Contd.)

• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  - meraa ‘my’
  - bhaaii ‘brother’
  - phala ‘fruit’, and
  - khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote, for example, purpose and reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  - karta – subject/agent/doer
  - karma – object/patient
  - karana – instrument
  - sampradaan – beneficiary
  - apaadaan – source
  - adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
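The three groups of relations can be collected into one small inventory. The sketch below is illustrative: the label `k5` for apaadaan and the bare `k7` for adhikarana are taken from the general Paninian labelling convention and do not appear on the slides (which show k1 to k4, k7t and k7p), and `relation_type` is a name invented here.

```python
KARAKA = {
    "k1": "karta", "k2": "karma", "k3": "karana",
    "k4": "sampradaan",
    "k5": "apaadaan",     # assumption: label not shown on the slides
    "k7": "adhikarana",   # the slides show the subtypes k7t (time) and k7p (place)
}
OTHER = {"r6": "genitive", "rt": "purpose", "rh": "reason",
         "nmod__relc": "relative clause", "rad": "address"}
NON_DEPENDENCY = {"ccof": "conjunction", "pof": "complex predicate",
                  "fragof": "fragment of"}


def relation_type(label):
    """Classify a label; subtypes such as k7t, k7p and k4a count as karaka."""
    if label in KARAKA or label[:2] in KARAKA:
        return "karaka"
    if label in OTHER:
        return "other"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    raise KeyError(label)


for label in ["k1", "k7t", "k4a", "r6", "pof"]:
    print(label, relation_type(label))
```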

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa ‘cause to eat’ is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  - The verb khilvaa ‘cause to eat’ is a causative verb and is morphologically related to the base verb khaa ‘eat’
  - The Paninian framework provides the relations
    - prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    - prayojya karta ‘causee’ (jk1): the causee in a causative construction
    - madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
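The decision for causatives amounts to a relabelling of the surface-based labels of Possibility-I. A minimal sketch, assuming the arguments arrive as (phrase, label) pairs; the mapping table follows the slide (k1 → pk1, k3 → mk1, k4 → jk1), while the function name and input format are invented here.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_MAP = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Replace surface-based labels with causer/mediator/causee labels.

    Caution: this blanket mapping is only right when the se-phrase is a
    mediator-causer (aayaa se); a true instrument such as cammaca se
    'with the spoon' keeps k3 in the slide's own example.
    """
    return [(phrase, CAUSATIVE_MAP.get(label, label)) for phrase, label in args]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
for phrase, label in relabel_causative(possibility_1):
    print(phrase, label)
```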

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  - Possibility-I
    - Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    - jo ladZakaa ‘the boy’ depends on vaha ‘he’
    - jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai ‘is standing’ is the root of the relative clause
  - The modifier of vaha ‘he’ in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd.)

• Possibility-I is not adopted because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day, having written a letter, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)

(7) Ellipsis

• How to show dependencies when the head is missing
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
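The null-element treatment can be sketched as a fallback when the correlative is absent. The function below is a hypothetical illustration of the tree above, with edges written as (head, relation, dependent) triples; it is not the annotation tool's actual procedure.

```python
def relative_clause_edges(main_verb, k1, relc_verb, correlative=None):
    """Attach a relative clause to its correlative, or to NULL__NP if it is elided."""
    parent = correlative if correlative is not None else "NULL__NP"
    return [
        (main_verb, "k1", k1),
        (main_verb, "k2", parent),
        (parent, "nmod__relc", relc_verb),
    ]


# tum jo bhi kahoge (vo) mai maan luungii, with vo left unexpressed
for edge in relative_clause_edges("maan luungii", "mai", "kahoge"):
    print(edge)
```

When vo is actually present it is passed as `correlative`, and no null node is created.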

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don't listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

[Figures: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation)
• That is, the noun or adjective in a conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as both are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
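The chunking decision for conjunct verbs can be shown on the example itself. This is a sketch only: `split_conjunct_verb` is an invented helper that assumes the position of the noun is already known, which in the real pipeline would come from the annotation.

```python
def split_conjunct_verb(tokens, noun_index):
    """Split 'modifier* noun verb' into an NP chunk, a VG chunk and a pof edge."""
    np_chunk = tokens[: noun_index + 1]      # modifiers stay with the noun
    vg_chunk = tokens[noun_index + 1:]       # the verb forms its own chunk
    edge = (vg_chunk[0], "pof", tokens[noun_index])
    return np_chunk, vg_chunk, edge


np_chunk, vg_chunk, edge = split_conjunct_verb(["ek", "prashna", "kiyaa"], 1)
print(np_chunk, vg_chunk, edge)
```

Because ek ends up inside the NP chunk with prashna, their relation is chunk-internal, while the pof edge records that prashna kiyaa is one semantic unit.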


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya
University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
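The filtering step can be made concrete with the two candidate sentences. The sketch below assumes the role labels have already been produced by some semantic role labeller; the record layout and the function `answer_who` are invented for illustration.

```python
# Role-labelled candidates, written out by hand from the slide.
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, candidates):
    """Return the agents of candidates whose predicate and theme both match."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]


print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Word overlap alone would accept both sentences; requiring the theme of create to match is what removes the Becton Dickinson candidate.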

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut

Thedoorwillopen

unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened

unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen

101212

14

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

101212

15

Da3veSubject

unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst

YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst

YesterdaynightIsawthemoonbehindtheclouds

unacc4

datsubj1

TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on

101212

16

Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan

ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’

causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused the child to sleep’

causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’

[Figures: annotated structures for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’

compl-pred-2: राम रवी से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

आतिफ़ किताब पढ़ेगा

[Tree diagram for the sentence above; node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी

[Tree diagram for the sentence above; node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ़ सोएगा

[Tree diagram for the sentence above; node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]

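Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा

[Tree diagram for the sentence above; node labels: VP-Pred, VP, NP1 (darvaazaa), trace CASE1, V (khulegaa)]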

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram for the sentence above; node labels: VP-Pred, VP, NP1 (cuuhe), trace CASE1, V (hain), adjoined NP-P (us kamre mein)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी

[Tree diagram for the sentence above; node labels: VP-Pred, VP, NP-P (Ram-ne), NP-P (Mohan-ko), NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram for the sentence above; node labels include VP-Pred, VP, NP, NP-P, V, NP1 (caaMd) with trace CASE1, and NP2 (mujhko) with trace SCR2; other leaves: kal raat, baadaloM mein, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram for the sentence above; node labels include VP-Pred, VP, NP, NP-P, V, V-Aux, CP1, C (ki), NP1 (Sita), and the traces EXTR1 and CASE1; leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Tree diagram for the sentence above; node labels include VP-Pred, VP, NP, NP-P, NP1 (jo kitaab), CP, C (COMP), V, V-Aux, PRO, and the traces SCR1 and SCR3; leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Tree diagram for the sentence above; node labels include VP-Pred, VP, VP1, V’, NP, NP-P (Ravi-ko), N (yaad), V (kar), V-Aux (rahaa), V-Aux (thaa), and the trace CASE1]

Causative

Sabina opened the lock

[Dependency tree: opened -> Sabina (k1), lock (k2); lock -> the]

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

[Dependency tree: opened -> Sabina (k1), lock (k2), key (k3); lock -> the; key -> this]

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened -> Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock -> the; key -> this; home -> my]

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened -> yesterday (k7t), lock (k1), key (k3); lock -> the; key -> this]

lock becomes the karta

Levels of Analysis

Level 1 – Semantic relations: karakas, e.g. raama: karta
Level 2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
Level 3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
Level 4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis, POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

[Trees: (t1) chunk-level tree: khaa -> bhaaii, phala. (t2) fully expanded tree: khaa -> bhaaii, phala; bhaaii -> meraa, baDZaa; phala -> bahuta]
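The chunk notation above is regular enough to be read mechanically. The following is a minimal Python sketch, not the treebank's own tooling: it parses the `((word_POS ...))_TAG` strings from the example and picks a head per chunk. The function names and the head rule (last NN or VM token in the chunk) are assumptions made for illustration only.

```python
import re

# Matches one chunk written as ((w1_POS w2_POS ...))_TAG
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos), ...]) pairs."""
    chunks = []
    for body, tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((tag, tokens))
    return chunks

def chunk_head(tokens):
    """Hypothetical head rule: the last noun (NN) or main verb (VM) in the chunk."""
    for word, pos in reversed(tokens):
        if pos in ("NN", "VM"):
            return word
    return tokens[-1][0]  # fall back to the last token

example = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP "
           "((khaataa_VM hai_VAUX))_VG")

chunks = parse_chunks(example)
heads = [chunk_head(tokens) for _, tokens in chunks]
print(heads)
```

The heads found this way are the nodes between which the inter-chunk relations are then marked.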

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb’s semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root -> activity, result]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Dependency tree: open -> boy (k1), lock (k2)]

Sub-actions - Opening of lock

Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)

Sub-actions - Opening of lock

[Dependency trees: open -> boy (k1), lock (k2), key (k3); open -> key (k1), lock (k2); open -> lock (k1)]

k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  – The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
  – The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  – Which sub-action the speaker wants to focus on

Speaker’s Intention (vivakshaa)

• Every sentence reflects the speaker’s intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – ‘key’ gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena, such as complex verbs, causatives, relative clauses, conjunctions, etc., in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd)

• Morph Analysis
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’, and
  – khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks

Dependency Scheme (Contd)

[Dependency tree: khaataa -> bhaii (k1), phala (k2); bhaii -> meraa (r6)]
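The tree on this slide can be written down as labeled head-to-head edges. Below is a small Python sketch under that assumption; the triple layout `(child, relation, parent)` and the helper names are hypothetical and are not the treebank's storage format.

```python
from collections import defaultdict

# (child, relation, parent) triples read off the tree above (chunk heads only)
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]

def children_of(edges):
    """Group edges by parent so the tree can be walked top-down."""
    tree = defaultdict(list)
    for child, rel, parent in edges:
        tree[parent].append((rel, child))
    return tree

def find_root(edges):
    """The root is the only node that is a parent but never a child."""
    children = {child for child, _, _ in edges}
    parents = {parent for _, _, parent in edges}
    roots = parents - children
    assert len(roots) == 1, "a dependency tree has exactly one root"
    return roots.pop()

def render(tree, node, rel=None, depth=0):
    """Return indented lines, one per node, with the incoming relation label."""
    label = node if rel is None else f"{node} ({rel})"
    lines = ["  " * depth + label]
    for child_rel, child in tree.get(node, []):
        lines.extend(render(tree, child, child_rel, depth + 1))
    return lines

tree = children_of(EDGES)
root = find_root(EDGES)
lines = render(tree, root)
print("\n".join(lines))
```

Chunk-internal words such as badZaa and bahuta do not appear as nodes here, in line with the statement that only inter-chunk relations are marked.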

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  – Possibility-I: Go by syntactic analysis
    • khilvaa ‘cause to eat’ is the verb root
    • maaz ne has karta vibhakti, so mark as k1
    • aayaa se has karana vibhakti, so mark as k3
    • bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations
    • prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    • prayojya karta ‘causee’ (jk1): the causee in a causative construction
    • madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
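The relabeling stated above (k1, k3, k4 become pk1, mk1, jk1) can be sketched as a simple lookup. This is an illustrative Python fragment, not an annotation tool: the function name is hypothetical, and the sketch does not decide when a se-phrase is a true instrument (as with cammaca se ‘with the spoon’, which keeps k3); it only converts the Possibility-I labels of the maid example into the Possibility-II labels.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

# Possibility-I analysis of: maaz ne aayaa se bacce ko khaanaa khilvaayaa
POSSIBILITY_I = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]

def to_possibility_ii(args):
    """Swap in the causative labels; relations without a causative variant are kept."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]

possibility_ii = to_possibility_ii(POSSIBILITY_I)
for phrase, label in possibility_ii:
    print(f"{phrase}\t{label}")
```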

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  – Possibility-I
    • Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    • The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    • jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause Possibility-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadzaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I-Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2: raam ne (agent) caaMd dekhaa (base verb)
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3: raam ko (experiencer) caaMd dikhaa (derived intransitive verb)
  ‘ram-Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta – k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

[Dependency tree: abstract node agar-to -> agar (paired-ccof), to (paired-ccof); agar -> naa hotii (ccof); to -> aatii (ccof)]

Possibility - II

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day, having written a letter, he tears it up’

Participles (vmod) (Contd)

• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd)

[Dependency tree: Paadzataa hai -> vaha (k1), rojZa (k7t), patra (k2), likhakar (vmod); shared arguments of likhakar: vaha (k1), patra (k2)]
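The sharing described for participles can be made explicit by copying the shared arguments from the main verb to the vmod participle. The following Python sketch does this for the example above; the dictionary layout and the choice of k1 and k2 as the shared relations are assumptions taken from this one example, not a general rule of the scheme.

```python
# Main-verb node with its syntactic dependents, as in the tree above
MAIN = {
    "verb": "PaadZataa hai",
    "deps": [("k1", "vaha"), ("k7t", "rojZa"), ("k2", "patra"), ("vmod", "likhakar")],
}

# Assumption: in this example only karta (k1) and karma (k2) are shared
SHARED_RELATIONS = ("k1", "k2")

def shared_arguments(main, shared=SHARED_RELATIONS):
    """Map each vmod participle to the arguments it shares semantically.

    The arguments stay attached to the main verb syntactically; the returned
    mapping only records the additional semantic link.
    """
    participles = [word for rel, word in main["deps"] if rel == "vmod"]
    args = [(rel, word) for rel, word in main["deps"] if rel in shared]
    return {participle: list(args) for participle in participles}

semantic_links = shared_arguments(MAIN)
print(semantic_links)
```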

(7) Ellipsis

• How to show dependencies when the head is missing
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which becomes the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

[Dependency tree: maan luungii -> mai (k1), NULL__NP (vo) (k2); NULL__NP -> kahoge (nmod__relc); kahoge -> tum (k1), jo bhi (k2)]

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don’t listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) -> badZe_ho_gaye (ccof), nahii_sunate (ccof)]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• It means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now be part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two parts

Conjunct Verbs (Contd)

[Dependency tree: kiyaa -> maine (k1), usase (k2), prashna (pof); prashna -> ek (nmod)]


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
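The filtering step can be sketched in a few lines of Python. The role labels below are typed in by hand from the bracketed sentences above; a real system would obtain them from a semantic role labeller, and the dictionary layout and function name are assumptions for illustration.

```python
# The question asks for the agent of "create" with a known theme
QUESTION = {
    "predicate": "create",
    "theme": "the first effective polio vaccine",
    "ask": "agent",
}

# Hand-labelled candidate sentences (labels copied from the slide)
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer(question, candidates):
    """Keep only candidates whose predicate and theme match the question."""
    return [
        cand[question["ask"]]
        for cand in candidates
        if cand["predicate"] == question["predicate"]
        and cand["theme"].lower() == question["theme"].lower()
    ]

answers = answer(QUESTION, CANDIDATES)
print(answers)
```

Word matching alone cannot separate the two candidates, since both contain the question's words; the theme role is what rules out the syringe sentence.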

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
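A frame file can be thought of as a lookup from roleset IDs to role descriptions. The Python sketch below encodes the two ring rolesets shown above and resolves the labels of an annotated sentence against them. The nested-dictionary layout, the `ring.01` style IDs and the function name are assumptions for illustration; actual PropBank frame files are stored in their own file format.

```python
# Rolesets for the lemma "ring", copied from the slide
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing",
                      "Arg1": "Thing rung",
                      "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity",
                      "Arg2": "Surrounded entity"},
        },
    }
}

def describe(lemma, roleset_id, labelled_args):
    """Map each annotated span to the verb-specific meaning of its label."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    # "ARG0" on the annotation side corresponds to "Arg0" in the frame
    return {text: roles[label.capitalize()] for label, text in labelled_args}

bell = describe("ring", "ring.01", [("ARG0", "John"), ("ARG1", "the bell")])
lake = describe("ring", "ring.02", [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")])
print(bell)
print(lake)
```

The same label (ARG1) means ‘thing rung’ under one roleset and ‘surrounding entity’ under the other, which is why roles are defined per roleset rather than globally.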

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments
  ARGA: Causer
  ARG0: Agent, experiencer
  ARG1: Theme, patient
  ARG2: Recipient
  ARG3: Instrument

Numbered Arguments with function tags
  ARGA-MNS: Indirect causer
  ARG0-MNS: Induced causer
  ARG0-GOL: Causee with a ‘recipient’ role
  ARG2-ATR: Attribute
  ARG2-GOL: Goal
  ARG2-SOU: Source
  ARG2-LOC: Location
  ARG2-DIR: Direction

PropBank Tagset: Modifier Arguments
  ARGM-TMP: Temporal
  ARGM-MNR: Manner
  ARGM-LOC: Location
  ARGM-PRP: Purpose
  ARGM-CAU: Cause
  ARGM-DIS: Discourse
  ARGM-ADV: Adverb
  ARGM-MNS: Means


Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut

Thedoorwillopen

unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened

unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen

101212

14

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

101212

15

Da3veSubject

unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst

YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst

YesterdaynightIsawthemoonbehindtheclouds

unacc4

datsubj1

TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on

101212

16

Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan

ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan

Causa3ves

bull  Hindihastwowaysofformingthecausa3vebull  Addndashaa

ndash  (sosulaa)sleepmakesomeonesleepbull  Addndashvaa

ndash  (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep

bull  WeintroducethelabelARGAtoanalyzecausersbull  SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees

bull  ARGALMNSforintermediatecausers

101212

17

Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

causa3veL1 आया आतफ़ को स6लाया

aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo

causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa

motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo

Causa3ves

Causa3veL1

Causa3veL2

101212

18

Causa3vesclasses

Complexpredicates

bull  Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust

bull  Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa

101212

19

ComplexPredicate

complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo

Complexpredicate

complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo

101212

20

ComplementClause

complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut

RamknowsthatSitawillarrivelate

101212

1

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा
[Tree diagram; node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[Tree diagram; node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ़ सोएगा
[Tree diagram; node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[Tree diagram; node labels: VP-Pred, VP, NP1 (darvaazaa), CASE1, V (khulegaa)]

Existentials

• Existential ho 'be' is unaccusative (because it is agent-free), and the location is an adjunct

उस कमरे में चूहे हैं
[Tree diagram; node labels: VP-Pred, VP, NP1 (cuuhe), CASE1, V (hain), NP-P (us kamre mein)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी
[Tree diagram; node labels: VP-Pred, VP, NP-P (Ram-ne), NP-P (Mohan-ko), NP (kitaab), V (dii)]

Putting It All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
[Tree diagram; node labels: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko), SCR2, NP1 (caaMd), CASE1, V (dikhaa)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (as with unaccusatives)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
[Tree diagram; node labels: VP-Pred, VP, NP (Ram), V (jaantaa), V-Aux (hai), EXTR1, CP1, C (ki), NP1 (Sita), CASE1, NP-P (der se), V (aayegi)]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book-f you-erg give-f.sg.pst be-f.sg.pst that I-erg read refl-f.sg.pst
'I have read the book which you gave me'
[Tree diagram; node labels: VP-Pred, VP, CP, C (COMP), NP1 (jo kitaab), NP-P (tumne), SCR1, PRO, V (dii), V-Aux (thii), NP (vah), NP-P (maine), SCR3, V (paRhii)]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi-acc remember do prog-m.sg be-m.sg.pst
'Ram was remembering Ravi'
[Tree diagram; node labels: VP-Pred, VP, V', NP (Raam), NP-P (Ravi-ko), NP1, CASE1, N (yaad), V (kar), V-Aux (rahaa), V-Aux (thaa)]

Causative

Yesterday Sabina opened the lock with this key at my home

[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); the, this, and my attach to lock, key, and home]

k7t (kaladhikaraNa): time; k7p (deshadhikaraNa): place

Yesterday the lock opened with this key

[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); the and this attach to lock and key]

The lock becomes the karta
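
The two analyses above can be written down as labeled head-to-dependent edges. The following is a minimal Python sketch; the tuple layout and the helper name `relation_of` are illustrative assumptions, not part of any treebank tool.

```python
# Each edge is (head, dependent, relation); labels follow the two trees above.
agentive = [
    ("opened", "Yesterday", "k7t"),
    ("opened", "Sabina", "k1"),
    ("opened", "lock", "k2"),
    ("opened", "key", "k3"),
    ("opened", "home", "k7p"),
]
agentless = [
    ("opened", "yesterday", "k7t"),
    ("opened", "lock", "k1"),
    ("opened", "key", "k3"),
]

def relation_of(edges, dependent):
    """Return the relation label of `dependent`, or None if it is absent."""
    for _head, dep, rel in edges:
        if dep == dependent:
            return rel
    return None

print(relation_of(agentive, "lock"))   # k2
print(relation_of(agentless, "lock"))  # k1
```

When the agent is not expressed, the same word `lock` moves from k2 (karma) to k1 (karta), which is the shift the second tree illustrates.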


Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Trees: (t1) khaa with dependents bhaaii and phala; (t2) khaa with dependents bhaaii (meraa, baDZaa) and phala (bahuta)]
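
To make the chunk notation concrete, here is a small Python sketch that reads the bracketed `((word_TAG ...))_LABEL` form and picks one head per chunk. The head rule used here (the `VM` token in a `VG` chunk, otherwise the last token) is an assumption made for illustration; the slides do not spell out the rule.

```python
import re

# Matches one chunk such as "((bahuta_QF phala_NN))_NP".
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Parse '((w_TAG ...))_LABEL' groups into (label, [(word, tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks

def head_of(label, tokens):
    """Assumed head rule: main verb (VM) in a VG chunk, else the last token."""
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]

line = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP "
        "((khaataa_VM hai_VAUX))_VG")
chunks = parse_chunks(line)
heads = [head_of(label, toks) for label, toks in chunks]
print(heads)  # ['bhaaii', 'phala', 'khaataa']
```

The three heads are the nodes between which the inter-chunk (head-to-head) relations are then marked.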

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root → activity, result]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta/karma sometimes correspond to agent/theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Diagram: open → boy (k1), lock (k2)]

Sub-actions: Opening of lock

[Diagram: opening of lock decomposed into (action 1) inserting and turning a key, (action 2) key pressing and turning the lever, (action 3) latch moving and lock opening]

Sub-actions: Opening of lock

[Diagrams: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]

k1 – karta (doer); k2 – karma (affected); k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a) based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>
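
The feature structures above are flat lists of `key=value` pairs, so they can be read into a dictionary with a few lines of Python. This is only a sketch for the simplified form shown on the slide (where the `af` value is empty); the function name `parse_fs` is an assumption, and the sketch does not try to handle quoting or nested values.

```python
import re

def parse_fs(fs):
    """Parse '<fs key=value ...>' into a dict; empty values stay ''."""
    inner = fs.strip().lstrip("<").rstrip(">").strip()
    if inner.startswith("fs"):
        inner = inner[2:]
    # \S* lets a value be empty, as in "af= root=khaa".
    return dict(re.findall(r"(\w+)=(\S*)", inner))

fs = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(fs["root"], fs["cat"], fs["TAM"])  # khaa v taa
```

Reading the analysis of khaataa this way gives the root `khaa` and the tense-aspect-modality marker `taa` as separate fields.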

An Example (Contd.)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit'
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd.)

[Dependency tree: khaataa → bhaii (k1), phala (k2); bhaii → meraa (r6)]

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
    • khilvaa 'cause to eat' is the verb root
    • maaz ne has karta vibhakti, so mark it as k1
    • aayaa se has karana vibhakti, so mark it as k3
    • bacce ko has sampradan vibhakti, so mark it as k4

Causative Constructions (Contd.)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    • prayojaka karta 'causer' (pk1): the causer in a causative construction
    • prayojya karta 'causee' (jk1): the causee in a causative construction
    • madhyastha karta 'mediator-causer' (mk1): the mediator-causer in a causative construction

Causative Constructions (Contd.)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
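
The difference between the two possibilities amounts to a relabeling of three arguments. A minimal Python sketch of that relabeling follows; the table `CAUSATIVE_RELABEL` and the function name are assumptions for illustration. Note that the sketch relabels every k3 it sees, so it should only be applied to clauses where the se-marked phrase is the mediator (as with aayaa se) and not an instrument (as with cammaca se).

```python
# Assumed mapping from the vibhakti-based labels of Possibility-I
# to the causative roles of Possibility-II.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args):
    """Map (chunk, label) pairs of a causative clause; other labels are kept."""
    return [(chunk, CAUSATIVE_RELABEL.get(label, label)) for chunk, label in args]

possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```

The karma khaanaa keeps its k2 label under both analyses; only the causer, mediator, and causee change.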

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    • Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    • jo ladZakaa 'the boy' depends on vaha 'he'
    • jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
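
Possibility-II can be sketched as a small node table in which attachment and coreference are stored separately. The dictionary layout and helper names below are assumptions for illustration, and the `k1` label on jo ladZakaa is likewise assumed, since the slides only say that it attaches to the relative-clause verb.

```python
# Each node records its head, its relation to that head, and optional features.
nodes = {
    # Relation k1 is an assumption; coref links the phrase to vaha.
    "jo ladZakaa": {"head": "khadZaa hai", "rel": "k1", "coref": "vaha"},
    # The whole relative clause (rooted in its verb) modifies vaha.
    "khadZaa hai": {"head": "vaha", "rel": "nmod__relc"},
    # The attachment of vaha inside the main clause is left out of this sketch.
    "vaha": {"head": None, "rel": None},
}

def coref_pairs(table):
    """Return (node, antecedent) pairs recorded through the coref feature."""
    return [(name, feats["coref"]) for name, feats in table.items() if "coref" in feats]

def modifiers_of(table, head, rel):
    """Return the nodes attached to `head` with relation `rel`."""
    return [n for n, f in table.items() if f["head"] == head and f["rel"] == rel]

print(coref_pairs(nodes))
print(modifiers_of(nodes, "vaha", "nmod__relc"))
```

Keeping coref as a feature rather than an edge leaves the tree itself rooted in verbs, which is the point of choosing Possibility-II.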

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4 – beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta – k4a (Contd.)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd.)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

[Dependency tree: abstract node agar-to → agar (paired-ccof), to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]

Possibility-II

[Figure: dependency tree in which the agar clause depends on the to clause]

Conditionals (Contd.)

• Possibility-I is not adopted because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written a letter every day, he tears it up'

Participles (vmod) (Contd.)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd.)

[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakara (vmod); likhakara → vaha (k1), patra (k2) as shared arguments]

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which is the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

[Dependency tree: maan luungii → mai (k1), NULL_NP (vo) (k2); NULL_NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]
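
The null-element strategy can be sketched as follows: when the correlative vo is not present in the sentence, a `NULL_NP` node stands in for it so that the relative clause still has a parent. The function name and the edge layout `(dependent, head, relation)` are assumptions for illustration.

```python
def build_edges(correlative=None):
    """Edges for 'tum jo bhi kahoge (vo) mai maan luungii'.

    If the correlative is elided, NULL_NP is inserted in its place.
    """
    parent = correlative if correlative else "NULL_NP"
    return [
        ("mai", "maan luungii", "k1"),
        (parent, "maan luungii", "k2"),
        ("kahoge", parent, "nmod__relc"),
        ("tum", "kahoge", "k1"),
        ("jo bhi", "kahoge", "k2"),
    ]

for edge in build_edges():
    print(edge)
```

Calling `build_edges("vo")` gives the same tree with the overt pronoun, so the two variants of the sentence receive parallel analyses.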

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figures: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

[Dependency tree: kiyaa → maine (k1), usase (k2), prashna (pof); prashna → ek (nmod)]
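
Because the noun and the verb sit in separate chunks, the conjunct verb has to be recovered from the pof edge. A short Python sketch of that lookup is given below; the edge layout `(dependent, head, relation)` and the function name are assumptions for illustration.

```python
# Edges follow the tree above: (dependent, head, relation).
edges = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),
    ("ek", "prashna", "nmod"),
]

def conjunct_verbs(edge_list):
    """Return 'noun verb' strings for every pof edge."""
    return [f"{dep} {head}" for dep, head, rel in edge_list if rel == "pof"]

def modifiers(edge_list, head):
    """Return the dependents attached directly to `head`."""
    return [dep for dep, h, _rel in edge_list if h == head]

print(conjunct_verbs(edges))      # ['prashna kiyaa']
print(modifiers(edges, "prashna"))  # ['ek']
```

The adjective ek stays attached to prashna alone, while the pof edge still lets a consumer treat prashna kiyaa as one semantic unit.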

Overview

• Introduction to the nature of syntactic representations (Rambow, 15 minutes)
• Introduction to the morphology, syntax, and lexical semantics of Hindi and Urdu (Sharma, 40 minutes)
• The morphological representation for Hindi and Urdu, including encoding issues, tokenization, part-of-speech tags, and morphological representation (Sharma and Rambow, 20 minutes)
• The dependency representation (DS) for Hindi and Urdu syntax: principles, representation, and examples (Sharma, 25 minutes)
• The lexical semantic representation (PB) for Hindi and Urdu: principles, representation, and examples (Vaidya, 25 minutes)
• The phrase structure representation (PS) for Hindi and Urdu syntax: principles, representation, and examples (Rambow, 25 minutes)
• Sample initial experiments in Hindi and Urdu NLP using the HUTB (Sharma and Rambow, 15 minutes)

Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
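
The filtering step can be sketched in a few lines of Python. The dictionaries below are hand-written stand-ins for the output of a semantic role labeller (an assumption for illustration; no real SRL system is called), and `answer_who` is a hypothetical helper.

```python
QUESTION_THEME = "the first effective polio vaccine"

# Hand-labelled roles for the two candidate sentences.
candidates = [
    {"agent": "Becton Dickinson", "predicate": "create",
     "theme": "the first disposable syringe"},
    {"agent": "Jonas Salk", "predicate": "create",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, labelled):
    """Return agents of `predicate` whose theme matches the question's theme."""
    return [c["agent"] for c in labelled
            if c["predicate"] == predicate and c["theme"].lower() == theme.lower()]

print(answer_who("create", QUESTION_THEME, candidates))  # ['Jonas Salk']
```

Both sentences contain every word of the question, so word matching cannot separate them; only the theme role does.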

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
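
A frame file with two rolesets can be pictured as a nested mapping from roleset ID to role descriptions. The Python sketch below uses the ring example; the dictionary layout and the helper `describe` are assumptions for illustration and do not reflect the actual frame file format.

```python
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing", "ARG1": "Thing rung", "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity", "ARG2": "Surrounded entity"},
    },
}

def describe(roleset, labelled_args):
    """Map each labelled argument span to its roleset-specific description."""
    roles = FRAMES[roleset]["roles"]
    return {text: roles[label] for text, label in labelled_args}

print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label, ARG1, means 'Thing rung' under ring.01 and 'Surrounding entity' under ring.02, which is why the roleset has to be chosen before the arguments are interpreted.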

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped (null) arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

ARGA    Causer
ARG0    Agent, experiencer
ARG1    Theme, patient
ARG2    Recipient
ARG3    Instrument

PropBank Tagset: Numbered Arguments with function tags

ARGA-MNS    Indirect causer
ARG0-MNS    Induced causer
ARG0-GOL    Causee with a 'recipient' role
ARG2-ATR    Attribute
ARG2-GOL    Goal
ARG2-SOU    Source
ARG2-LOC    Location
ARG2-DIR    Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP    Temporal
ARGM-MNR    Manner
ARGM-LOC    Location
ARGM-PRP    Purpose
ARGM-CAU    Cause
ARGM-DIS    Discourse
ARGM-ADV    Adverb
ARGM-MNS    Means
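
Every label in the tagset is a base argument optionally followed by a function tag, so labels can be split mechanically. The Python sketch below covers only the labels listed above; the helper names `split_label` and `is_modifier` are assumptions for illustration.

```python
NUMBERED = {
    "ARGA": "Causer",
    "ARG0": "Agent, experiencer",
    "ARG1": "Theme, patient",
    "ARG2": "Recipient",
    "ARG3": "Instrument",
}
MODIFIERS = {
    "TMP": "Temporal", "MNR": "Manner", "LOC": "Location", "PRP": "Purpose",
    "CAU": "Cause", "DIS": "Discourse", "ADV": "Adverb", "MNS": "Means",
}

def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); plain labels get None as tag."""
    base, _, tag = label.partition("-")
    return base, (tag or None)

def is_modifier(label):
    """True for ARGM-* labels whose tag is in the modifier list."""
    base, tag = split_label(label)
    return base == "ARGM" and tag in MODIFIERS

print(split_label("ARG2-GOL"))   # ('ARG2', 'GOL')
print(is_modifier("ARGM-TMP"))   # True
print(is_modifier("ARG0-MNS"))   # False
```

Note that MNS appears both as a modifier (ARGM-MNS) and as a function tag on numbered arguments (ARGA-MNS, ARG0-MNS), so the base label is needed to tell the two uses apart.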

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book-f read-m.sg.fut
'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif-erg book-f read-f.sg.pst
'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif-dat book-f read-f.inf compel-f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door-m.sg.d open-m.sg.fut
'The door will open'

unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door-m.sg.obl-erg open-pst
'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door-m.sg.obl-dat open-inf compel-fut
'The door will have to open'

Intransitive: Unergative

unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep-m.sg.fut
'Atif will sleep'

unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif-erg sleep-m.sg.pst
'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif-dat sleep-inf compel-fut
'Atif will have to sleep'

Existential

exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be-pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc)-pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me-dat moon see(unacc)-pst
'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan-dat book-f give-m.sg.fut
'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram-erg Mohan-dat book-f give-f.sg.pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep-m.sg.fut
'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid-erg Atif-acc sleep-caus-pst
'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother-erg maid-by Atif-acc sleep-caus-pst
'The mother made the maid cause the child to sleep'



Levels of Analysis

L1 – Semantic relations (karakas), e.g. raama: karta
L2 – Morphosyntactic (vibhakti), e.g. raama: prathamaa
L3 – Morphological representation (abstract) vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis, POS tagging; identify minimal constituents (chunks/bags) and their heads

• Mark the relations across chunks (head-to-head relation)

• Chunk-internal dependencies are left unspecified

• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example Contd

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

[Figure: two dependency trees. (t1) khaa with the chunk heads bhaaii and phala below it; (t2) the expanded tree, which adds meraa, baDZaa and bahuta.]

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result

• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal Root branching into activity and result.]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma

• karta/karma sometimes correspond to agent/theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

[Figure: dependency tree with open as head, boy as k1 and lock as k2.]

Sub-actions - Opening of lock

Opening of lock:
  – Inserting and turning a key (action1)
  – key pressing and turning the lever (action2)
  – latch moving and lock opening (action3)

Sub-actions - Opening of lock

[Figure: three dependency trees headed by open: (i) boy k1, lock k2, key k3; (ii) key k1, lock k2; (iii) lock k1.]

k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri
  – The lock opened. The karma is raised to the role of karta (doer): karma-kartri
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on

Speaker's Intention (vivakshaa)

• Every sentence reflects speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b), karana (in a), based on what the speaker wants to express

• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence

• We are developing treebanks for Hindi/Urdu

• Following Paninian framework as the annotation scheme

• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)

• Morph Analysis
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>
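The feature structures in the morph analysis are flat lists of attribute=value pairs, so they are easy to read into a dictionary. The following is a minimal Python sketch, assuming the angle-bracket form `<fs ...>` and whitespace-separated pairs; the function name `parse_fs` is hypothetical and not part of any treebank tool.

```python
def parse_fs(fs):
    """Turn '<fs af= root=meraa cat=pron ...>' into a dict of features.

    Assumption: attributes are separated by whitespace and an attribute
    with no value (such as 'af=' on the slide) maps to an empty string.
    """
    body = fs.strip().lstrip("<").rstrip(">")
    features = {}
    for token in body.split():
        if "=" not in token:
            continue  # skips the leading 'fs' marker
        key, value = token.split("=", 1)
        features[key] = value
    return features


meraa = parse_fs("<fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>")
khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(meraa["cat"], khaataa["root"], khaataa["TAM"])
```

Keeping the features as a plain dictionary makes it simple to check agreement (for example, comparing `gend` and `num` across a noun and its verb).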


An Example (Contd)

• POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

• Chunking
  – ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
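The bracket notation used for chunks above can be read mechanically. Here is a small Python sketch, assuming each chunk is written as `((word_TAG ...))_CHUNKTAG`; the helper `parse_chunks` is hypothetical and only illustrates the notation on the slide, not the treebank's actual storage format.

```python
import re

# Matches one chunk: the body between (( and )), then the chunk tag.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks


example = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
chunks = parse_chunks(example)
for tag, tokens in chunks:
    print(tag, tokens)
```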

An Example (Contd)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relations between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'

• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd)

[Figure: dependency tree. khaataa is the root, with bhaaii attached as k1 and phala as k2; meraa is attached to bhaaii as r6.]
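The tree on this slide can be written down as a list of labeled edges between chunk heads. The sketch below, in Python, is only an illustration under that assumption; the tuple layout and the helper names `dependents` and `root` are hypothetical.

```python
# Each edge is (dependent, head, label), using the labels from the slide.
edges = [
    ("bhaaii", "khaataa", "k1"),  # karta
    ("phala", "khaataa", "k2"),   # karma
    ("meraa", "bhaaii", "r6"),    # genitive
]


def dependents(head, edges):
    """Map each relation label to the dependent attached to `head`."""
    return {label: dep for dep, h, label in edges if h == head}


def root(edges):
    """Return the single node that is a head but never a dependent."""
    heads = {h for _, h, _ in edges}
    deps = {d for d, _, _ in edges}
    (only,) = heads - deps
    return only


print(root(edges), dependents("khaataa", edges))
```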

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly

• Karaka relations are participants directly involved in the action denoted by the verb

• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'

• Issue
  – Possibility-I: Go by syntactic analysis
    – khilvaa 'cause to eat' is the verb root
    – maaz ne has karta vibhakti, so mark as k1
    – aayaa se has karana vibhakti, so mark as k3
    – bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    – prayojaka karta 'causer' (pk1): the causer in a causative construction
    – prayojya karta 'causee' (jk1): the causee in a causative construction
    – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively

• For causatives, our current decision: Follow Possibility-II
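The difference between the two possibilities amounts to a relabeling of three relations. A minimal Python sketch of that correspondence follows; the mapping table and the function `relabel_causative` are hypothetical illustrations, and the mapping is a simplification, since the spoon example above shows that a se-phrase that is a true instrument keeps k3.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Relabel (phrase, label) pairs from Possibility-I to Possibility-II.

    Assumption: every k3 in the input is a mediator-causer, not an
    instrument; labels without an entry in the table are left unchanged.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in args]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```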


(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'

• Issue
  – Possibility-I
    – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause Alternative-II

Relative Clauses (Contd …)

• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd …)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta – k4a (Contd …)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd …)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'

• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

[Figure: the abstract node agar-to heads agar and to through paired-ccof; naa hotii and aatii are attached below them through ccof.]

Possibility - II

Conditionals (Contd)

• Possibility-I is rejected because the head of its tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)

• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'

Participles (vmod) (Contd)

• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd)

[Figure: dependency tree. Paadzataa hai heads vaha (k1), rojZa (k7t), pawra (k2) and likhakar (vmod); likhakar in turn shares vaha (k1) and pawra (k2).]

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example, vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

[Figure: dependency tree. maan luungii heads mai (k1) and NULL__NP (vo) (k2); kahoge attaches to NULL__NP as nmod__relc and heads tum (k1) and jo bhi (k2).]

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

[Figure: NULL_CCP (aur) heads badZe_ho_gaye and nahii_sunate, both through ccof.]

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd …)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'

• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence

• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG

• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

[Figure: dependency tree. kiyaa heads maine (k1), usase (k2) and prashna (pof); ek attaches to prashna as nmod.]


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer

• The theme of create should be 'the first effective polio vaccine'

• The theme in the first sentence was 'the first disposable syringe'

• We can filter out the wrong answer
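The filtering step described above can be sketched in a few lines of Python. This is only an illustration: the role labels are entered by hand rather than produced by a labeller, and the names `candidates`, `normalize` and `who_created` are hypothetical.

```python
# Hand-labelled agent/theme roles of "create" for the two candidates.
candidates = [
    {"agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"agent": "Jonas Salk",
     "theme": "The first effective polio vaccine"},
]


def normalize(phrase):
    """Lowercase a phrase and drop a leading 'the' for comparison."""
    words = phrase.lower().split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def who_created(thing, candidates):
    """Return the agents of sentences whose theme matches `thing`."""
    target = normalize(thing)
    return [c["agent"] for c in candidates
            if normalize(c["theme"]) == target]


answers = who_created("the first effective polio vaccine", candidates)
print(answers)
```

Word matching alone cannot separate the two sentences, because both contain the phrase "the first effective polio vaccine"; the comparison on the theme role is what excludes the first one.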

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants expressed through semantic roles

• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling

• A PropBank is a large annotated corpus of predicate-argument information

• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis

• This is defined in a verb lexicon consisting of frame files

• Each predicate will have a set of roles associated with a distinct usage

• A polysemous predicate can have several rolesets within its frame file

An example

• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
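A frame file with two rolesets can be pictured as a small lookup table. The Python sketch below assumes the usual PropBank roleset notation `ring.01` / `ring.02` (the slide text shows the identifiers without the dot); the dictionary layout and the function `explain` are hypothetical and do not reflect the actual frame file format.

```python
# Rolesets for the predicate "ring", copied from the example slide.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def explain(roleset_id, labelled_args):
    """Pair each labelled phrase with its role description in the roleset."""
    roles = FRAMES[roleset_id]["roles"]
    return {phrase: roles[label] for phrase, label in labelled_args}


bell = explain("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
lake = explain("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
print(bell)
print(lake)
```

The same label (Arg1) means "Thing rung" in one roleset and "Surrounding entity" in the other, which is the point of defining roles verb by verb.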

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped/null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses

• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments
  ARGA      Causer
  ARG0      Agent/experiencer
  ARG1      Theme/patient
  ARG2      Recipient
  ARG3      Instrument

Numbered Arguments with function tags
  ARGA-MNS  Indirect causer
  ARG0-MNS  Induced causer
  ARG0-GOL  Causee with a 'recipient' role
  ARG2-ATR  Attribute
  ARG2-GOL  Goal
  ARG2-SOU  Source
  ARG2-LOC  Location
  ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
  ARGM-TMP  Temporal
  ARGM-MNR  Manner
  ARGM-LOC  Location
  ARGM-PRP  Purpose
  ARGM-CAU  Cause
  ARGM-DIS  Discourse
  ARGM-ADV  Adverb
  ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
  Atif-ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif-ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)

• The single argument of unaccusatives takes Arg1; unergatives take Arg0

• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2: आतिफ़ ने सोया
  Atif-ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
  Atif-ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep

• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees

• ARGA-MNS for intermediate causers

Causatives

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Causatives

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'

• Such cases are handled using a noun frame for bharosaa: [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

Complex predicate

compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow

• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters

• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)

• These two levels are related by derivations
  – Words and constituents move and leave traces

• Transformational grammar

• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)

• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa)

[Figure: phrase-structure tree for this sentence, with the node labels VP-Pred, VP, NP and V.]

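Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii)

[Figure: phrase-structure tree for this sentence, with the same shape as the tree for the non-ergative clause.]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative

• In unergative, there simply is no object

आतिफ़ सोएगा (Atif soyegaa)

[Figure: phrase-structure tree for this sentence, with the node labels VP-Pred, VP, NP and V.]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

दरवाज़ा खुलेगा (darvaazaa khulegaa)

[Figure: phrase-structure tree for this sentence; the NP darvaazaa is co-indexed with the trace CASE1.]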


101212

8

ForExample

meraabaDzaabhaaiibahutaphalakhaataahai=gtmeraa_PRPbaDzaa_JJbhaaii_NN

bahuta_QFphala_NNkhaataa_VMhai_VAUX=gt((meraa_PRPbaDzaa_JJbhaaii_NN))_NP

((bahuta_QFphala_NN))_NP

((khaataa_VMhai_VAUX))_VG

ExampleContd

((meraa_PRPbaDzaa_JJbhaaii_NN))_NP

((bahuta_QFphala_NN))_NP

((khaataa_VMhai_VAUX))_VG(t1)khaa (t2)khaabhaaiiphalabhaaiiphalameraabaDZaabahuta

101212

9

KarakaRela3ons Directpar3cipantsinanac3onevent Syntac3coLseman3c Thekartaandkarmaofaverbaredeterminedbytheverbsseman3cs Verbdenotesanac3onevent Anyac3onisabundleofsubLac3onsSabinaopenedthelockwiththekeyThekeyopenedthelockThelockopened

Seman3csoftheverb

 Averbalrootdenotes$  Theac3vity$  Theresult

  Locusofac3vitykarta  Locusofresultkarma

Verbal(Root(

acvity( result(

101212

10

kartaLkarma

  Theboyopenedthelock$  k1ndashkarta$  k2ndashkarma

  kartakarmasome3mescorrespondtoagenttheme$  NotalwaysThedooropened$  Thedooriskarta$  Thesentencehasnoexplicitkarma

(open(

boy( lock(

k1 k2

SubLac3onsLOpeningoflock

Openingoflock

Inser3ngandkeypressing latchmovingturningakeyandturningandlockopening(ac3on1) thelever(ac3on3)

(ac3on2)

101212

11

SubLac3onsLOpeningoflock

open(

boy( lock( key(

k1k2 k3

open(

open(

lock(

lock(key(

k1

k1 k2

k1ndashkarta(doer)rlmk2ndashkarma(affected)rlmk3ndashkarana(instrument)rlm

Thus Theac3onofopeningnormallyrequiresanagen3vepar3cipantSoSabinaopenedthelockHoweverThespeakermaydecidenottoexpresstheroleoftheagentHenceThekeyopenedthelock ThekaraNa(instrument)israisedtotheroleofkarta(doerLkaraNaLkartri)ThelockopenedThekarmaisraisedtotheroleofkarta(doerLkarmaLkartri)Thuskartaortheotherkarakarolescanshildependingonwhatthespeakerwantstoexpress(vivaksha) WhichsubLac3onthespeakerwantstofocuson

101212

12

SpeakerrsquosInten3on(vivakshaa)rlm

  Everysentencereflectsspeakerrsquosinten3on$  Par3cipantsareassignedvariousrela3onsaccordingly(a)Iopenedthelockwiththis+key(b)Iamsurethis+keywillopenthelock$  lsquokeyrsquogetsassignedkarta(inb)karana(ina)basedonwhatthespeakerwantstoexpress

  Syntaxreflectsvivaksha

The Scheme

 Morph analysis  POS tagging  Chunking   Mark the syntactic relations (dependency relations) across

chunks (head to head relation) rlm

101212

13

Overview

  Objective   The Scheme

$ Morph Analysis $ POS Tagging $ Chunking $ Dependency Relations

  Dependency Scheme   Relations in Dependency Scheme   Some Hindi Constructions

Objective

  To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence

  We are developing treebanks for HindiUrdu

  Following Paninian framework as the annotation scheme

  We show how the scheme handles some phenomena such as complex verbs causatives relative clauses conjunctions etc in Hindi

101212

14

An Example  Example

$  meraa badZaa bhaaii bahuta phala khaataa hai

my lsquoelder lsquobrother lsquolots fruits lsquoeat+HABlsquo PRES

lsquoMY elder brother eats lots of fruitsrsquo

An Example (Contd)

  Morph Analysis

$ meraa ltfs af= root=meraa cat=pron gend=any num=sg pers=1 case=ogt $ badZaa ltfs af= root=badZaa cat=adj gend=m gt $ bhaaii ltfs af= root=bhaaii cat=n gend=m num=sg pers=3 case=dgt $ bahuta ltfs af= root=bahuta cat=adj gend=any gt $ phala ltfs af= root=phala cat=n gend=m num=any pers=3 case=dgt $ khaataa ltfs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taagt $ hai ltfs af= root=hai cat=v gend=any num=any pers=3 gt


An Example (Contd)

• POS Tagging
  - meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((badZaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
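The bracketed chunk notation is regular enough to parse mechanically. The sketch below assumes the `((word_TAG ...))_LABEL` layout from the slide; the head-selection rule (last NN or PRP in an NP, the VM in a VG) is an illustrative assumption, not the treebank's documented procedure.

```python
import re

CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

# Assumed head tags per chunk label, for illustration only.
HEAD_TAGS = {"NP": ("NN", "PRP"), "VG": ("VM",)}


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for words, label in CHUNK_RE.findall(text):
        tokens = [tuple(w.rsplit("_", 1)) for w in words.split()]
        chunks.append((label, tokens))
    return chunks


def chunk_head(label, tokens):
    """Pick the rightmost token whose tag is a head tag for this chunk."""
    for word, tag in reversed(tokens):
        if tag in HEAD_TAGS.get(label, ()):
            return word
    return tokens[-1][0]


sentence = ("((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
chunks = parse_chunks(sentence)
heads = [chunk_head(label, tokens) for label, tokens in chunks]
print(heads)
```

The heads found this way are the nodes between which inter-chunk dependency relations are later marked.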

An Example (Contd)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit', and
  - khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks

Dependency Scheme (Contd)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
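The tree for the example sentence can be held as a list of labeled edges between chunk heads. This is a minimal sketch; the triple layout `(child, relation, parent)` and the function names are choices made here for illustration.

```python
# (child, relation, parent) triples between chunk heads
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def find_root(edges):
    """The root is the only node that is a parent but never a child."""
    children = {child for child, _, _ in edges}
    parents = {parent for _, _, parent in edges}
    (root,) = parents - children
    return root


def render(edges, node, depth=0, relation=None):
    """Return the tree below `node` as a list of indented lines."""
    label = node if relation is None else f"{relation}: {node}"
    lines = ["  " * depth + label]
    for child, rel, parent in edges:
        if parent == node:
            lines.extend(render(edges, child, depth + 1, rel))
    return lines


tree_lines = render(EDGES, find_root(EDGES))
print("\n".join(tree_lines))
```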

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  - karta - subject/agent/doer
  - karma - object/patient
  - karana - instrument
  - sampradaan - beneficiary
  - apaadaan - source
  - adhikarana - location in place/time/other

Relations other than karakas

• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod__relc - Relative clause
• rad - Address

Relations which do not fall under dependency relation

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

Dependency Relation Types
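The three relation types can be summarised as a lookup table. This is a sketch only: the slides pair k1, k2, k3 and k4 with karta, karma, karana and sampradan, while the labels `k5` (apaadaan) and `k7` (adhikarana) are assumed here from the numbering pattern. The prefix fallback for sub-labels such as `k7t` or `k4a` is likewise an illustrative simplification.

```python
from enum import Enum


class RelationType(Enum):
    KARAKA = "karaka relation"
    OTHER = "dependency relation other than karaka"
    NON_DEPENDENCY = "non-dependency relation"


RELATIONS = {
    "k1": ("karta: subject/agent/doer", RelationType.KARAKA),
    "k2": ("karma: object/patient", RelationType.KARAKA),
    "k3": ("karana: instrument", RelationType.KARAKA),
    "k4": ("sampradaan: beneficiary", RelationType.KARAKA),
    "k5": ("apaadaan: source", RelationType.KARAKA),            # label assumed
    "k7": ("adhikarana: location", RelationType.KARAKA),        # label assumed
    "r6": ("genitive", RelationType.OTHER),
    "rt": ("purpose", RelationType.OTHER),
    "rh": ("reason", RelationType.OTHER),
    "rad": ("address", RelationType.OTHER),
    "nmod__relc": ("relative clause", RelationType.OTHER),
    "ccof": ("conjunction", RelationType.NON_DEPENDENCY),
    "pof": ("complex predicate", RelationType.NON_DEPENDENCY),
    "fragof": ("fragment of", RelationType.NON_DEPENDENCY),
}


def describe(label):
    """Look up a label; sub-labels fall back to their longest known prefix."""
    key = label
    while key and key not in RELATIONS:
        key = key[:-1]
    if not key:
        raise KeyError(label)
    return RELATIONS[key]


print(describe("k7t"))
print(describe("pof"))
```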


Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa 'cause to eat' is the verb root
    - maaz ne has karta vibhakti, so mark as k1
    - aayaa se has karana vibhakti, so mark as k3
    - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd...)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator causer' (mk1): the mediator-causer in a causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
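The move from Possibility-I labels to Possibility-II labels can be sketched as a relabeling step. This is an illustration under stated assumptions: the `mediators` argument (chunks an annotator has judged to be mediator-causers) is hypothetical, introduced because a se-marked true instrument such as a spoon keeps k3.

```python
# Possibility-I label -> Possibility-II label, as listed on the slide
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def to_possibility_two(args, mediators):
    """Relabel (chunk, label) pairs of a causative verb.

    `mediators` is a hypothetical set of chunks judged to be
    mediator-causers; any other k3 chunk is kept as an instrument.
    """
    relabeled = []
    for chunk, label in args:
        if label == "k3" and chunk not in mediators:
            relabeled.append((chunk, label))
        else:
            relabeled.append((chunk, CAUSATIVE_RELABEL.get(label, label)))
    return relabeled


possibility_one = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_two = to_possibility_two(possibility_one, mediators={"aayaa se"})
print(possibility_two)
```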

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue
  - Possibility-I
    - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    - jo ladZakaa 'the boy' depends on vaha 'he'
    - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd...)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
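Possibility-II can be sketched as a small node table in which attachment and coreference are stored separately. The `Node` class and its field names are assumptions made for this illustration; relations the slides leave unlabeled are stored as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional[str] = None   # chunk head this node attaches to
    rel: Optional[str] = None      # relation label, None if not given on the slide
    coref: Optional[str] = None    # value of the coref feature


NODES = {
    "jo ladZakaa": Node("jo ladZakaa", parent="khadZaa hai", coref="vaha"),
    "khadZaa hai": Node("khadZaa hai", parent="vaha", rel="nmod__relc"),
    "vaha": Node("vaha", parent="hai"),
    "hai": Node("hai"),
}


def relative_clauses_of(head):
    """Roots of relative clauses attached to `head` via nmod__relc."""
    return [n.name for n in NODES.values()
            if n.parent == head and n.rel == "nmod__relc"]


print(relative_clauses_of("vaha"))
print(NODES["jo ladZakaa"].coref)
```

Storing coref as a feature keeps the tree a tree: jo ladZakaa has one parent (the relative-clause verb), and its link to vaha does not add a second edge.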


(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan k4 (beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

anubhava karta - k4a (Contd...)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd...)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof: agar
    ccof: na hotii
  paired-ccof: to
    ccof: aatii

Possibility - II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd)

• Possibility-I is ruled out because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
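The effect of inserting the null element can be checked on an edge list: without it the relative clause has nothing to attach to, and the sentence falls into two disconnected trees. A minimal sketch follows; the function names are illustrative and the edge triples are `(child, relation, parent)`.

```python
def roots(edges):
    """Nodes that act as a parent but are never a child."""
    children = {child for child, _, _ in edges}
    parents = {parent for _, _, parent in edges}
    return parents - children


def attach_headless_relative(edges, clause_root, main_verb, role):
    """Insert NULL__NP as the missing head of a relative clause."""
    null = "NULL__NP"
    return edges + [
        (null, role, main_verb),
        (clause_root, "nmod__relc", null),
    ]


overt = [
    ("mai", "k1", "maan luungii"),
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]
full = attach_headless_relative(overt, "kahoge", "maan luungii", "k2")
print(sorted(roots(overt)))
print(sorted(roots(full)))
```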


Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd...)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figure: Coordinate Conjunction (ccof)]

[Figure: Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
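Because the noun and verb are chunked separately and linked by pof, the conjunct verb can be recovered from the tree while its true arguments stay distinct. A minimal sketch using the labels from the tree on the slide; the helper names are illustrative.

```python
# (child, relation, parent) triples for 'maine usase ek prashna kiyaa'
EDGES = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]


def complex_predicates(edges):
    """Noun-verb (or adjective-verb) units linked by the pof relation."""
    return [f"{child} {parent}" for child, rel, parent in edges if rel == "pof"]


def arguments(edges, verb):
    """Dependents of the verb, leaving out the pof part of the predicate."""
    return {rel: child for child, rel, parent in edges
            if parent == verb and rel != "pof"}


print(complex_predicates(EDGES))
print(arguments(EDGES, "kiyaa"))
```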



Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
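The filtering step can be sketched with hand-labeled role dictionaries standing in for the output of a semantic role labeller. The dictionary layout and the `answer` function are assumptions for illustration; no real SRL system is called here.

```python
# The question asks for the agent of 'create' with a known theme.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Hand-labeled stand-ins for SRL output on the two candidate sentences.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(question, candidates):
    """Keep agents of candidates whose predicate and theme match the question."""
    return [
        c["agent"] for c in candidates
        if c["predicate"] == question["predicate"]
        and c["theme"].lower() == question["theme"].lower()
    ]


answers = answer(question, candidates)
print(answers)
```

Both candidates share almost all words with the question, so word matching alone cannot separate them; the theme role can.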


We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1]  (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]  (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
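A frame file with several rolesets can be sketched as a nested dictionary, which also makes it possible to check that an annotation only uses roles its roleset defines. The layout and the `undefined_roles` helper are illustrative assumptions, not the actual PropBank frame file format.

```python
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing",
                  "ARG1": "Thing rung",
                  "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity",
                  "ARG2": "Surrounded entity"},
    },
}


def undefined_roles(roleset_id, labeled_args):
    """Return labels used in an annotation that the roleset does not define."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return [label for label in labeled_args if label not in roles]


bell = {"ARG0": "John", "ARG1": "the bell"}
lake = {"ARG1": "Tall aspen trees", "ARG2": "the lake"}
print(undefined_roles("ring.01", bell))
print(undefined_roles("ring.02", lake))
print(undefined_roles("ring.02", bell))
```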

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments
  ARGA   Causer
  ARG0   Agent/experiencer
  ARG1   Theme/patient
  ARG2   Recipient
  ARG3   Instrument

Numbered arguments with function tags
  ARGA-MNS   Indirect causer
  ARG0-MNS   Induced causer
  ARG0-GOL   Causee with a 'recipient' role
  ARG2-ATR   Attribute
  ARG2-GOL   Goal
  ARG2-SOU   Source
  ARG2-LOC   Location
  ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

  ARGM-TMP   Temporal
  ARGM-MNR   Manner
  ARGM-LOC   Location
  ARGM-PRP   Purpose
  ARGM-CAU   Cause
  ARGM-DIS   Discourse
  ARGM-ADV   Adverb
  ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
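The labelling consequence of the distinction can be sketched as a lookup. The verb classes below are the three examples named on the slide; the table and function are illustrative, and a real lexicon would take the class from the verb's frame file.

```python
# Verb classes for the example verbs named on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
print(labels)
```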

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so, sulaa): sleep, make someone sleep
• Add -vaa
  - (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers


Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

[Figure: causative classes]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    - Language-universal principles
    - Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: phrase structure tree for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, CASE1, V]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

[Figure: phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, NP-P, CASE1, V]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: phrase structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Figure: phrase structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, CASE1, SCR2, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: phrase structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP, NP1, NP-P, V, V-Aux, CP1, C, EXTR1, CASE1]

Relative Clause

  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'

[Figure: phrase structure tree for the relative clause example; node labels: VP-Pred, VP, NP, NP1, NP-P, V, V-Aux, CP, C, COMP, PRO, SCR1, SCR3]

Complex Predicate

  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'

[Figure: phrase structure tree for the complex predicate example; node labels: VP-Pred, VP, VP1, NP, NP-P, V, V', V-Aux, N, CASE1]

Causative

[Figure: phrase structure tree for the causative]

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb

• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma

[Figure: Verbal Root branching into activity and result]

karta-karma

• The boy opened the lock
  - k1 - karta
  - k2 - karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions - Opening of lock

Opening of lock
  - Inserting and turning a key (action 1)
  - Key pressing and turning the lever (action 2)
  - Latch moving and lock opening (action 3)

Sub-actions - Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 - karta (doer)
k2 - karma (affected)
k3 - karana (instrument)

Thus Theac3onofopeningnormallyrequiresanagen3vepar3cipantSoSabinaopenedthelockHoweverThespeakermaydecidenottoexpresstheroleoftheagentHenceThekeyopenedthelock ThekaraNa(instrument)israisedtotheroleofkarta(doerLkaraNaLkartri)ThelockopenedThekarmaisraisedtotheroleofkarta(doerLkarmaLkartri)Thuskartaortheotherkarakarolescanshildependingonwhatthespeakerwantstoexpress(vivaksha) WhichsubLac3onthespeakerwantstofocuson

101212

12

SpeakerrsquosInten3on(vivakshaa)rlm

  Everysentencereflectsspeakerrsquosinten3on$  Par3cipantsareassignedvariousrela3onsaccordingly(a)Iopenedthelockwiththis+key(b)Iamsurethis+keywillopenthelock$  lsquokeyrsquogetsassignedkarta(inb)karana(ina)basedonwhatthespeakerwantstoexpress

  Syntaxreflectsvivaksha

The Scheme

 Morph analysis  POS tagging  Chunking   Mark the syntactic relations (dependency relations) across

chunks (head to head relation) rlm

101212

13

Overview

  Objective   The Scheme

$ Morph Analysis $ POS Tagging $ Chunking $ Dependency Relations

  Dependency Scheme   Relations in Dependency Scheme   Some Hindi Constructions

Objective

  To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence

  We are developing treebanks for HindiUrdu

  Following Paninian framework as the annotation scheme

  We show how the scheme handles some phenomena such as complex verbs causatives relative clauses conjunctions etc in Hindi

101212

14

An Example  Example

$  meraa badZaa bhaaii bahuta phala khaataa hai

my lsquoelder lsquobrother lsquolots fruits lsquoeat+HABlsquo PRES

lsquoMY elder brother eats lots of fruitsrsquo

An Example (Contd)

  Morph Analysis

$ meraa ltfs af= root=meraa cat=pron gend=any num=sg pers=1 case=ogt $ badZaa ltfs af= root=badZaa cat=adj gend=m gt $ bhaaii ltfs af= root=bhaaii cat=n gend=m num=sg pers=3 case=dgt $ bahuta ltfs af= root=bahuta cat=adj gend=any gt $ phala ltfs af= root=phala cat=n gend=m num=any pers=3 case=dgt $ khaataa ltfs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taagt $ hai ltfs af= root=hai cat=v gend=any num=any pers=3 gt

101212

15

An Example (Contd )

  POS Tagging

$ meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF

phala_NN khaataa_VM hai_VAUX   Chunking

$ ((meraa_PRP))_NP

((baDzaa_JJ bhaaii_NN))_NP

((bahuta_QF phala_NN))_NP

((khaataa_VM hai_VAUX))_VG

An Example (Contd)

  Dependency Relation

101212

16

Dependency Scheme

  The Paninian approach treats a sentence as a series of modifier-modified relations   Hence it provides framework for dependency analysis   In our dependency tree

$  each node is a chunk and $ the edge represents the relations between the connected nodes labeled with the karaka or

other relations   Chunk represents a set of adjacent words which are in dependency relations with each

other   All the modifier-modified relations between the heads of the chunks (inter-chunk

relations) are marked in this manner

Dependency Scheme (Contd)

  Here modifier-modified relations are marked between the heads of the chunks

$ meraa lsquomyrsquo $ bhaaii lsquobrotherrsquo $ phala lsquofruitrsquo and $ khaataa lsquoeatsrsquo

  badZaa lsquobigrsquo and bahut lsquomuchrsquo are part of the chunks

101212

17

Dependency Scheme (Contd)

khaataa k1 k2

bhaii phala r6

meraa

Relations in Dependency Scheme

  There are 3 types of relations in Dependency Scheme

amp Karaka relations amp Relations other than karakas and

amp Relations which do not fall under dependency relation directly but are required for

showing the dependencies indirectly

  Karaka relations are participants directly involved in the action denoted by the verb

  Relations other than karakas denote purpose reason   Relations which do not fall under dependency relation directly are used for

representing co-ordination and complex predicates

101212

18

Basic karaka relations

 Only six

$  karta ndash subjectagentdoer

$  karma ndash objectpatient

$  karana ndash instrument

$  sampradaan ndash beneficiary

$  apaadaan ndash source

$  adhikarana ndash location in placetimeother

Relations other than karakas

  r6 ndash Genitive   rt ndash Purpose   rh ndash Reason  nmod_relc ndash Relative clause   rad ndash Address

101212

19

Relations which do not fall under dependency relation

  ccof ndash Conjunction  pof ndash Complex Predicates   fragof ndash Fragment of

Dependency Relation Types

101212

20

Some Hindi Constructions

(1) Causative Constructions

•  maaz ne aayaa se bacce ko khaanaa khilvaayaa
   ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
   ‘Mother caused the maid to feed the child’

•  Issue
   –  Possibility-I: Go by syntactic analysis
      ·  khilvaa ‘cause to eat’ is the verb root
      ·  maaz ne has karta vibhakti, so mark as k1
      ·  aayaa se has karana vibhakti, so mark as k3
      ·  bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd…)

•  Possibility-II
   –  The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
   –  The Paninian framework provides the relations
      ·  prayojaka karta ‘causer’ (pk1): the causer in a causative construction
      ·  prayojya karta ‘causee’ (jk1): the causee in a causative construction
      ·  madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
   –  Do we mark the above dependency roles?
   –  If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd…)

•  Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
   ‘Mother fed the child with the spoon’
•  Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
   ‘Mother made the maid feed the child’
•  As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
•  For causatives, our current decision: Follow Possibility-II
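The difference between the two possibilities for the causative example amounts to a relabelling of three arguments. The sketch below illustrates this for the maid sentence only. The label pairs (k1/pk1, k3/mk1, k4/jk1) come from the slides; the function name and the (chunk, label) list layout are assumptions made for illustration and are not part of the treebank tooling.

```python
# Label pairs are taken from the slides; the function name and the
# (chunk, label) list layout are assumptions made for this sketch.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map Possibility-I labels to Possibility-II labels; other labels are kept."""
    return [(chunk, CAUSATIVE_RELABEL.get(label, label)) for chunk, label in arguments]


# Possibility-I analysis of: maaz ne aayaa se bacce ko khaanaa khilvaayaa
possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)

for chunk, label in possibility_2:
    print(f"{chunk}\t{label}")
```

A blanket mapping like this is only valid when the se-marked argument is a mediator. A true instrument, such as cammaca se ‘with the spoon’, keeps k3, so a real annotation decision cannot be made from the label alone.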

(2) Relative Clauses (nmod__relc)

•  Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
   ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
   ‘The boy who is standing there is my brother’

•  Issue
   –  Possibility-I
      ·  Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
      ·  jo ladZakaa ‘the boy’ depends on vaha ‘he’
      ·  jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause: Possibility-I (figure)

Relative Clauses (nmod__relc)

   –  Possibility-II
      ·  The verb khadZaa hai ‘is standing’ is the root of the relative clause
      ·  The modifier of vaha ‘he’ in the main clause is the entire relative clause
      ·  Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause: Possibility-II (figure)

Relative Clauses (Contd…)

•  For relative clauses, our current decision: Follow Possibility-II
•  In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
•  The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
•  The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

•  Ex-1: mujhko dukh hai
   ‘I-Dat’ ‘unhappy’ ‘is’
   ‘I am unhappy’
•  Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
•  Here dukh ‘unhappy’ is the karta
•  Here mujhko ‘to me’ is a subtype of sampradan
•  This sampradan is different from the sampradan (k4, beneficiary)
•  We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd…)

•  Ex-2 (base verb): raam ne (agent) caaMd dekhaa
   ‘ram’ ‘Erg’ ‘moon’ ‘saw’
   ‘Ram saw the moon’
•  Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
   ‘ram-Dat’ ‘moon’ ‘appeared’
   ‘The moon was visible to Ram’

anubhava karta – k4a (Contd…): Ex-2 and Ex-3 (dependency tree figures)

(4) Relation samanadhikaran – rs

•  Ex-1: raam ne kahaa ki vo kal aayegaa
   ‘Ram said that he will come tomorrow’
•  Ex-2: raam ne yaha kahaa ki vo kal aayegaa
   ‘Ram said that he will come tomorrow’
•  In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
•  In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…): Ex-1 and Ex-2 (dependency tree figures)

(5) Conditionals

•  Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
   ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
   ‘Had she not been sick, she would definitely have come to the party’

•  Issue
   –  Possibility-I: Abstract node
   –  Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
   agar (paired-ccof)
      naa hotii (ccof)
   to (paired-ccof)
      aatii (ccof)

Possibility-II (figure)

Conditionals (Contd)

•  Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
•  For conditionals, our current decision: Follow Possibility-II
•  In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
•  Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

•  In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
•  The argument occurs only once in the sentence but is semantically related to both verbs
•  The shared argument always attaches syntactically to the main verb
•  For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

•  Ex: vaha rojZa patra likhakara PaadZataa hai
   ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
   ‘Having written letters every day, he tears them up’
•  The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’

Participles (vmod) (Contd)

PaadZataa hai
   vaha (k1)
   rojZa (k7t)
   patra (k2)
   likhakara (vmod)
      vaha (k1, shared)
      patra (k2, shared)

(7) Ellipsis

•  How to show dependencies when the head is missing?
•  Ex: tum jo bhi kahoge (vo) mai maan luungii
   ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
   ‘I will believe whatever you say’
•  In the above example vo ‘that’, which would be the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
•  We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
   mai (k1)
   NULL__NP (vo) (k2)
      kahoge (nmod__relc)
         tum (k1)
         jo bhi (k2)

Ellipsis (Contd)

•  Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
   ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
   ‘The children have grown up; they don't listen to anyone’
•  No explicit conjunct
•  Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
   badZe_ho_gaye (ccof)
   nahii_sunate (ccof)

Non-dependency Relations

•  ccof – Conjunction
•  pof – Complex Predicates
•  fragof – Fragment of

(1) Conjunction (ccof)

•  The ccof relation does not reflect a dependency relation
•  It is used for coordinating as well as subordinating conjunctions
•  The dependency trees will show the conjunctions as heads
•  In coordination, the conjunction is the head and takes the coordinated elements as its children
•  A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

•  Coordinate Conjunction
   –  Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
      ‘Ram ate food and Sita ate an apple’
•  Subordinate Conjunction
   –  Ex: raam ne kahaa ki vo kal aayegaa
      ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
      ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof) (figure)

Subordinate Conjunction (figure)

(2) Conjunct Verbs

•  Ex: maine usase ek prashna kiyaa
   ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
   ‘I asked him a question’
•  The adjective ek ‘one’ modifies only the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
•  The annotation scheme should be able to account for this relation in the dependency tree
•  If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

•  To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
•  The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
•  That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
•  This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
•  Conjunct verbs are chunked separately, but semantically they constitute a single unit
•  Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
   maine (k1)
   usase (k2)
   prashna (pof)
      ek (nmod)


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1  Motivation
2  Introducing PropBank
3  Frame file definition
4  Hindi PropBank
5  Linguistic Phenomena

Why is semantic information important?

•  Imagine an automatic question answering system
•  Who created the first effective polio vaccine?
•  Two possible choices:
   –  Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
   –  The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

•  Who created the first effective polio vaccine?
   –  Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
   –  The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

•  Who created the first effective polio vaccine?
   –  [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
   –  [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

•  Who created the first effective polio vaccine?
   –  [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
   –  [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

•  We need semantic information to prefer the right answer
•  The theme of create should be ‘the first effective polio vaccine’
•  The theme in the first sentence was ‘the first disposable syringe’
•  We can filter out the wrong answer
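The filtering step described for the polio-vaccine question can be sketched as a lookup over role-labelled candidates. The two candidates and their agent and theme fillers are copied from the slides; the dictionary layout and the function name answer_who are illustrative assumptions and do not correspond to any PropBank or SRL library API.

```python
# Toy role-labelled candidates copied from the slides; the data layout and
# the function name are illustrative assumptions, not a real SRL API.
candidates = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, labelled):
    """Return the agents of `predicate` whose theme equals the question's theme."""
    return [
        c["agent"]
        for c in labelled
        if c["predicate"] == predicate and c["theme"] == theme
    ]


answers = answer_who("create", "the first effective polio vaccine", candidates)
print(answers)
```

Both candidate sentences contain the words of the question, so word matching alone cannot separate them. Only the theme role distinguishes the syringe sentence from the vaccine sentence.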

We need semantic information

•  To find out about events and their participants
•  To capture semantic information across syntactic variation

Semantic information

•  Semantic information about verbs and participants is expressed through semantic roles
•  Agent, Experiencer, Theme, Result, etc.
•  However, it is difficult to have a standard set of thematic roles

Proposition Bank

•  Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
•  A PropBank is a large annotated corpus of predicate-argument information
•  A set of semantic roles is defined for each verb
•  A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

•  PropBank defines semantic roles on a verb-by-verb basis
•  This is defined in a verb lexicon consisting of frame files
•  Each predicate will have a set of roles associated with a distinct usage
•  A polysemous predicate can have several rolesets within its frame file

An example

•  [John: ARG0] rings [the bell: ARG1]  (ring.01)
•  [Tall aspen trees: ARG1] ring [the lake: ARG2]  (ring.02)

ring.01  Make sound of bell
   Arg0  Causer of ringing
   Arg1  Thing rung
   Arg2  Ring for

ring.02  To surround
   Arg1  Surrounding entity
   Arg2  Surrounded entity
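The two rolesets for ring can be written down as a small lookup table, which makes it easy to see how the same numbered argument means different things under different rolesets. The role descriptions are transcribed from the slide; the dictionary layout and the helper describe are assumptions for illustration and do not reproduce the actual PropBank frame-file format.

```python
# Rolesets transcribed from the slide. The dictionary layout is an
# illustrative assumption and not the PropBank frame-file XML format.
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {
            "Arg0": "causer of ringing",
            "Arg1": "thing rung",
            "Arg2": "ring for",
        },
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {
            "Arg1": "surrounding entity",
            "Arg2": "surrounded entity",
        },
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to its role description in the given roleset."""
    roles = FRAMES[roleset_id]["roles"]
    return {text: roles[label] for text, label in labelled_args}


bell = describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
lake = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
print(bell)
print(lake)
```

Arg1 is the thing rung under ring.01 but the surrounding entity under ring.02, which is why the roleset ID has to be chosen before the numbered arguments can be interpreted.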

Hindi PropBank

•  Annotating the Hindi PropBank involves three steps
   –  Creation of frame files
   –  Empty argument insertion
   –  Semantic role labelling

Frame files for Hindi

•  Two types of frame files
   –  Frames for simple verbs [385 frames, 703 predicates]
   –  Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

•  PropBank inserts 4 types of empty arguments
   –  pro: dropped null arguments, recoverable from discourse context
   –  PRO: empty subjects of non-finite complement and adjunct clauses
   –  RELPRO: gaps in participial relative clauses
   –  GAP: elided arguments in co-ordinated clauses
•  PRO and RELPRO are inserted automatically
•  GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments            Numbered arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

•  Simple transitive
•  Unaccusative and Unergative
•  Existential
•  Dative subject
•  Ditransitive
•  Causatives
•  Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitaab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Unaccusative & Unergative

•  Distinction between intransitive verbs
   –  Unaccusatives, e.g. Kula (open), Puta (explode)
   –  Unergatives, e.g. haMsa (laugh)
•  The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
•  Diagnostic tests are used to distinguish unaccusatives from unergatives
   –  E.g. animacy test, cognate object test, among others
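The labelling rule for intransitives (Arg1 for the single argument of an unaccusative, Arg0 for that of an unergative) can be sketched as a table lookup. The three verb roots and their classes are the examples given on the slide; the table VERB_CLASS and the function sole_argument_label are hypothetical names introduced here. In practice the class of a verb is decided by the diagnostic tests, not by a fixed list.

```python
# Verb classes taken from the slide's examples; the lookup table and the
# function name are illustrative assumptions.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb_root):
    """Return the numbered argument for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb_root]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for root in ("Kula", "Puta", "haMsa"):
    print(root, sole_argument_label(root))
```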

Intransitive Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Intransitive Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

•  We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

(Figures: unacc-4, dat-subj-1)

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives

•  Hindi has two ways of forming the causative
•  Add –aa
   –  (so → sulaa) sleep → make someone sleep
•  Add –vaa
   –  (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
•  We introduce the label ARGA to analyze causers
•  Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
•  ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

Causatives (figures: causative-1, causative-2)

Causatives: classes (figure)

Complex predicates

•  These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
•  Such cases are handled using a noun frame for bharosaa: [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

•  Devised by Rajesh Bhatt, University of Massachusetts, Amherst
   –  Assisted by Annahita Farudi and Owen Rambow
•  Developed in conjunction with DS and PB
•  Inspired by the Chomskyan tradition

Background for PS

•  Chomskyan program
   –  Motivated by claims about language acquisition in children
   –  Develop a theory of syntax such that the syntax of a language can be explained by
      ·  Language-universal principles
      ·  Language-specific parameters
•  PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

•  PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
•  These two levels are related by derivations
   –  Words and constituents move and leave traces
•  Transformational grammar
•  Monostratal representation
•  Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

•  Phrase structure
•  Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
   –  Nouns with postpositions
   –  Verbs with auxiliaries and complementizers (ki)
•  Binary branching
   –  Theoretical reasons
   –  To be different from DS

Basic Transitive Clause (1)

•  There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

आतिफ़ किताब पढ़ेगा
(Tree figure. Node labels: VP-Pred, VP, NP, V. Terminals: Atif, kitab, paRhegaa.)

Basic Transitive Clause (2)

•  The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
(Tree figure. Node labels: VP-Pred, VP, NP-P, NP, V. Terminals: Atif, -ne, kitab, paRhii.)

Intransitive Clause: Unergative

•  PS makes a distinction between unergative and unaccusative
•  In the unergative, there simply is no object

आतिफ़ सोएगा
(Tree figure. Node labels: VP-Pred, VP, NP, V. Terminals: Atif, soyegaa.)

Intransitive Clause: Unaccusative

•  The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
(Tree figure. Node labels: VP-Pred, VP, NP1, NP, V. Terminals: darvaazaa, CASE1, khulegaa.)

Existentials

•  Existential ho ‘be’ is unaccusative (because agent-free), and the location is an adjunct

उस कमरे में चूहे हैं
(Tree figure. Node labels: VP-Pred, VP, NP1, NP, NP-P, V. Terminals: us kamre mein, cuuhe, CASE1, hain.)

Ditransitive

•  The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी
(Tree figure. Node labels: VP-Pred, VP, NP-P, NP, V. Terminals: Ram, -ne, Mohan, -ko, kitaab, dii.)

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
(Tree figure. Node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, V. Terminals: kal raat, baadaloM mein, mujhko, SCR2, caaMd, CASE1, dikhaa.)

•  Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
•  Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
•  The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
(Tree figure. Node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, V, V-Aux. Terminals: Ram, EXTR1, jaantaa, hai, ki, Sita, der se, CASE1, aayegi.)

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
(Tree figure. Node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, V, V-Aux. Terminals: jo kitaab, tumne, SCR1, PRO, dii, thii, COMP, vah, maine, SCR3, paRhii.)

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
(Tree figure. Node labels: VP-Pred, VP, NP, NP-P1, V’, V, V-Aux, N. Terminals: Raam, Ravi, -ko, yaad, CASE1, kar, rahaa, thaa.)

Causative (figure)

karta – karma

•  The boy opened the lock
   –  k1 – karta
   –  k2 – karma
•  karta and karma sometimes correspond to agent and theme
   –  Not always: The door opened
   –  The door is karta
   –  The sentence has no explicit karma

open
   boy (k1)
   lock (k2)

Sub-actions – Opening of lock

Opening of lock
   –  Inserting and turning a key (action 1)
   –  Key pressing and turning the lever (action 2)
   –  Latch moving and lock opening (action 3)

Sub-actions – Opening of lock

open
   boy (k1)
   lock (k2)
   key (k3)

open
   key (k1)
   lock (k2)

open
   lock (k1)

k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus:
•  The action of opening normally requires an agentive participant. So: Sabina opened the lock
•  However, the speaker may decide not to express the role of the agent. Hence:
   –  The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
   –  The lock opened. The karma is raised to the role of karta (doer: karma-kartri)
•  Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on

Speaker’s Intention (vivakshaa)

•  Every sentence reflects the speaker’s intention
   –  Participants are assigned various relations accordingly
      (a) I opened the lock with this key
      (b) I am sure this key will open the lock
   –  ‘key’ gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
•  Syntax reflects vivaksha

The Scheme

•  Morph analysis
•  POS tagging
•  Chunking
•  Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)

Overview

•  Objective
•  The Scheme
   –  Morph Analysis
   –  POS Tagging
   –  Chunking
   –  Dependency Relations
•  Dependency Scheme
•  Relations in Dependency Scheme
•  Some Hindi Constructions

Objective

•  To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
•  We are developing treebanks for Hindi/Urdu
•  We follow the Paninian framework as the annotation scheme
•  We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

•  Example
   –  meraa badZaa bhaaii bahuta phala khaataa hai
      ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
      ‘My elder brother eats lots of fruits’

An Example (Contd)

•  Morph Analysis
   –  meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
   –  badZaa   <fs af= root=badZaa cat=adj gend=m >
   –  bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
   –  bahuta   <fs af= root=bahuta cat=adj gend=any >
   –  phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
   –  khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
   –  hai      <fs af= root=hai cat=v gend=any num=any pers=3 >
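The morph analysis attaches a feature structure of key=value pairs to each word. Written with angle brackets, the entry for khaataa reads <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>. The sketch below reads such a string into a dictionary. It assumes exactly the flat key=value layout shown on the slide, which may differ from the format used in the released treebank files, and the function name parse_fs is hypothetical.

```python
import re


def parse_fs(fs):
    """Parse an '<fs key=value ...>' string into a dict.

    Assumes the flat key=value layout shown on the slide. A key with no
    value (such as the bare 'af=') maps to the empty string.
    """
    body = fs.strip()
    if not (body.startswith("<fs") and body.endswith(">")):
        raise ValueError(f"not a feature structure: {fs!r}")
    body = body[3:-1]
    return {key: value for key, value in re.findall(r"(\w+)=(\S*)", body)}


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
badZaa = parse_fs("<fs af= root=badZaa cat=adj gend=m >")
print(khaataa["root"], khaataa["TAM"])
print(badZaa)
```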

An Example (Contd)

•  POS Tagging
   –  meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
•  Chunking
   –  ((meraa_PRP))_NP
      ((badZaa_JJ bhaaii_NN))_NP
      ((bahuta_QF phala_NN))_NP
      ((khaataa_VM hai_VAUX))_VG
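The bracketed chunk notation, for example ((khaataa_VM hai_VAUX))_VG, can be read back into chunk labels and word/tag pairs with a small parser. This is a sketch that assumes only the notation shown on the slide (double parentheses around word_TAG tokens, followed by an underscore and the chunk label); the names CHUNK_RE and parse_chunks are introduced here for illustration.

```python
import re

# Matches one chunk written as ((word_TAG word_TAG ...))_LABEL.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) tuples."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(token.rsplit("_", 1)) for token in body.split()]
        chunks.append((label, tokens))
    return chunks


text = (
    "((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
chunks = parse_chunks(text)
for label, tokens in chunks:
    print(label, tokens)
```

Splitting each token on its last underscore keeps the word intact even if the word itself contains an underscore.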

An Example (Contd)

•  Dependency Relation (figure)

Dependency Scheme

•  The Paninian approach treats a sentence as a series of modifier-modified relations
•  Hence it provides a framework for dependency analysis
•  In our dependency tree
   –  each node is a chunk, and
   –  each edge represents the relation between the connected nodes, labeled with a karaka or other relation
•  A chunk represents a set of adjacent words which are in dependency relations with each other
•  All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

•  Here modifier-modified relations are marked between the heads of the chunks
   –  meraa ‘my’
   –  bhaaii ‘brother’
   –  phala ‘fruit’ and
   –  khaataa ‘eats’
•  badZaa ‘big’ and bahuta ‘much’ are part of the chunks

Dependency Scheme (Contd)

khaataa
   bhaaii (k1)
      meraa (r6)
   phala (k2)
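The dependency tree for the example sentence (khaataa governing bhaaii as k1 and phala as k2, with meraa attached to bhaaii as r6) can be stored as a list of labelled edges between chunk heads. The sketch below uses that representation and prints the tree with indentation. The triple layout and the function names children and render are assumptions for illustration, not the treebank's storage format.

```python
# (dependent, head, relation) triples for the example sentence; the triple
# layout is an illustrative assumption.
EDGES = [
    ("bhaaii", "khaataa", "k1"),
    ("phala", "khaataa", "k2"),
    ("meraa", "bhaaii", "r6"),
]


def children(head):
    """Return the (dependent, relation) pairs attached to `head`."""
    return [(dep, rel) for dep, h, rel in EDGES if h == head]


def render(head, depth=0, rel=None):
    """Return the subtree under `head` as a list of indented lines."""
    label = head if rel is None else f"{head} ({rel})"
    lines = ["   " * depth + label]
    for dep, dep_rel in children(head):
        lines.extend(render(dep, depth + 1, dep_rel))
    return lines


tree_lines = render("khaataa")
print("\n".join(tree_lines))
```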

Relations in Dependency Scheme

•  There are 3 types of relations in the Dependency Scheme
   –  Karaka relations
   –  Relations other than karakas, and
   –  Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
•  Karaka relations hold for participants directly involved in the action denoted by the verb
•  Relations other than karakas denote, for example, purpose and reason
•  Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

•  Only six
   –  karta – subject/agent/doer
   –  karma – object/patient
   –  karana – instrument
   –  sampradaan – beneficiary
   –  apaadaan – source
   –  adhikarana – location in place/time/other

Relations other than karakas

  r6 ndash Genitive   rt ndash Purpose   rh ndash Reason  nmod_relc ndash Relative clause   rad ndash Address

101212

19

Relations which do not fall under dependency relation

  ccof ndash Conjunction  pof ndash Complex Predicates   fragof ndash Fragment of

Dependency Relation Types

101212

20

Some Hindi Constructions

(1)   Causative Constructions   maaz ne aayaa se bacce ko khaaanaa khilvaayaa lsquomotherrsquo lsquoErgrsquo lsquomaidrsquo lsquobyrsquo lsquochildrsquo lsquoAccrsquo lsquofoodrsquo lsquoeat-Causrsquo lsquoMother caused the maid to feed the childrsquo

  Issue

$ Possibility-I Go by syntactic analysis

amp khilvaa lsquocause to eatrsquo is the verb root amp maaz ne has karta vibhakti so mark as k1 amp aayaa se has karana vibhakti so mark as k3 amp bacce ko has sampradan vibhakti so mark as k4

Causative Constructions (Contd hellip)

101212

21

Causative Constructions (Contd hellip)

 Possibility-II

$  The verb khilvaa lsquocause to eatrsquo is a causative verb and it is morphologically related to the base verb khaa lsquoeatrsquo

$  Paninian framework provides the relations

amp  prayojaka karta causerlsquo (pk1) The causer in a causative construction amp  prayojya karta causeelsquo (jk1) The causee in a causative construction amp  madhyastha karta mediator causerlsquo (mk1) The mediator-causer in the causative

construction

Causative Constructions (Contd hellip)

  Possibility-II

$  Do we mark the above dependency roles $  If we mark these relations then root will be khaa lsquoeatrsquo

101212

22

  Ex maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa lsquoMother fed the child with the spoon   Ex maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa Mother made the maid to feed the childlsquo   As there is morphological relatedness between the base verb khaa lsquoeatrsquo and

causative verb khilvaa lsquocause to eatrsquo we mark pk1 mk1 jk1 instead of k1 k3 k4 respectively

  For causatives our current decision Follow Possibility-II

Causative Constructions (Contd hellip)

(2) Relative Clauses (nmod__relc)

  Ex jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai

rsquowhorsquo rsquoboyrsquo rsquotherersquo rsquostandrsquo rsquoisrsquo rsquohersquo rsquomyrsquo rsquobrotherrsquo rsquoisrsquo lsquoThe boy who is standing there is my brotherrsquo

  Issue

$ Possibility-I

amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause

amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz

khadZaa hairsquo

101212

23

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

 Possibility-II

$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause

and vaha lsquohersquo in the main clause is captured by the feature coref

101212

24

Relative Clause Alternative-II

Relative Clauses (Contdhellip)

  For relative clauses our current decision Follow Possibility-II   In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the

verb khadzaa hai lsquois standingrsquo of the relclause   The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo

relation   The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by

the feature coref

101212

25

(3) anubhava karta ndash k4a

  Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo   Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta   Here dukh lsquounhappyrsquo is the karta   Here mujhko lsquoto mersquo is a subtype of sampradan   This sampradan is different from the sampradan (k4mdashbeneficiary)   We call it as anubhava karta represented by k4a

anubhava karta ndash k4a (Contd )

  Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo   Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon

was visible to mersquo Verb  

101212

26

anubhava karta ndash k4a (Contdhellip)

 Ex-2

  Ex-3

anubhava karta ndash k4a (Contdhellip)

101212

27

(4) Relation samanadhikaran- rs

  Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $  Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo   In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma   In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object

yaha lsquothisrsquo so it attahes to yaha as rs

Relation samanadhikaran- rs (Contdhellip)

  Ex-1

101212

28

Relation samanadhikaran- rs (Contd) ndash Ex-2

(5) Conditionals   Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo

lsquoHad she been not sick she would have definitely come to the partyrsquo

  Issue

$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause

101212

29

Possibility - I

agar-to paired-ccof paired-ccof

agar to ccof ccof

naa hotii aatii

Possibility - II

101212

30

Conditionals (Contd)   Possibility-I is not possible because agar-to is the head of the tree

which is an abstract node ie it is not a lexical node   For conditionals our current decision Follow Possibility-II   In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo

clause   Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo

clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day, having written a letter, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don't listen to anyone”
• No explicit conjunction
• Insert a NULL element to show the dependencies (if it is essential)

NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

[Figures: dependency trees for the coordinate conjunction (ccof) and subordinate conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
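
The chunk split for conjunct verbs can be sketched in Python. This is a minimal illustration only: the `Chunk` class and the `relations` helper are hypothetical names introduced here, not part of the treebank tools. It shows [ek prashna]NP and [kiyaa]VG as separate chunks linked by pof, with ek kept inside the NP chunk.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    tag: str                 # chunk label, e.g. NP or VG
    words: list              # words inside the chunk
    head: str                # head word of the chunk
    children: list = field(default_factory=list)  # (relation, Chunk) pairs

def relations(chunk):
    """Return inter-chunk relations as (dependent head, relation, parent head)."""
    found = []
    for relation, child in chunk.children:
        found.append((child.head, relation, chunk.head))
        found.extend(relations(child))
    return found

# [ek prashna]NP [kiyaa]VG: the noun chunk is linked to the verb chunk by pof.
prashna = Chunk("NP", ["ek", "prashna"], "prashna")
kiyaa = Chunk("VG", ["kiyaa"], "kiyaa", [
    ("k1", Chunk("NP", ["maine"], "maine")),
    ("k2", Chunk("NP", ["usase"], "usase")),
    ("pof", prashna),
])

for dependent, relation, parent in relations(kiyaa):
    print(f"{dependent} --{relation}--> {parent}")
```

Because ek sits inside the same NP chunk as prashna, the modifier relation between them stays intra-chunk and does not appear among the inter-chunk relations.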


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
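
The filtering step can be sketched in Python. This is a minimal illustration, assuming the role labels are already available; here they are copied by hand from the bracketing above rather than produced by a real SRL system, and the names `candidates` and `answer_who_created` are hypothetical.

```python
# Hand-labelled candidate sentences; roles copied from the slide's bracketing.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who_created(theme, candidates):
    """Return the agents of 'create' whose theme matches the questioned theme."""
    return [c["agent"] for c in candidates
            if c["predicate"] == "create"
            and c["theme"].lower() == theme.lower()]

print(answer_who_created("the first effective polio vaccine", candidates))
```

Both candidate sentences contain the words of the question, so word matching alone cannot separate them; only the theme comparison filters out the syringe sentence.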

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity

An example

• [John] rings [the bell]  (ring.01)
• [Tall aspen trees] ring [the lake]  (ring.02)

An example

• [John]_ARG0 rings [the bell]_ARG1  (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2  (ring.02)
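
The two rolesets for ring can be written as a small lookup table. The sketch below is illustrative only: the dictionary layout and the `describe` helper are assumptions made for this example and do not reproduce the actual PropBank frame file format.

```python
# Rolesets for "ring", copied from the slide; the layout is hypothetical.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}

def describe(roleset_id, labelled_args):
    """Map each labelled argument to the role description in the chosen roleset."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: f"{label}: {roles[label]}" for label, text in labelled_args}

print(describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
print(describe("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same label (Arg1) means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset has to be chosen before the numbered arguments can be interpreted.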

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875; 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

  ARGA   Causer
  ARG0   Agent, experiencer
  ARG1   Theme, patient
  ARG2   Recipient
  ARG3   Instrument

PropBank Tagset: Numbered Arguments with function tags

  ARGA-MNS   Indirect causer
  ARG0-MNS   Induced causer
  ARG0-GOL   Causee with a ‘recipient’ role
  ARG2-ATR   Attribute
  ARG2-GOL   Goal
  ARG2-SOU   Source
  ARG2-LOC   Location
  ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

  ARGM-TMP   Temporal
  ARGM-MNR   Manner
  ARGM-LOC   Location
  ARGM-PRP   Purpose
  ARGM-CAU   Cause
  ARGM-DIS   Discourse
  ARGM-ADV   Adverb
  ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

Causatives

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

[Figure: causative classes]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

Complex predicate

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, [email protected]

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    – Language-universal principles
    – Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

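Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); the NP darvaazaa (index 1) is co-indexed with a CASE trace; node labels: VP-Pred, VP, NP, V]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); the NP cuuhe (index 1) is co-indexed with a CASE trace, and the NP-P us kamre mein is attached as an adjunct; node labels: VP-Pred, VP, NP-P, NP, V]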

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); caaMd (index 1) is co-indexed with a CASE trace and mujhko (index 2) with an SCR trace; node labels: VP-Pred, VP, NP-P, NP, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); the CP headed by ki is co-indexed with an EXTR trace and Sita with a CASE trace; node labels: VP-Pred, VP, NP-P, NP, CP, C, V, V-Aux]

Relative Clause

[Figure: PS tree for the relative clause example below; it contains SCR traces, a PRO, and a CP headed by COMP; node labels: VP-Pred, VP, NP-P, NP, CP, C, V, V-Aux]

  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

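Complex Predicate

[Figure: PS tree for the example below; the noun yaad sits inside V' with a CASE trace (index 1); node labels: VP-Pred, VP, NP-P, NP, N, V', V, V-Aux]

  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’

Causative

[Figure: PS tree for a causative; not recoverable from the extraction]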

Sub-actions: Opening of lock

[Figure: three dependency trees for open]
  open: boy (k1), lock (k2), key (k3)
  open: key (k1), lock (k2)
  open: lock (k1)

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: ‘Sabina opened the lock’
• However, the speaker may decide not to express the role of the agent. Hence:
  – ‘The key opened the lock’: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – ‘The lock opened’: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
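
The shift of karta across the three lock-opening sentences can be sketched in Python. This is an illustration under a stated assumption: the promotion order (agent, then instrument, then patient) is inferred from the three examples above and is not given as a rule in the slides. The names `PROMOTION_ORDER`, `DEFAULT_KARAKA` and `assign_karakas` are hypothetical.

```python
# Assumed for illustration: the first expressed participant in this order
# becomes karta (k1); the others keep their default karaka label.
PROMOTION_ORDER = ["agent", "instrument", "patient"]
DEFAULT_KARAKA = {"agent": "k1", "patient": "k2", "instrument": "k3"}

def assign_karakas(expressed):
    """expressed: dict of role -> word for the participants the speaker expresses."""
    karta_role = next(role for role in PROMOTION_ORDER if role in expressed)
    labels = {}
    for role, word in expressed.items():
        labels[word] = "k1" if role == karta_role else DEFAULT_KARAKA[role]
    return labels

# 'Sabina opened the lock', 'The key opened the lock', 'The lock opened'
print(assign_karakas({"agent": "Sabina", "patient": "lock"}))
print(assign_karakas({"instrument": "key", "patient": "lock"}))
print(assign_karakas({"patient": "lock"}))
```

The input dictionary stands in for vivaksha: which participants the speaker chooses to express determines which one surfaces as karta.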

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• The annotation scheme follows the Paninian framework
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd.)

• Morph Analysis
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>
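
A morph feature structure of this kind can be read into a dictionary with a few lines of Python. This sketch assumes the structure is written as `<fs key=value ...>` with space-separated attributes, as on the slide; `parse_fs` is a hypothetical helper, not a treebank tool, and an attribute with no value (such as af= here) is kept as an empty string.

```python
import re

def parse_fs(fs):
    """Parse '<fs key=value ...>' into a dict; attributes without a value map to ''."""
    text = fs.strip()
    if not (text.startswith("<fs") and text.endswith(">")):
        raise ValueError("not a feature structure: " + fs)
    body = text[3:-1]  # drop the leading '<fs' and the trailing '>'
    return {key: value for key, value in re.findall(r"(\w+)=(\S*)", body)}

khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(khaataa["root"], khaataa["cat"], khaataa["TAM"])
```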

An Example (Contd.)

• POS Tagging
  – meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((badZaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
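
The bracketed chunk notation can be parsed mechanically. The sketch below is a minimal Python illustration that assumes exactly the `((word_TAG ...))_CHUNK` layout shown on the slide; `CHUNK_RE` and `parse_chunks` are hypothetical names introduced for the example.

```python
import re

# Matches one chunk: ((token token ...))_TAG
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Return a list of (chunk tag, [(word, POS tag), ...]) in sentence order."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        tokens = [tuple(token.rsplit("_", 1)) for token in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks

sentence = ("((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")

for chunk_tag, tokens in parse_chunks(sentence):
    print(chunk_tag, tokens)
```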

An Example (Contd.)

• Dependency Relation

[Figure: dependency tree for the example sentence]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’, and
  – khaataa ‘eats’
• badZaa ‘big’ and bahuta ‘much’ are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
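
The tree for the example sentence can also be held as a list of labelled edges between chunk heads. This is a small Python sketch; the triple layout and the helper names `root_of` and `dependents_of` are assumptions made for illustration.

```python
# (dependent, relation, head) triples between chunk heads, read off the tree.
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]

def root_of(edges):
    """The root is the only head that is not itself a dependent."""
    heads = {head for _, _, head in edges}
    dependents = {dependent for dependent, _, _ in edges}
    (root,) = heads - dependents
    return root

def dependents_of(head, edges):
    """Return (dependent, relation) pairs attached to the given head."""
    return [(dependent, relation) for dependent, relation, h in edges if h == head]

print(root_of(EDGES))
print(dependents_of("khaataa", EDGES))
```

Only chunk heads appear as nodes; badZaa and bahuta stay inside their chunks and so have no edge of their own.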

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

[Figure: overview of dependency relation types]

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  – Possibility-I: Go by syntactic analysis
    – khilvaa ‘cause to eat’ is the verb root
    – maaz ne has karta vibhakti, so mark as k1
    – aayaa se has karana vibhakti, so mark as k3
    – bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb and is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations
    – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    – prayojya karta ‘causee’ (jk1): the causee in a causative construction
    – madhyastha karta ‘mediator-causer’ (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilvaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilvaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II

Causative Constructions (Contd.)

[Figure: dependency tree for the causative example]
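
The move from Possibility-I labels to Possibility-II labels can be sketched as a relabelling step in Python. This is an illustration only: `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical names, and deciding that a se-phrase is a true instrument (as with cammaca se ‘with the spoon’) is assumed to be supplied by the annotator, not computed.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args, true_instruments=frozenset()):
    """args: list of (phrase, surface label). Phrases listed in
    true_instruments keep k3 instead of becoming mediator-causers."""
    relabelled = []
    for phrase, label in args:
        if label == "k3" and phrase in true_instruments:
            relabelled.append((phrase, label))
        else:
            relabelled.append((phrase, CAUSATIVE_RELABEL.get(label, label)))
    return relabelled

surface = [("maaz ne", "k1"), ("aayaa se", "k3"),
           ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(surface))
```

Labels outside the mapping, such as k2 on khaanaa, pass through unchanged.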

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  – Possibility-I
    – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause: Possibility-I

[Figure: dependency tree for Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause: Possibility-II

[Figure: dependency tree for Possibility-II]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’


1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
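The filtering step can be sketched as follows, assuming the two candidate sentences have already been role-labelled. The dictionary layout and the `answer_who` function are illustrative assumptions, not the output format of any particular SRL system.

```python
# Role-labelled analyses of the two candidate sentences (hand-written here).
candidates = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, analyses):
    """Return the agents of analyses whose predicate and theme both match."""
    return [
        a["agent"]
        for a in analyses
        if a["predicate"] == predicate and a["theme"] == theme
    ]


answers = answer_who("create", "the first effective polio vaccine", candidates)
print(answers)
```

Both sentences contain the words of the question, so word matching cannot separate them; only the theme role rules out the syringe sentence.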

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
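A frame file can be thought of as a lookup table from roleset IDs to role descriptions. The sketch below encodes the two rolesets of ring from the slide as a plain Python dictionary; the data layout and the `describe` function are assumptions made for illustration and do not mirror the actual PropBank frame file format.

```python
# The two rolesets of "ring", copied from the example above.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "ARG0": "Causer of ringing",
            "ARG1": "Thing rung",
            "ARG2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "ARG1": "Surrounding entity",
            "ARG2": "Surrounded entity",
        },
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled phrase to the role description in its roleset.

    Raises KeyError if a label is not defined for that roleset.
    """
    roles = FRAME_FILE[roleset_id]["roles"]
    return {phrase: roles[label] for phrase, label in labelled_args.items()}


bell = describe("ring.01", {"John": "ARG0", "the bell": "ARG1"})
lake = describe("ring.02", {"Tall aspen trees": "ARG1", "the lake": "ARG2"})
print(bell)
print(lake)
```

The same label (ARG1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is why roles are always read relative to a roleset.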

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent/experiencer       ARG0-MNS  Induced causer
ARG1  Theme/patient           ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
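The labelling rule for intransitives is small enough to state as code. In this sketch the verb classes are entered by hand from the examples on the slide; in practice the class of a verb would come out of the diagnostic tests, and both `VERB_CLASS` and `sole_argument_label` are hypothetical names.

```python
# Verb classes taken from the slide's examples (Kula, Puta, haMsa).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive."""
    if VERB_CLASS[verb] == "unaccusative":
        return "ARG1"
    return "ARG0"


for verb in ("Kula", "Puta", "haMsa"):
    print(verb, sole_argument_label(verb))
```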

Intransitive Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’

Intransitive Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

[Figures: annotated analyses of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

[Figures: annotated analyses of causative-1 and causative-2]

Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust(n) do(v)’; ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

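Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Figure: phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitaab paRhegaa); node labels include VP-Pred, VP, NP and V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif ne kitaab paRhii); node labels include VP-Pred, VP, NP-P, NP and V]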

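Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: phrase structure tree for आतिफ़ सोएगा (Atif soyegaa); node labels include VP-Pred, VP, NP and V]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels include VP-Pred, VP, NP and V; the NP darvaazaa carries index 1 and is coindexed with a CASE trace]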

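Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

[Figure: phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels include VP-Pred, VP, NP-P, NP and V; the NP cuuhe carries index 1 and is coindexed with a CASE trace]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position

[Figure: phrase structure tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii); node labels include VP-Pred, VP, NP-P, NP and V]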

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Figure: phrase structure tree for kal raat baadaloM mein mujhko caaMd dikhaa; node labels include VP-Pred, VP, NP-P, NP and V, with a CASE trace (index 1) for caaMd and an SCR trace (index 2) for mujhko]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

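Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Figure: phrase structure tree for Ram jaantaa hai ki Sita der se aayegi; node labels include VP-Pred, VP, NP-P, NP, CP, C (ki), V and V-Aux (hai), with an EXTR trace (index 1) for the CP and a CASE trace (index 1) for Sita]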

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Figure: phrase structure tree for the sentence above; node labels include VP-Pred, VP, NP-P, NP, CP, C (COMP), V and V-Aux (thii), with PRO and SCR traces]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Figure: phrase structure tree for the sentence above; node labels include VP-Pred, VP, NP-P, NP, N (yaad), V (kar), V’ and V-Aux (rahaa, thaa), with a CASE trace (index 1)]

Causative

Speaker’s Intention (vivakshaa)

• Every sentence reflects the speaker’s intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – ‘key’ gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi

An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd)

• Morph Analysis
  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>
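Each token in the morph analysis is paired with a feature structure of attribute=value pairs. A minimal sketch of reading such a structure into a dictionary follows; it assumes the angle-bracket form `<fs ... >` with whitespace-separated pairs, which is how the slide text is read here, and `parse_fs` is a hypothetical helper rather than part of any treebank toolkit.

```python
def parse_fs(fs):
    """Parse '<fs name=value ...>' into a dict; empty values become ''."""
    body = fs.strip()
    if body.startswith("<fs"):
        body = body[len("<fs"):]
    if body.endswith(">"):
        body = body[:-1]
    features = {}
    for item in body.split():
        name, _, value = item.partition("=")
        features[name] = value
    return features


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(khaataa["root"], khaataa["cat"], khaataa["TAM"])
```

Keeping the root separate from the surface form is what later lets the scheme relate khaataa to khaa, or a causative form to its base verb.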

An Example (Contd)

• POS Tagging
  – meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  – ((meraa_PRP))_NP
    ((badZaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
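The bracketed chunk notation above is regular enough to be read back with a short regular expression. The sketch below assumes exactly the `((word_TAG ...))_LABEL` shape shown on the slide; `parse_chunks` is an illustrative name, and real treebank files use a richer format.

```python
import re

# One chunk: double parentheses around tagged words, then _LABEL.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(token.rsplit("_", 1)) for token in body.split()]
        chunks.append((label, tokens))
    return chunks


sentence = (
    "((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
chunks = parse_chunks(sentence)
for label, tokens in chunks:
    print(label, tokens)
```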

An Example (Contd)

  Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’ and
  – khaataa ‘eats’
• badZaa ‘big’ and bahuta ‘much’ are part of the chunks

Dependency Scheme (Contd)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa ‘cause to eat’ is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations
    · prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    · prayojya karta ‘causee’ (jk1): the causee in a causative construction
    · madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II
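The relabelling implied by Possibility-II can be sketched as a mapping from the vibhakti-based labels of Possibility-I to the causative labels. This is only an illustration of the correspondence stated above: `relabel_causative` and its `keep` parameter are hypothetical, and deciding which phrase is a mediator rather than a true instrument (aayaa se versus cammaca se) is left to the annotator, not to the code.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments, keep=()):
    """Relabel (phrase, label) pairs of a causative clause.

    Phrases listed in `keep` (for example a real instrument) are unchanged.
    """
    relabelled = []
    for phrase, label in arguments:
        if phrase not in keep:
            label = CAUSATIVE_LABELS.get(label, label)
        relabelled.append((phrase, label))
    return relabelled


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```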

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  – Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta – k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…): Ex-1

Relation samanadhikaran – rs (Contd): Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written letters every day, he tears them up’

Participles (vmod) (Contd)

• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha
    k2: patra

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don’t listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)

Subordinate Conjunction

101212

37

(2) Conjunct Verbs

  Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo   The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa

lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence

  The annotation scheme should be able to account for this relation in the

dependency tree   If prashna kiyaa is grouped as a single verb chunk it will not be possible to

mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

  To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG

  The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation)   It means noun or an adjective in the conjunct verb sequence will have a POF relation with

the verb   This way the relation between ek and prashna becomes an intra-chunk relation as they will

now become part of a single NP chunk   Conjunct verbs are chunked separately but semantically they constitute a single unit   It captures the fact that the noun-verb sequence is a conjunct verb by linking them with

POF relation

101212

38

Conjunct Verbs (Contd)

kiyaa k1 k2 pof

maine usase prashna

nmod

ek

101212

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusative & Unergative

• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1, not preserved]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative:
  - Add -aa: (so -> sulaa) 'sleep' -> 'make someone sleep'
  - Add -vaa: (sulaa -> sulvaa) 'make someone sleep' -> 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: annotated trees for causative-1 and causative-2, not preserved]

Causatives: classes

[Table not preserved]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2: राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    * Language-universal principles
    * Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, CASE1, V]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, NP-P, CASE1, V]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP, NP1, NP2, NP-P, CASE1, SCR2, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, CASE1, EXTR1, V, V-Aux]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'

[Tree diagram for the sentence above; node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux]

Complex Predicate

राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'

[Tree diagram for the sentence above; node labels: VP-Pred, VP, NP, NP-P, N, CASE1, V, V', V-Aux]

Causative

[Tree diagram not preserved]

Overview

• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.

An Example

• Example:
  meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd.)

• Morph Analysis
  - meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa  <fs af= root=badZaa cat=adj gend=m >
  - bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta  <fs af= root=bahuta cat=adj gend=any >
  - phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai     <fs af= root=hai cat=v gend=any num=any pers=3 >
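A morph-analysis line of this kind is easy to read into a dictionary. The sketch below assumes each analysis is written as `token <fs key=value ...>` with angle brackets and space-separated `key=value` pairs; `parse_fs` is a hypothetical helper written for this illustration, not part of any treebank tool.

```python
import re


def parse_fs(line):
    """Parse 'token <fs key=value ...>' into (token, feature dictionary)."""
    match = re.match(r"\s*(\S+)\s*<fs\s+(.*?)\s*>", line)
    if match is None:
        raise ValueError(f"not a feature-structure line: {line!r}")
    token, body = match.groups()
    features = {}
    for item in body.split():
        key, _, value = item.partition("=")
        features[key] = value  # 'af=' carries no value on the slide, so this may be ''
    return token, features


token, feats = parse_fs(
    "khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>"
)
aux_token, aux_feats = parse_fs(
    "hai <fs af= root=hai cat=v gend=any num=any pers=3 >"
)
```

Here `feats["root"]` gives the root `khaa` of the inflected form `khaataa`, and `feats["TAM"]` gives its tense-aspect-modality marker.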

An Example (Contd.)

• POS Tagging
  meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

• Chunking
  ((meraa_PRP))_NP
  ((badZaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
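The bracketed chunk notation can be read back into a data structure with a small regular expression. This is a minimal sketch that assumes the `((word_POS ...))_TAG` layout shown on the slide; `parse_chunks` is a hypothetical helper, and the romanization `badZaa` is used throughout.

```python
import re

# One chunk looks like ((word_POS word_POS ...))_TAG
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos), ...]) pairs."""
    chunks = []
    for body, tag in CHUNK_RE.findall(text):
        words = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((tag, words))
    return chunks


chunked = (
    "((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
chunks = parse_chunks(chunked)
```

Each chunk keeps its own tag (NP, VG) alongside the POS tags of the words inside it.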

An Example (Contd.)

• Dependency Relation

[Dependency tree not preserved]

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit', and
  - khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
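The tree for this sentence can be held as a plain list of labelled edges between chunk heads. The sketch below is one possible encoding, not the treebank's storage format: the `(dependent, relation, head)` triples and the `children` and `render` helpers are assumptions made for illustration.

```python
# Each edge is (dependent, relation, head); chunk heads stand in for chunks.
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def children(head, edges):
    """Return (dependent, relation) pairs attached to the given head."""
    return [(dep, rel) for dep, rel, h in edges if h == head]


def render(head, edges, depth=0, rel="root"):
    """Render the subtree under head as indented 'relation: node' lines."""
    lines = ["  " * depth + f"{rel}: {head}"]
    for dep, dep_rel in children(head, edges):
        lines.extend(render(dep, edges, depth + 1, dep_rel))
    return lines


tree_lines = render("khaataa", EDGES)
```

Joining `tree_lines` with newlines gives an indented view with `khaataa` at the root, `bhaaii` (k1) and `phala` (k2) below it, and `meraa` (r6) under `bhaaii`.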

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  - karta - subject/agent/doer
  - karma - object/patient
  - karana - instrument
  - sampradaan - beneficiary
  - apaadaan - source
  - adhikarana - location in place/time/other

Relations other than karakas

• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod__relc - Relative clause
• rad - Address

Relations which do not fall under dependency relation

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

Dependency Relation Types

[Figure not preserved]

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
• Issue:
  - Possibility-I: Go by syntactic analysis
    * khilvaa 'cause to eat' is the verb root
    * maaz ne has karta vibhakti, so mark as k1
    * aayaa se has karana vibhakti, so mark as k3
    * bacce ko has sampradaan vibhakti, so mark as k4

Causative Constructions (Contd.)

• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations:
    * prayojaka karta 'causer' (pk1): the causer in a causative construction
    * prayojya karta 'causee' (jk1): the causee in a causative construction
    * madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilvaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilvaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
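The difference between the two possibilities can be expressed as a relabelling step. The sketch below rests on an assumption drawn from the two examples: the labels change to pk1/mk1/jk1 only when the se-phrase is a mediator (the maid) rather than an instrument (the spoon). That decision belongs to the annotator, so it is passed in as a flag. `relabel_causative` is a hypothetical helper, not part of the annotation tooling.

```python
# Possibility-I labels (from vibhakti) mapped to Possibility-II causative labels.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args, se_phrase_is_mediator):
    """Relabel (phrase, label) pairs for a causative verb.

    When the se-phrase is an instrument, the labels are left as they are,
    mirroring the 'spoon' example on the slide.
    """
    if not se_phrase_is_mediator:
        return list(args)
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


maid = relabel_causative(
    [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")],
    se_phrase_is_mediator=True,
)
spoon = relabel_causative(
    [("maaz ne", "k1"), ("cammaca se", "k3"), ("khaanaa", "k2")],
    se_phrase_is_mediator=False,
)
```

The karma `khaanaa` keeps its k2 label in both cases, since only the causer, mediator and causee are renamed.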

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
  'The boy who is standing there is my brother'
• Issue:
  - Possibility-I
    * Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    * The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    * jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause: Possibility-I

[Dependency tree not preserved]

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Possibility-II

[Dependency tree not preserved]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradaan
• This sampradaan is different from the sampradaan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta - k4a (Contd.)

• Ex-2: raam ne (agent) caaMd dekhaa   [base verb]
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3: raam ko (experiencer) caaMd dikhaa   [derived intransitive verb]
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

[Dependency trees for Ex-2 and Ex-3 not preserved]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Dependency trees for Ex-1 and Ex-2 not preserved]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
• Issue:
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II

[Dependency tree not preserved]

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written a letter, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha
    k2: patra

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which is the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
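Inserting the null head can be sketched as a small function over an edge list. The `(dependent, relation, head)` triples and the helper `attach_relative_clause` are assumptions made for illustration. The `role` argument defaults to k2 only because that is the role the elided vo plays in this particular sentence.

```python
def attach_relative_clause(edges, clause_head, main_verb, correlative=None, role="k2"):
    """Attach a relative clause by nmod__relc.

    If the correlative (e.g. vo) is missing from the sentence, a NULL__NP
    node is inserted under the main verb to act as the parent of the clause.
    """
    edges = list(edges)
    if correlative is None:
        correlative = "NULL__NP"
        edges.append((correlative, role, main_verb))
    edges.append((clause_head, "nmod__relc", correlative))
    return edges


base = [
    ("mai", "k1", "maan luungii"),
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]
edges = attach_relative_clause(base, "kahoge", "maan luungii")
```

When the correlative is present in the sentence, passing it as `correlative` attaches the clause to it directly and no null node is created.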

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

[Dependency tree not preserved]

Subordinate Conjunction

[Dependency tree not preserved]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
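The chunk split for conjunct verbs can be sketched as follows. The chunk representation (lists of `(word, POS)` pairs), the POS tag `QC` for ek, and the helper `link_conjunct_verb` are all assumptions made for this illustration. Taking the last word of the noun chunk as its head is a simplification that holds for this example.

```python
def link_conjunct_verb(noun_chunk, verb_chunk):
    """Return the pof edge linking a noun chunk's head to the verb.

    Chunks are lists of (word, pos) pairs. The noun head is assumed to be
    the last word of its chunk and the verb head the first word of its chunk.
    """
    noun_head = noun_chunk[-1][0]
    verb_head = verb_chunk[0][0]
    return (noun_head, "pof", verb_head)


# [ek prashna]NP [kiyaa]VG, as two separate chunks
noun_chunk = [("ek", "QC"), ("prashna", "NN")]
verb_chunk = [("kiyaa", "VM")]
pof_edge = link_conjunct_verb(noun_chunk, verb_chunk)
```

Because `ek` stays inside the NP chunk, its relation to `prashna` never has to cross a chunk boundary.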


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  (same two candidate sentences as above)

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John] rings [the bell]
• [Tall aspen trees] ring [the lake]

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut

Thedoorwillopen

unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened

unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen

101212

14

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

101212

15

Da3veSubject

unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst

YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst

YesterdaynightIsawthemoonbehindtheclouds

unacc4

datsubj1

TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on

101212

16

Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan

ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan

Causa3ves

bull  Hindihastwowaysofformingthecausa3vebull  Addndashaa

ndash  (sosulaa)sleepmakesomeonesleepbull  Addndashvaa

ndash  (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep

bull  WeintroducethelabelARGAtoanalyzecausersbull  SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees

bull  ARGALMNSforintermediatecausers

101212

17

Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

causa3veL1 आया आतफ़ को स6लाया

aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo

causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa

motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo

Causa3ves

Causa3veL1

Causa3veL2

101212

18

Causa3vesclasses

Complexpredicates

bull  Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust

bull  Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa

101212

19

ComplexPredicate

complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo

Complexpredicate

complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo

101212

20

ComplementClause

complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut

RamknowsthatSitawillarrivelate

101212

1

PhraseStructureRepresenta3on

OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu

PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks

bull  DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow

bull  Developedinconjunc3onwithDSandPBbull  InspiredbyChomskyantradi3on

101212

2

BackgroundforPS

bull  Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren

ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull  LanguageVuniversalprinciplesbull  LanguageVspecificparameters

bull  PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach

BasicPrinciplesofPS

bull  PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)

bull  Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces

bull  Transforma3onalgrammar


An Example

• Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
    ‘My elder brother eats lots of fruits’

An Example (Contd.)

• Morph Analysis
  – meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa  <fs af= root=badZaa cat=adj gend=m>
  – bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta  <fs af= root=bahuta cat=adj gend=any>
  – phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai     <fs af= root=hai cat=v gend=any num=any pers=3>
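The feature structures above are flat attribute=value lists, so they are easy to read into a dictionary. The following is a minimal sketch, assuming each analysis is written as `<fs key=value ...>` exactly as on the slide; the function name `parse_fs` is hypothetical and this is not the treebank's own reader.

```python
def parse_fs(fs):
    """Turn '<fs af= root=meraa cat=pron ...>' into a dict of features."""
    inner = fs.strip().lstrip("<").rstrip(">")
    feats = {}
    for item in inner.split()[1:]:  # skip the leading 'fs' marker
        key, _, value = item.partition("=")
        feats[key] = value  # an empty value (as for 'af=') is kept as ''
    return feats

meraa = parse_fs("<fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>")
khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(meraa["cat"], khaataa["root"], khaataa["TAM"])
```

Keeping the values as plain strings avoids guessing at the meaning of abbreviations such as `case=o` or `case=d`.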

An Example (Contd.)

• POS Tagging
  – meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

• Chunking
  – ((meraa_PRP))_NP
  – ((badZaa_JJ bhaaii_NN))_NP
  – ((bahuta_QF phala_NN))_NP
  – ((khaataa_VM hai_VAUX))_VG
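The bracketed chunk notation on this slide can be read mechanically. The sketch below assumes exactly the inline notation shown here, `((word_TAG ...))_LABEL`; it is not a reader for the treebank's actual file format, and `parse_chunks` is a hypothetical helper name.

```python
import re

# One chunk looks like ((tok_TAG tok_TAG ...))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        # Split each token on its last underscore into (word, POS tag)
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks

sentence = ("((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
chunks = parse_chunks(sentence)
for label, tokens in chunks:
    print(label, tokens)
```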

An Example (Contd.)

• Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’, and
  – khaataa ‘eats’
• badZaa ‘big’ and bahuta ‘much’ are part of the chunks

Dependency Scheme (Contd.)

khaataa
  k1 → bhaaii
         r6 → meraa
  k2 → phala
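The tree on this slide can be held in a small data structure in which every node is a chunk head and every edge carries a relation label. This is only an illustrative sketch: the `Node` class and the `edges` helper are hypothetical names, not part of any treebank tool.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    head: str                                      # head word of the chunk
    children: list = field(default_factory=list)   # (relation, Node) pairs

    def attach(self, relation, child):
        """Attach child under this node with the given relation label."""
        self.children.append((relation, child))
        return child

def edges(node):
    """Yield (parent, relation, child) triples in depth-first order."""
    for relation, child in node.children:
        yield (node.head, relation, child.head)
        yield from edges(child)

root = Node("khaataa")
brother = root.attach("k1", Node("bhaaii"))
root.attach("k2", Node("phala"))
brother.attach("r6", Node("meraa"))

for parent, relation, child in edges(root):
    print(f"{child} --{relation}--> {parent}")
```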

Relations in Dependency Scheme

• There are three types of relations in the Dependency Scheme:
  – karaka relations,
  – relations other than karakas, and
  – relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark participants directly involved in the action denoted by the verb
• Relations other than karakas denote, for example, purpose and reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa ‘cause to eat’ is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd. …)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations
    · prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    · prayojya karta ‘causee’ (jk1): the causee in a causative construction
    · madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd. …)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd. …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
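The difference between the two possibilities amounts to a relabeling of three relations. The sketch below illustrates that mapping under a strong assumption: which chunks take part in the causal chain is supplied by hand (it is an annotator's judgment, not something the code works out). The names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args, causal_participants):
    """Relabel only those chunks that are participants in the causal chain.

    args: list of (chunk, Possibility-I label) pairs.
    causal_participants: set of chunks judged to be causer, mediator or causee.
    """
    result = []
    for chunk, label in args:
        if chunk in causal_participants:
            label = CAUSATIVE_RELABEL.get(label, label)
        result.append((chunk, label))
    return result

syntactic = [("maaz ne", "k1"), ("aayaa se", "k3"),
             ("bacce ko", "k4"), ("khaanaa", "k2")]
chain = {"maaz ne", "aayaa se", "bacce ko"}
print(relabel_causative(syntactic, chain))
```

Restricting the mapping to listed participants keeps a true instrument such as cammaca se ‘with the spoon’ at k3.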

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  – Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause: Alternative-II

Relative Clauses (Contd. …)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘Moon was visible to Ram’

anubhava karta – k4a (Contd. …)

• Ex-2
• Ex-3

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd. …)

• Ex-1

Relation samanadhikaran – rs (Contd.)

• Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof → agar
                  ccof → naa hotii
  paired-ccof → to
                  ccof → aatii

Possibility - II

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written letters every day, he tears them’

Participles (vmod) (Contd.)

• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd.)

PaadZataa hai
  k1   → vaha
  k7t  → rojZa
  k2   → patra
  vmod → likhakar
           k1 → vaha
           k2 → patra

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd.)

maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
         nmod__relc → kahoge
                        k1 → tum
                        k2 → jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don’t listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL__CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn’t reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd. …)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)

Subordinate Conjunction

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1  → maine
  k2  → usase
  pof → prashna
          nmod → ek


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
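The filtering step can be made concrete with a toy example. The sketch below assumes the two candidate sentences have already been role-labelled by hand (the dictionaries are typed in, not produced by a labeller), and `answer_who` is a hypothetical helper: it keeps only candidates whose theme matches the theme asked about and returns their agents.

```python
# Hand-entered role labels for the two candidate sentences on the slides
labelled = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Return the agents of candidates with the given predicate and theme."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]

answers = answer_who("create", "the first effective polio vaccine", labelled)
print(answers)
```

Word matching alone cannot make this distinction, since both sentences contain every word of the question.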

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

An example

• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
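A frame file can be pictured as a lookup from roleset ID to role descriptions. The sketch below stores the two rolesets of ring in a plain dictionary and uses them to interpret an annotated sentence. It is an illustration only: real PropBank frame files are XML documents, and the names `FRAMES` and `describe` are hypothetical. Roleset IDs are written here as ring.01 and ring.02.

```python
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"ARG0": "Causer of ringing",
                      "ARG1": "Thing rung",
                      "ARG2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"ARG1": "Surrounding entity",
                      "ARG2": "Surrounded entity"},
        },
    }
}

def describe(lemma, roleset_id, annotation):
    """Map each annotated span to the description of its role in the roleset."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {span: roles[label] for span, label in annotation.items()}

bell = describe("ring", "ring.01", {"John": "ARG0", "the bell": "ARG1"})
lake = describe("ring", "ring.02",
                {"Tall aspen trees": "ARG1", "the lake": "ARG2"})
print(bell)
print(lake)
```

The same label, ARG1, means ‘Thing rung’ under ring.01 but ‘Surrounding entity’ under ring.02, which is why roles are defined per roleset rather than globally.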

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA   Causer                 ARGA-MNS   Indirect causer
ARG0   Agent, experiencer     ARG0-MNS   Induced causer
ARG1   Theme, patient         ARG0-GOL   Causee with a ‘recipient’ role
ARG2   Recipient              ARG2-ATR   Attribute
ARG3   Instrument             ARG2-GOL   Goal
                              ARG2-SOU   Source
                              ARG2-LOC   Location
                              ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means
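Labels in this tagset have a regular shape: a base (ARGA, ARG0 to ARG3, or ARGM) optionally followed by a hyphen and a tag. The sketch below splits and checks a label against the sets listed on these two slides. It assumes labels are written with a hyphen, and the helper name `parse_label` is hypothetical. It does not check which function tags are allowed on which numbered argument.

```python
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
FUNCTION_TAGS = {"MNS", "GOL", "ATR", "SOU", "LOC", "DIR"}
MODIFIER_TAGS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}

def parse_label(label):
    """Split a label into (kind, base, tag); raise ValueError if unknown."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        if tag not in MODIFIER_TAGS:
            raise ValueError(f"unknown modifier tag in {label!r}")
        return ("modifier", base, tag)
    if base not in NUMBERED:
        raise ValueError(f"unknown argument in {label!r}")
    if tag and tag not in FUNCTION_TAGS:
        raise ValueError(f"unknown function tag in {label!r}")
    return ("numbered", base, tag or None)

for example in ["ARG1", "ARG0-GOL", "ARGA-MNS", "ARGM-TMP"]:
    print(example, parse_label(example))
```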

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitab paRhegaa
          Atif book.f read.m.sg.fut
          ‘Atif will read the book’

trans-2   आतिफ़ ने किताब पढ़ी
          Atif-ne kitaab paRhii
          Atif.erg book.f read.f.sg.pst
          ‘Atif read the book’

trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif-ko kitaab paRhnii paRii
          Atif.dat book.f read.f.inf compel.f.pst
          ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
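The labelling rule for the single argument of an intransitive verb is a two-way choice once the verb's class is known. The sketch below assumes the class has already been decided by the diagnostic tests and is simply looked up; the table `VERB_CLASS` contains only the three verbs named on the slide, and both names are hypothetical.

```python
# Verb classes as given on the slide; the diagnostics themselves are not modelled
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """Return the label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"

for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```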

Intransitive: Unaccusative

unacc-1   दरवाज़ा खुलेगा
          darvaazaa khulegaa
          door.m.sg.d open.m.sg.fut
          ‘The door will open’

unacc-2   दरवाज़े ने खुला
          darvaaze-ne khulaa
          door.m.sg.obl.erg open.pst
          ‘The door opened’

unacc-3   दरवाज़े को खुलना पड़ेगा
          darvaaze-ko khulnaa paRegaa
          door.m.sg.obl.dat open.inf compel.fut
          ‘The door will have to open’

Intransitive: Unergative

unerg-1   आतिफ़ सोएगा
          Atif soyegaa
          Atif sleep.m.sg.fut
          ‘Atif will sleep’

unerg-2   आतिफ़ ने सोया
          Atif-ne soyaa
          Atif.erg sleep.m.sg.pst
          ‘Atif slept’

unerg-3   आतिफ़ को सोना पड़ेगा
          Atif-ko sonaa paRegaa
          Atif.dat sleep.inf compel.fut
          ‘Atif will have to sleep’

Existential

exist-1   उस कमरे में चूहे हैं
          us kamre meM cuuhe haiM
          that room in rats be.pres.pl
          ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4      कल रात बादलों में चाँद दिखा
             kal raat baadaloM meiM caaMd dikhaa
             yesterday night clouds in moon see(unacc).pst
             ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1   कल रात बादलों में मुझको चाँद दिखा
             kal raat baadaloM meiM mujhko caaMd dikhaa
             yesterday night clouds in me.dat moon see(unacc).pst
             ‘Yesterday night I saw the moon behind the clouds’

• The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1   राम मोहन को किताब देगा
            raam mohan-ko kitaab degaa
            Ram Mohan.dat book.f give.m.sg.fut
            ‘Ram will give a book to Mohan’

ditrans-2   राम ने मोहन को किताब दी
            raam-ne mohan-ko kitaab dii
            Ram.erg Mohan.dat book.f give.f.sg.pst
            ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1       आतिफ़ सोएगा
              Atif soyegaa
              Atif sleep.m.sg.fut
              ‘Atif will sleep’

causative-1   आया ने आतिफ़ को सुलाया
              aayaa-ne Atif-ko sulaayaa
              maid.erg Atif.acc sleep.caus.pst
              ‘The maid caused the child to sleep’

causative-2   माँ ने आया से आतिफ़ को सुलवाया
              maaN-ne aayaa-se Atif-ko sulvaayaa
              mother.erg maid.by Atif.acc sleep.caus.pst
              ‘The mother made the maid cause the child to sleep’

Causatives

[Figure panels: Causative-1, Causative-2]

Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay-ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1   राम रवि की प्रतीक्षा कर रहा था
               raam ravi-kii pratiikshaa kar rahaa thaa
               Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
               ‘Ram was waiting for Ravi’

Complex predicate

compl-pred-2   राम रवि से लड़ बैठा
               raam ravi-se ladZa baithaa
               Ram Ravi.inst fight sit.perf
               ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1   राम जानता है कि सीता देर से आएगी
             raam jaantaa hai ki siitaa der se aayegii
             Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
             ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

आतिफ़ किताब पढ़ेगा

[Tree diagram. Node labels: VP-Pred, VP, NP, NP, V. Words: Atif, kitab, paRhegaa]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी

[Tree diagram. Node labels: VP-Pred, VP, NP-P, NP, V. Words: Atif-ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

आतिफ़ सोएगा

[Tree diagram. Node labels: VP-Pred, VP, NP, V. Words: Atif, soyegaa]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

दरवाज़ा खुलेगा

[Tree diagram. Node labels: VP-Pred, VP, NP1, NP, CASE1, V. Words: darvaazaa, khulegaa]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram. Node labels: VP-Pred, VP, NP1, NP, CASE1, V, VP, NP-P. Words: us kamre mein, cuuhe, hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी

[Tree diagram. Node labels: VP-Pred, NP-P, NP, V, VP-Pred, NP-P, VP. Words: Ram-ne, Mohan-ko, kitaab, dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram. Node labels: VP-Pred, NP1, NP, CASE1, V, VP, NP-P, SCR2, NP2. Words: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram. Node labels: VP-Pred, VP, NP, EXTR1, V, V-Aux, CP1, C, NP1, CASE1, NP-P. Words: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Tree diagram. Node labels: VP-Pred, NP-P, NP, SCR1, V, PRO, VP, NP1, CP, C, COMP, V-Aux, SCR3. Words: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi-ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, V’, CASE1, N. Words: Raam, Ravi-ko, yaad, kar, rahaa, thaa]

Causative

An Example (Contd )

  POS Tagging

$ meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF

phala_NN khaataa_VM hai_VAUX   Chunking

$ ((meraa_PRP))_NP

((baDzaa_JJ bhaaii_NN))_NP

((bahuta_QF phala_NN))_NP

((khaataa_VM hai_VAUX))_VG

An Example (Contd)

  Dependency Relation

101212

16

Dependency Scheme

  The Paninian approach treats a sentence as a series of modifier-modified relations   Hence it provides framework for dependency analysis   In our dependency tree

$  each node is a chunk and $ the edge represents the relations between the connected nodes labeled with the karaka or

other relations   Chunk represents a set of adjacent words which are in dependency relations with each

other   All the modifier-modified relations between the heads of the chunks (inter-chunk

relations) are marked in this manner

Dependency Scheme (Contd)

  Here modifier-modified relations are marked between the heads of the chunks

$ meraa lsquomyrsquo $ bhaaii lsquobrotherrsquo $ phala lsquofruitrsquo and $ khaataa lsquoeatsrsquo

  badZaa lsquobigrsquo and bahut lsquomuchrsquo are part of the chunks

101212

17

Dependency Scheme (Contd)

khaataa k1 k2

bhaii phala r6

meraa

Relations in Dependency Scheme

  There are 3 types of relations in Dependency Scheme

amp Karaka relations amp Relations other than karakas and

amp Relations which do not fall under dependency relation directly but are required for

showing the dependencies indirectly

  Karaka relations are participants directly involved in the action denoted by the verb

  Relations other than karakas denote purpose reason   Relations which do not fall under dependency relation directly are used for

representing co-ordination and complex predicates

101212

18

Basic karaka relations

 Only six

$  karta ndash subjectagentdoer

$  karma ndash objectpatient

$  karana ndash instrument

$  sampradaan ndash beneficiary

$  apaadaan ndash source

$  adhikarana ndash location in placetimeother

Relations other than karakas

  r6 ndash Genitive   rt ndash Purpose   rh ndash Reason  nmod_relc ndash Relative clause   rad ndash Address

101212

19

Relations which do not fall under dependency relation

  ccof ndash Conjunction  pof ndash Complex Predicates   fragof ndash Fragment of

Dependency Relation Types

101212

20

Some Hindi Constructions

(1)   Causative Constructions   maaz ne aayaa se bacce ko khaaanaa khilvaayaa lsquomotherrsquo lsquoErgrsquo lsquomaidrsquo lsquobyrsquo lsquochildrsquo lsquoAccrsquo lsquofoodrsquo lsquoeat-Causrsquo lsquoMother caused the maid to feed the childrsquo

  Issue

$ Possibility-I Go by syntactic analysis

amp khilvaa lsquocause to eatrsquo is the verb root amp maaz ne has karta vibhakti so mark as k1 amp aayaa se has karana vibhakti so mark as k3 amp bacce ko has sampradan vibhakti so mark as k4

Causative Constructions (Contd hellip)

101212

21

Causative Constructions (Contd hellip)

 Possibility-II

$  The verb khilvaa lsquocause to eatrsquo is a causative verb and it is morphologically related to the base verb khaa lsquoeatrsquo

$  Paninian framework provides the relations

amp  prayojaka karta causerlsquo (pk1) The causer in a causative construction amp  prayojya karta causeelsquo (jk1) The causee in a causative construction amp  madhyastha karta mediator causerlsquo (mk1) The mediator-causer in the causative

construction

Causative Constructions (Contd hellip)

  Possibility-II

$  Do we mark the above dependency roles $  If we mark these relations then root will be khaa lsquoeatrsquo

101212

22

  Ex maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa lsquoMother fed the child with the spoon   Ex maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa Mother made the maid to feed the childlsquo   As there is morphological relatedness between the base verb khaa lsquoeatrsquo and

causative verb khilvaa lsquocause to eatrsquo we mark pk1 mk1 jk1 instead of k1 k3 k4 respectively

  For causatives our current decision Follow Possibility-II

Causative Constructions (Contd hellip)

(2) Relative Clauses (nmod__relc)

  Ex jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai

rsquowhorsquo rsquoboyrsquo rsquotherersquo rsquostandrsquo rsquoisrsquo rsquohersquo rsquomyrsquo rsquobrotherrsquo rsquoisrsquo lsquoThe boy who is standing there is my brotherrsquo

  Issue

$ Possibility-I

amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause

amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz

khadZaa hairsquo

101212

23

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

 Possibility-II

$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause

and vaha lsquohersquo in the main clause is captured by the feature coref

101212

24

Relative Clause Alternative-II

Relative Clauses (Contdhellip)

  For relative clauses our current decision Follow Possibility-II   In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the

verb khadzaa hai lsquois standingrsquo of the relclause   The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo

relation   The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by

the feature coref

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta.
• Here dukh 'unhappy' is the karta.
• Here mujhko 'to me' is a subtype of sampradan.
• This sampradan is different from the sampradan (k4, beneficiary).
• We call it anubhava karta, represented by k4a.

anubhava karta - k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma.
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs.

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'

• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

[Figure: Possibility-I. Node and edge labels: agar-to; paired-ccof, paired-ccof; agar, to; ccof, ccof; naa hotii, aatii]
[Figure: Possibility-II]

Conditionals (Contd.)

• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision is to follow Possibility-II.
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause.
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause.

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written letters every day, he tears them up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'.

[Figure: dependency tree. PaadZataa hai heads vaha (k1), rojZa (k7t), patra (k2) and likhakara (vmod); likhakara is shown with the shared arguments vaha (k1) and patra (k2)]

(7) Ellipsis

• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example, vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing.
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency.

Ellipsis (Contd.)

[Figure: dependency tree. maan luungii heads mai (k1) and NULL__NP (vo) (k2); kahoge attaches to NULL__NP by nmod__relc and heads tum (k1) and jo bhi (k2)]

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential).

[Figure: NULL_CCP (aur) heads badZe_ho_gaye and nahii_sunate, each by ccof]
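The null-conjunction case above amounts to adding one artificial head and attaching each clause to it. A minimal sketch, assuming edges are stored as (head, relation, dependent) triples; the function name `coordinate_under_null` is made up for this illustration:

```python
def coordinate_under_null(clause_heads, null_label="NULL_CCP"):
    """Attach every clause head to an inserted null conjunction by ccof.

    Returns a list of (head, relation, dependent) triples.
    """
    return [(null_label, "ccof", clause_head) for clause_head in clause_heads]


# bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
edges = coordinate_under_null(["badZe_ho_gaye", "nahii_sunate"])
for head, relation, dependent in edges:
    print(f"{dependent} --{relation}--> {head}")
```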

Non-dependency Relations

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• In coordination, the conjunction is the head and takes the coordinated elements as its children.
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figures: Coordinate Conjunction (ccof); Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation).
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb.

Conjunct Verbs (Contd.)

[Figure: dependency tree. kiyaa heads maine (k1), usase (k2) and prashna (pof); ek attaches to prashna by nmod]
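The chunking decision for conjunct verbs can be sketched as follows. This is an illustration only: the tuple layout and the helper `conjunct_verbs` are hypothetical, and the NP tags on maine and usase are an assumption (the slides only give the tags for [ek prashna]NP and [kiyaa]VG).

```python
# Chunks as (tag, words); the last word of each chunk is taken as its head here.
chunks = [
    ("NP", ["maine"]),          # tag assumed
    ("NP", ["usase"]),          # tag assumed
    ("NP", ["ek", "prashna"]),  # from the slides
    ("VG", ["kiyaa"]),          # from the slides
]

# Inter-chunk edges as (head, relation, dependent), following the tree on the slide.
inter_chunk = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
]

# ek modifies prashna inside the NP chunk, so this edge is intra-chunk.
intra_chunk = [("prashna", "nmod", "ek")]


def conjunct_verbs(edges):
    """Recover noun-verb units: every pof edge links a noun to its verb."""
    return [f"{dep} {head}" for head, rel, dep in edges if rel == "pof"]


print(conjunct_verbs(inter_chunk))
```

Because the noun stays in its own NP chunk, ek can attach to prashna alone, while the pof edge still lets a consumer reassemble prashna kiyaa as one semantic unit.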


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson]agent created the [first disposable syringe]theme for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine]theme was created in 1952 by [Jonas Salk]agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
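The filtering step can be made concrete with a toy example. The sketch below assumes the two candidate sentences have already been role-labelled as on the previous slide; the dictionary layout and the function `answer_who` are hypothetical and stand in for a real SRL pipeline.

```python
# Hand-labelled candidates, copied from the agent/theme brackets above.
candidates = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "The first effective polio vaccine",
    },
]


def normalize(phrase):
    """Lower-case and drop a leading article so themes compare cleanly."""
    words = phrase.lower().split()
    return " ".join(words[1:] if words and words[0] == "the" else words)


def answer_who(predicate, theme, labelled):
    """Return the agents of sentences whose predicate and theme match the question."""
    return [
        c["agent"]
        for c in labelled
        if c["predicate"] == predicate and normalize(c["theme"]) == normalize(theme)
    ]


# Who created the first effective polio vaccine?
print(answer_who("create", "the first effective polio vaccine", candidates))
```

Both sentences contain the words of the question, so word matching cannot separate them; only the theme comparison rejects the syringe sentence.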

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Framefiles

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John]ARG0 rings [the bell]ARG1 (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2 (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
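A frame file can be thought of as a lookup table from roleset ID to role descriptions. The sketch below encodes the two rolesets of ring from the slide as a plain dictionary; this layout and the function `describe` are assumptions made for illustration and do not reproduce the actual PropBank frame file format.

```python
# One frame file per lemma, one entry per roleset (distinct usage).
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {
                "Arg0": "Causer of ringing",
                "Arg1": "Thing rung",
                "Arg2": "Ring for",
            },
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {
                "Arg1": "Surrounding entity",
                "Arg2": "Surrounded entity",
            },
        },
    }
}


def describe(roleset_id, labelled_args):
    """Map each labelled phrase to the description its roleset gives that label."""
    lemma = roleset_id.split(".")[0]
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {phrase: roles[label] for label, phrase in labelled_args}


print(describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
print(describe("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same label means different things under different rolesets: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why the roleset ID has to be chosen before the arguments are labelled.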

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments, recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered Arguments           Numbered Arguments with function tags
ARGA   Causer                ARGA-MNS   Indirect causer
ARG0   Agent, experiencer    ARG0-MNS   Induced causer
ARG1   Theme, patient        ARG0-GOL   Causee with a 'recipient' role
ARG2   Recipient             ARG2-ATR   Attribute
ARG3   Instrument            ARG2-GOL   Goal
                             ARG2-SOU   Source
                             ARG2-LOC   Location
                             ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitab paRhegaa
          Atif book.f read.m.sg.fut
          'Atif will read the book'

trans-2   आतिफ़ ने किताब पढ़ी
          Atif ne kitaab paRhii
          Atif erg book.f read.f.sg.pst
          'Atif read the book'

trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif ko kitaab paRhnii paRii
          Atif dat book.f read.f.inf compel.f.pst
          'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
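Once a verb has been classified, the label of its single argument follows mechanically. A minimal sketch, assuming the classification has already been done by the diagnostic tests; the two sets contain only the example verbs from the slide, and `single_argument_label` is a hypothetical helper:

```python
# Verb classes as given in the examples above; a real lexicon would be much larger
# and would come from the frame files, not from a hard-coded set.
UNACCUSATIVE = {"Kula", "Puta"}   # open, explode
UNERGATIVE = {"haMsa"}            # laugh


def single_argument_label(verb):
    """Return the PropBank label for the sole argument of an intransitive verb."""
    if verb in UNACCUSATIVE:
        return "Arg1"
    if verb in UNERGATIVE:
        return "Arg0"
    raise KeyError(f"verb class unknown for {verb!r}")


for verb in ("Kula", "Puta", "haMsa"):
    print(verb, single_argument_label(verb))
```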

Intransitive: Unaccusative

unacc-1   दरवाज़ा खुलेगा
          darvaazaa khulegaa
          door.m.sg.d open.m.sg.fut
          'The door will open'

unacc-2   दरवाज़े ने खुला
          darvaaze ne khulaa
          door.m.sg.obl erg open.pst
          'The door opened'

unacc-3   दरवाज़े को खुलना पड़ेगा
          darvaaze ko khulnaa paRegaa
          door.m.sg.obl dat open.inf compel.fut
          'The door will have to open'

Intransitive: Unergative

unerg-1   आतिफ़ सोएगा
          Atif soyegaa
          Atif sleep.m.sg.fut
          'Atif will sleep'

unerg-2   आतिफ़ ने सोया
          Atif ne soyaa
          Atif erg sleep.m.sg.pst
          'Atif slept'

unerg-3   आतिफ़ को सोना पड़ेगा
          Atif ko sonaa paRegaa
          Atif dat sleep.inf compel.fut
          'Atif will have to sleep'

Existential

exist-1   उस कमरे में चूहे हैं
          us kamre meM cuuhe haiM
          that room in rats be.pres.pl
          'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4      कल रात बादलों में चाँद दिखा
             kal raat baadaloM meiM caaMd dikhaa
             yesterday night clouds in moon see(unacc).pst
             'Yesterday night the moon was seen behind the clouds'

dat-subj-1   कल रात बादलों में मुझको चाँद दिखा
             kal raat baadaloM meiM mujhko caaMd dikhaa
             yesterday night clouds in me.dat moon see(unacc).pst
             'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

• The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1   राम मोहन को किताब देगा
            raam mohan ko kitaab degaa
            Ram Mohan dat book.f give.m.sg.fut
            'Ram will give a book to Mohan'

ditrans-2   राम ने मोहन को किताब दी
            raam ne mohan ko kitaab dii
            Ram erg Mohan dat book.f give.f.sg.pst
            'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so, sulaa) sleep, make someone sleep
• Add -vaa
  - (sulaa, sulvaa) make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1       आतिफ़ सोएगा
              Atif soyegaa
              Atif sleep.m.sg.fut
              'Atif will sleep'

causative-1   आया ने आतिफ़ को सुलाया
              aayaa ne Atif ko sulaayaa
              maid erg Atif acc sleep.caus.pst
              'The maid caused the child to sleep'

causative-2   माँ ने आया से आतिफ़ को सुलवाया
              maaN ne aayaa se Atif ko sulvaayaa
              mother erg maid by Atif acc sleep.caus.pst
              'The mother made the maid cause the child to sleep'

[Figures: PropBank annotation of causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1   राम रवि की प्रतीक्षा कर रहा था
               raam ravi kii pratikshaa kar rahaa thaa
               Ram Ravi gen wait do prog.m.sg be.m.sg.pst
               'Ram was waiting for Ravi'

compl-pred-2   राम रवि से लड़ बैठा
               raam ravi se ladZa baithaa
               Ram Ravi inst fight sit.perf
               'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1     राम जानता है कि सीता देर से आएगी
               raam jaantaa hai ki siita der se aayegii
               Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
               'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    - Language-universal principles
    - Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: phrase structure tree for आतिफ़ किताब पढ़ेगा. Node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

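Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: phrase structure tree for आतिफ़ ने किताब पढ़ी. Node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: phrase structure tree for आतिफ़ सोएगा. Node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]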

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: phrase structure tree for दरवाज़ा खुलेगा. Node labels: VP-Pred, VP, NP1 (darvaazaa), CASE1, V (khulegaa)]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Figure: phrase structure tree for उस कमरे में चूहे हैं. Node labels: VP-Pred, VP, NP1 (cuuhe), CASE1, V (hain), NP-P (us kamre mein)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position

[Figure: phrase structure tree for राम ने मोहन को किताब दी. Node labels: VP-Pred, VP, NP-P (Ram-ne), NP-P (Mohan-ko), NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

[Figure: phrase structure tree for कल रात बादलों में मुझको चाँद दिखा. Node labels: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko), SCR2, NP1 (caaMd), CASE1, V (dikhaa)]

Complement Clauses with ki

[Figure: phrase structure tree for राम जानता है कि सीता देर से आएगी. Node labels include: VP-Pred, VP, NP (Ram), NP-P (der se), V (jaantaa), V-Aux (hai), CP1, C (ki), EXTR1, NP1 (Sita), CASE1, V (aayegi)]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tum ne dii thii vah maine paRh lii
which book.f you erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Figure: phrase structure tree. Node labels include: VP-Pred, VP, NP-P (tum ne), NP-P (maine), NP1 (jo kitaab), NP (vah), CP, C, COMP, PRO, SCR1, SCR3, V (dii), V-Aux (thii), V (paRhii)]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Figure: phrase structure tree. Node labels include: VP-Pred, VP, NP (Raam), NP-P (Ravi-ko), V', N (yaad), V (kar), V-Aux (rahaa), V-Aux (thaa), CASE1]

Causative

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations.
• Hence it provides a framework for dependency analysis.
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation.
• A chunk represents a set of adjacent words which are in dependency relations with each other.
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks.

[Figure: dependency tree. khaataa heads bhaaii (k1) and phala (k2); meraa attaches to bhaaii by r6]

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark participants directly involved in the action denoted by the verb.
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.

Basic karaka relations

• Only six:
  - karta - subject/agent/doer
  - karma - object/patient
  - karana - instrument
  - sampradaan - beneficiary
  - apaadaan - source
  - adhikarana - location in place/time/other

Relations other than karakas

• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod__relc - Relative clause
• rad - Address

Relations which do not fall under dependency relation

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

[Figure: Dependency Relation Types]

Some Hindi Constructions

(1) Causative Constructions

• Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'

• Issue
  - Possibility-I: Go by syntactic analysis
    - khilvaa 'cause to eat' is the verb root
    - maaz ne has karta vibhakti, so mark it as k1
    - aayaa se has karana vibhakti, so mark it as k3
    - bacce ko has sampradan vibhakti, so mark it as k4
  - Possibility-II
    - The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'.
    - The Paninian framework provides the relations
      - prayojaka karta 'causer' (pk1): the causer in a causative construction
      - prayojya karta 'causee' (jk1): the causee in a causative construction
      - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
    - Do we mark the above dependency roles?
    - If we mark these relations, then the root will be khaa 'eat'.

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
• For causatives, our current decision is to follow Possibility-II.
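The difference between the two possibilities is a relabelling of three relations. A minimal sketch, assuming arguments are stored as (phrase, label) pairs; the table and the function `to_causative_labels` are hypothetical names. The mapping is applied blindly, so it is only appropriate when the se-phrase is a mediator (aayaa 'maid'), not an instrument such as cammaca 'spoon', which keeps k3.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {
    "k1": "pk1",  # prayojaka karta, causer
    "k3": "mk1",  # madhyastha karta, mediator-causer
    "k4": "jk1",  # prayojya karta, causee
}


def to_causative_labels(arguments):
    """Rewrite surface-syntactic labels into causative labels; others pass through."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in arguments]


# maaz ne aayaa se bacce ko khaanaa khilvaayaa, labelled as in Possibility-I
possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = to_causative_labels(possibility_1)
print(possibility_2)
```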


1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
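The filtering step can be sketched in a few lines. The dictionaries below are hand-written stand-ins for the output of a semantic role labeller, and `answer_who` is a hypothetical helper, so this illustrates the idea rather than any actual question answering system.

```python
# Hand-labelled semantic roles for the two candidate sentences (assumed input).
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

# "Who created the first effective polio vaccine?" asks for the agent of
# 'create' whose theme is the vaccine.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}


def answer_who(question, candidates):
    """Keep only candidates whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"] == question["theme"]]


print(answer_who(question, candidates))
```

Both sentences contain the words of the question, so word matching cannot separate them; the comparison of themes is what rejects the first sentence.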

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
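A frame file with two rolesets can be pictured as a nested mapping. The sketch below is an assumption-laden illustration: `FRAME_FILES` and `role_descriptions` are hypothetical names, real PropBank frame files are distributed in their own file format, and the roleset IDs are written here in the usual dotted form (ring.01, ring.02).

```python
# One frame file per predicate lemma, one entry per roleset (sense).
FRAME_FILES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"ARG0": "Causer of ringing",
                      "ARG1": "Thing rung",
                      "ARG2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"ARG1": "Surrounding entity",
                      "ARG2": "Surrounded entity"},
        },
    }
}


def role_descriptions(lemma, roleset_id, labelled_args):
    """Map each labelled argument to the role description of the chosen roleset.

    Raises ValueError if a label is not defined for that roleset.
    """
    roles = FRAME_FILES[lemma][roleset_id]["roles"]
    unknown = sorted(set(labelled_args) - set(roles))
    if unknown:
        raise ValueError(f"{roleset_id} does not define {unknown}")
    return {text: roles[label] for label, text in labelled_args.items()}


print(role_descriptions("ring", "ring.01", {"ARG0": "John", "ARG1": "the bell"}))
print(role_descriptions("ring", "ring.02",
                        {"ARG1": "Tall aspen trees", "ARG2": "the lake"}))
```

The same label (ARG1) means different things in the two rolesets, which is why an annotation always records the roleset ID together with the numbered arguments.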

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
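The four empty-argument types and their insertion modes can be summarised as a lookup table. This is a minimal sketch; the names `EMPTY_ARGUMENTS` and `needs_annotator` are hypothetical and only restate the list above.

```python
# type -> (what it stands for, how it is inserted)
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def needs_annotator(kind):
    """True if this empty-argument type is inserted by hand."""
    return EMPTY_ARGUMENTS[kind][1] == "manual"


manual = sorted(k for k in EMPTY_ARGUMENTS if needs_annotator(k))
print(manual)
```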

PropBank Tagset: Numbered Arguments

ARGA: Causer
ARG0: Agent, experiencer
ARG1: Theme, patient
ARG2: Recipient
ARG3: Instrument

PropBank Tagset: Numbered Arguments with function tags

ARGA-MNS: Indirect causer
ARG0-MNS: Induced causer
ARG0-GOL: Causee with a ‘recipient’ role
ARG2-ATR: Attribute
ARG2-GOL: Goal
ARG2-SOU: Source
ARG2-LOC: Location
ARG2-DIR: Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP: Temporal
ARGM-MNR: Manner
ARGM-LOC: Location
ARGM-PRP: Purpose
ARGM-CAU: Cause
ARGM-DIS: Discourse
ARGM-ADV: Adverb
ARGM-MNS: Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
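The labelling rule for intransitives can be written down directly. In this sketch the small `VERB_CLASS` lexicon is hypothetical and hard-coded from the examples above; in practice the class of a verb would come from the diagnostic tests, not from a lookup table.

```python
# Verb class as decided by the diagnostics (hard-coded here for illustration).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    if VERB_CLASS[verb] == "unaccusative":
        return "ARG1"   # theme-like argument
    return "ARG0"       # agent-like argument


for verb in ("Kula", "Puta", "haMsa"):
    print(verb, sole_argument_label(verb))
```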

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

[Figures: annotated structures for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

[Figures: annotated structures for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP, V; co-indexed NP1 and trace CASE1]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct

[Tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP, NP-P, V; co-indexed NP1 and trace CASE1]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP, NP-P, V; co-indexed NP1 and trace CASE1, NP2 and trace SCR2]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C; co-indexed CP1 and trace EXTR1, NP1 and trace CASE1]

Relative Clause

[Tree for the sentence below; node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C, COMP, PRO; traces: SCR1, SCR3]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

Complex Predicate

[Tree for the sentence below; node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, V’, N; trace: CASE1]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

Causative

[Tree not recovered]

Dependency Scheme (Contd.)

khaataa (root)
  bhaii (k1)
    meraa (r6)
  phala (k2)

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six:
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

[Figure not recovered]
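The three groups of relation labels can be collected into a small lookup. This is a sketch under assumptions: the labels k1 to k4 follow their use elsewhere in these slides, while k5 for apaadaan and k7 for adhikarana are assumed from the same numbering pattern (only the subtype k7t appears in the slides), and `relation_type` is a hypothetical helper.

```python
KARAKA = {
    "k1": "karta (subject/agent/doer)",
    "k2": "karma (object/patient)",
    "k3": "karana (instrument)",
    "k4": "sampradaan (beneficiary)",
    "k5": "apaadaan (source)",        # label assumed, see lead-in
    "k7": "adhikarana (location)",    # label assumed, see lead-in
}

NON_KARAKA = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}

NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}


def relation_type(label):
    """Classify a label into one of the three relation groups."""
    if label in KARAKA:
        return "karaka"
    if label in NON_KARAKA:
        return "non-karaka"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    return "unknown"


for label in ("k1", "r6", "pof"):
    print(label, relation_type(label))
```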

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue:
  – Possibility-I: Go by syntactic analysis
    · khilvaa ‘cause to eat’ is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd…)

  – Possibility-II
    · The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
    · The Paninian framework provides the relations:
      prayojaka karta ‘causer’ (pk1): the causer in a causative construction
      prayojya karta ‘causee’ (jk1): the causee in a causative construction
      madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
    · Do we mark the above dependency roles?
    · If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd…)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II
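The correspondence between the two analyses of the causative example can be sketched as a relabelling. The mapping table and the function `relabel_causative` are hypothetical; the sketch assumes that the input is the maid sentence analysed under Possibility-I, where the se-phrase is a mediator. A true instrument such as cammaca se ‘with the spoon’ keeps k3 and should not be passed through this mapping.

```python
# Possibility-I label -> Possibility-II label for the causative example.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Rewrite (phrase, label) pairs from the syntactic to the causative analysis.

    Labels without an entry in the mapping (here k2) are left unchanged.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


possibility_1 = [
    ("maaz ne", "k1"),    # mother, karta vibhakti
    ("aayaa se", "k3"),   # maid, karana vibhakti
    ("bacce ko", "k4"),   # child, sampradan vibhakti
    ("khaanaa", "k2"),    # food
]

print(relabel_causative(possibility_1))
```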

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue:
  – Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

  – Possibility-II
    · The verb khadZaa hai ‘is standing’ is the root of the relative clause
    · The modifier of vaha ‘he’ in the main clause is the entire relative clause
    · Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd…)

• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I.Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram.Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’
• Issue:
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

agar-to (abstract node)
  agar (paired-ccof)
    naa hotii (ccof)
  to (paired-ccof)
    aatii (ccof)

Possibility - II

[Figure not recovered]

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a (main) verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written a letter every day, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd.)

PaadZataa hai (root)
  vaha (k1)
  rojZa (k7t)
  patra (k2)
  likhakar (vmod)
    vaha (k1, shared)
    patra (k2, shared)

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’ is missing, which would be the parent node for the relative clause ‘tum jo bhi kahoge’
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd.)

maan luungii (root)
  mai (k1)
  NULL__NP (vo) (k2)
    kahoge (nmod__relc)
      tum (k1)
      jo bhi (k2)
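The role of the inserted null node can be made concrete with a flat table of dependencies. The tuple layout, the node numbering, and the helpers `children` and `inserted_nodes` are assumptions made for this sketch and are not the treebank's file format.

```python
# (id, form, head id, relation); head id 0 marks the root.
tree = [
    (1, "tum", 3, "k1"),
    (2, "jo bhi", 3, "k2"),
    (3, "kahoge", 4, "nmod__relc"),
    (4, "NULL__NP", 6, "k2"),     # inserted for the missing vo 'that'
    (5, "mai", 6, "k1"),
    (6, "maan luungii", 0, "root"),
]


def children(tree, head_id):
    """Return (form, relation) for every node attached to head_id."""
    return [(form, rel) for _, form, head, rel in tree if head == head_id]


def inserted_nodes(tree):
    """Forms of the null elements added to the sentence."""
    return [form for _, form, _, _ in tree if form.startswith("NULL")]


print(inserted_nodes(tree))
print(children(tree, 4))
print(children(tree, 6))
```

Without node 4, the relative clause headed by kahoge would have no nominal to modify, and maan luungii would have no k2.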

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don’t listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  badZe_ho_gaye (ccof)
  nahii_sunate (ccof)

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• With coordinating conjunctions, the conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

[Figures: Coordinate Conjunction (ccof); Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence has a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, since both are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking its parts

Conjunct Verbs (Contd.)

kiyaa (root)
  maine (k1)
  usase (k2)
  prashna (pof)
    ek (nmod)

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut

Thedoorwillopen

unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened

unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen

101212

14

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

101212

15

Da3veSubject

unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst

YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst

YesterdaynightIsawthemoonbehindtheclouds

unacc4

datsubj1

TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on

101212

16

Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan

ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan

Causa3ves

bull  Hindihastwowaysofformingthecausa3vebull  Addndashaa

ndash  (sosulaa)sleepmakesomeonesleepbull  Addndashvaa

ndash  (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep

bull  WeintroducethelabelARGAtoanalyzecausersbull  SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees

bull  ARGALMNSforintermediatecausers

101212

17

Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

causa3veL1 आया आतफ़ को स6लाया

aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo

causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa

motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo

Causa3ves

Causa3veL1

Causa3veL2

101212

18

Causa3vesclasses

Complexpredicates

bull  Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust

bull  Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa

101212

19

ComplexPredicate

complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo

Complexpredicate

complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo

101212

20

ComplementClause

complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut

RamknowsthatSitawillarrivelate

101212

1

Phrase Structure Representation

Owen Rambow
CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels VP-Pred, VP, NP1, NP, CASE1, V]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels VP-Pred, VP, NP1, NP, NP-P, CASE1, V]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Figure: PS tree with terminals kal raat, baadaloM mein, mujhko, caaMd, dikhaa; node labels VP-Pred, VP, NP, NP-P, NP1, NP2, CASE1, SCR2, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Figure: PS tree with terminals Ram, jaantaa, hai, ki, Sita, der se, aayegi; node labels VP-Pred, VP, NP, NP-P, NP1, CASE1, CP1, C, EXTR1, V, V-Aux]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Figure: PS tree with terminals jo kitaab, tumne, dii, thii, vah, maine, paRhii; node labels VP-Pred, VP, NP, NP-P, NP1, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do.prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Figure: PS tree with terminals Raam, Ravi-ko, yaad, kar, rahaa, thaa; node labels VP-Pred, VP, VP1, NP, NP-P, N, CASE1, V, V’, V-Aux]

[Figure: Causative]

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

[Figure: Dependency Relation Types]

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa ‘cause to eat’ is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations
    · prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    · prayojya karta ‘causee’ (jk1): the causee in a causative construction
    · madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

• Possibility-II
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd …)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4, respectively
• For causatives, our current decision: Follow Possibility-II
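The relabelling from Possibility-I to Possibility-II can be sketched as a simple lookup. This is a minimal illustration, not the treebank's tooling: the names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical, and the sketch assumes the `se`-marked phrase is the mediator causer (as with aayaa se) rather than an instrument (as with cammaca se, which keeps k3).

```python
# Mapping stated on the slide: k1, k3, k4 are replaced by pk1, mk1, jk1.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args):
    """Map Possibility-I labels to Possibility-II causative labels.

    Labels without an entry in the mapping (for example k2) are kept as-is.
    Assumption: any k3 in the input is a mediator causer, not an instrument.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]

# Possibility-I analysis of 'maaz ne aayaa se bacce ko khaanaa khilvaayaa'.
possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```

A real annotation pass would need animacy or lexical information to tell a mediator from an instrument; the lookup alone cannot make that distinction.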

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  – Possibility-I
    · Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    · The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    · jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd…)

• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
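The Possibility-II analysis can be written down as a small set of arcs plus a node feature. This is only a sketch: the tuple layout, the `features` dictionary and the helper `head_of` are illustrative choices, and the label `k1` on jo ladZakaa is an assumption, since the slides say only that it attaches to the verb.

```python
# (dependent, head, relation) arcs for the relative-clause part of
# 'jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai'.
arcs = [
    ("jo ladZakaa", "khadZaa hai", "k1"),   # attachment is from the slide; label k1 is assumed
    ("khadZaa hai", "vaha", "nmod__relc"),  # relative clause modifies vaha (from the slide)
]

# The link between the relative phrase and its antecedent is a feature, not an arc.
features = {"jo ladZakaa": {"coref": "vaha"}}

def head_of(word):
    """Return the head of `word`, or None if it has no head in `arcs`."""
    for dependent, head, _relation in arcs:
        if dependent == word:
            return head
    return None

print(head_of("khadZaa hai"))           # head of the relative clause root
print(features["jo ladZakaa"]["coref"])  # antecedent recorded by coref
```

Keeping coref as a feature means the tree stays a tree: jo ladZakaa has exactly one head, and the antecedent link does not add a second one.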

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
  ‘I-Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4 – beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram-Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta – k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  agar (paired-ccof)
    na hotii (ccof)
  to (paired-ccof)
    aatii (ccof)

Possibility - II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakar PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written letters every day, he tears them up’

Participles (vmod) (Contd.)

• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd.)

PaadZataa hai
  vaha (k1)
  rojZa (k7t)
  patra (k2)
  likhakar (vmod)
    vaha (k1, shared)
    patra (k2, shared)
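The argument sharing described above can be made explicit by deriving the participle's semantic arguments from the main verb's syntactic ones. A minimal sketch, assuming the tree shown on the slide; the function name `shared_arguments` and the default choice of k1 and k2 as the shared labels are illustrative, not a rule stated by the annotation scheme.

```python
# Syntactic arcs (dependent, head, relation) for
# 'vaha rojZa patra likhakar PaadZataa hai'.
syntactic_arcs = [
    ("vaha", "PaadZataa hai", "k1"),
    ("rojZa", "PaadZataa hai", "k7t"),
    ("patra", "PaadZataa hai", "k2"),
    ("likhakar", "PaadZataa hai", "vmod"),
]

def shared_arguments(arcs, participle, shared_labels=("k1", "k2")):
    """Return semantic-only arcs from the main verb's arguments to the participle."""
    # The main verb is whatever the participle attaches to via vmod.
    main_verb = next(h for d, h, r in arcs if d == participle and r == "vmod")
    # Copy the main verb's shared arguments onto the participle.
    return [(d, participle, r) for d, h, r in arcs
            if h == main_verb and r in shared_labels]

semantic_arcs = shared_arguments(syntactic_arcs, "likhakar")
print(semantic_arcs)
```

The syntactic tree itself is unchanged; the derived arcs sit alongside it as the semantically realized arguments of likhakar.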

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency

Ellipsis (Contd.)

maan luungii
  mai (k1)
  NULL__NP (vo) (k2)
    kahoge (nmod__relc)
      tum (k1)
      jo bhi (k2)
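The null-element insertion can be sketched as a small tree edit. This is an illustration under stated assumptions: the arc tuples and the helper name `attach_headless_relative` are hypothetical, and only the node name NULL__NP and the labels come from the tree above.

```python
def attach_headless_relative(arcs, rel_clause_root, main_verb, label):
    """Insert NULL__NP under `main_verb` and hang the relative clause from it."""
    null_node = "NULL__NP"
    return arcs + [
        (null_node, main_verb, label),               # null element stands in for the missing vo
        (rel_clause_root, null_node, "nmod__relc"),  # relative clause modifies the null element
    ]

# Arcs that exist before the null element is inserted.
arcs = [
    ("mai", "maan luungii", "k1"),
    ("tum", "kahoge", "k1"),
    ("jo bhi", "kahoge", "k2"),
]
tree = attach_headless_relative(arcs, "kahoge", "maan luungii", "k2")
for dependent, head, relation in tree:
    print(f"{dependent} --{relation}--> {head}")
```

Without the inserted node, kahoge would have no nominal head to modify, so the nmod__relc relation could not be drawn.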

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don’t listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL__CCP (aur)
  badZe_ho_gaye (ccof)
  nahii_sunate (ccof)

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn’t reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

[Figures: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]
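For the coordinate example, the "conjunct as head" convention can be checked with a few lines of code. A minimal sketch: the arc tuples, the `#1`/`#2` suffixes used to tell the two occurrences of khaayaa apart, and the k1/k2 labels on the arguments are assumptions made for illustration; only the ccof attachments to aur follow directly from the slide.

```python
# (dependent, head, relation) arcs for
# 'raam ne khaanaa khaayaa aur siitaa ne seb khaayaa'.
arcs = [
    ("khaayaa#1", "aur", "ccof"),      # first coordinated clause
    ("khaayaa#2", "aur", "ccof"),      # second coordinated clause
    ("raam ne", "khaayaa#1", "k1"),    # assumed label
    ("khaanaa", "khaayaa#1", "k2"),    # assumed label
    ("siitaa ne", "khaayaa#2", "k1"),  # assumed label
    ("seb", "khaayaa#2", "k2"),        # assumed label
]

def roots(arcs):
    """Nodes that act as a head somewhere but are never a dependent."""
    dependents = {d for d, _, _ in arcs}
    heads = {h for _, h, _ in arcs}
    return heads - dependents

def children(arcs, head, relation):
    """Dependents of `head` attached with `relation`, in input order."""
    return [d for d, h, r in arcs if h == head and r == relation]

print(roots(arcs))
print(children(arcs, "aur", "ccof"))
```

The conjunction aur comes out as the only root, with both verbs hanging from it by ccof.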

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is being modified by the adjective ek ‘one’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  maine (k1)
  usase (k2)
  prashna (pof)
    ek (nmod)
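The chunk-level treatment of conjunct verbs can be sketched as follows. This is an illustration only: the chunk identifiers (`NP1`, `NP2`, `NP3`, `VG`), the dictionary layout and the helper `conjunct_verbs` are hypothetical, and the sketch assumes that the last word of a chunk is its head.

```python
# Chunks for 'maine usase ek prashna kiyaa'; ek and prashna share one NP chunk.
chunks = {
    "NP1": ["maine"],
    "NP2": ["usase"],
    "NP3": ["ek", "prashna"],
    "VG": ["kiyaa"],
}

# Inter-chunk arcs (dependent chunk, head chunk, relation), as in the tree above.
arcs = [("NP1", "VG", "k1"), ("NP2", "VG", "k2"), ("NP3", "VG", "pof")]

def conjunct_verbs(chunks, arcs):
    """Return (noun, verb) pairs linked by pof, using chunk-final words as heads."""
    return [(chunks[dep][-1], chunks[head][-1])
            for dep, head, rel in arcs if rel == "pof"]

print(conjunct_verbs(chunks, arcs))
```

Because ek sits inside the NP chunk, its relation to prashna is intra-chunk, while the pof arc records that prashna and kiyaa form one semantic unit.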


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya
University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
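The filtering step can be sketched directly from the two labelled sentences. This is a minimal illustration, assuming the role labels are already available; the dictionary layout and the function name `answer` are hypothetical and do not describe any particular question answering system.

```python
# The question asks for the agent of 'create' with a known theme.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Candidate sentences with the role labels shown on the slide.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer(question, candidates):
    """Return agents of candidates whose predicate and theme match the question."""
    wanted_theme = question["theme"].lower()
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"].lower() == wanted_theme]

print(answer(question, candidates))
```

Both sentences contain the words of the question, so word matching alone cannot separate them; comparing the theme is what rules out the first candidate.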

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1]
• [Tall aspen trees ARG1] ring [the lake ARG2]

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

[Figures: annotated examples for ring.01 and ring.02]
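A frame file with two rolesets can be pictured as a nested lookup table. The sketch below uses only the roleset descriptions given above; the `FRAMES` layout and the helper `describe` are illustrative and do not reflect the actual PropBank frame file format.

```python
# One frame file per lemma; each roleset has a gloss and its own role inventory.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing",
                      "Arg1": "Thing rung",
                      "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity",
                      "Arg2": "Surrounded entity"},
        },
    }
}

def describe(roleset_id, labelled_args):
    """Map each labelled phrase to the role description of the given roleset."""
    lemma = roleset_id.split(".")[0]
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {phrase: roles[label] for phrase, label in labelled_args}

bell = describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
lake = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
print(bell)
print(lake)
```

The same label, Arg1, means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset has to be chosen before the arguments are interpreted.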

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

ARGA    Causer
ARG0    Agent/experiencer
ARG1    Theme/patient
ARG2    Recipient
ARG3    Instrument

PropBank Tagset: Numbered Arguments with function tags

ARGA-MNS    Indirect causer
ARG0-MNS    Induced causer
ARG0-GOL    Causee with a ‘recipient’ role
ARG2-ATR    Attribute
ARG2-GOL    Goal
ARG2-SOU    Source
ARG2-LOC    Location
ARG2-DIR    Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP    Temporal
ARGM-MNR    Manner
ARGM-LOC    Location
ARGM-PRP    Purpose
ARGM-CAU    Cause
ARGM-DIS    Discourse
ARGM-ADV    Adverb
ARGM-MNS    Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
‘Atif read the book’

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
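The labelling rule for intransitives can be stated as a two-way lookup. A minimal sketch, assuming verb classes are already known (for instance from the diagnostic tests mentioned above); the table `VERB_CLASS` and the function `sole_argument_label` are illustrative names, and the three verbs are the ones listed on the slide.

```python
# Verb classes for the three example verbs on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]  # raises KeyError for verbs not yet classified
    return "ARG1" if verb_class == "unaccusative" else "ARG0"

for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```

The lookup makes the dependency explicit: the label cannot be assigned until the verb has been classified.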

Intransitive: Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
‘The door opened’

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
‘The door will have to open’

Intransitive: Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
‘Atif slept’

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
‘Atif will have to sleep’

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst


101212

19

Relations which do not fall under dependency relation

  ccof ndash Conjunction  pof ndash Complex Predicates   fragof ndash Fragment of

Dependency Relation Types

101212

20

Some Hindi Constructions

(1)   Causative Constructions   maaz ne aayaa se bacce ko khaaanaa khilvaayaa lsquomotherrsquo lsquoErgrsquo lsquomaidrsquo lsquobyrsquo lsquochildrsquo lsquoAccrsquo lsquofoodrsquo lsquoeat-Causrsquo lsquoMother caused the maid to feed the childrsquo

  Issue

$ Possibility-I Go by syntactic analysis

amp khilvaa lsquocause to eatrsquo is the verb root amp maaz ne has karta vibhakti so mark as k1 amp aayaa se has karana vibhakti so mark as k3 amp bacce ko has sampradan vibhakti so mark as k4

Causative Constructions (Contd hellip)

101212

21

Causative Constructions (Contd hellip)

 Possibility-II

$  The verb khilvaa lsquocause to eatrsquo is a causative verb and it is morphologically related to the base verb khaa lsquoeatrsquo

$  Paninian framework provides the relations

amp  prayojaka karta causerlsquo (pk1) The causer in a causative construction amp  prayojya karta causeelsquo (jk1) The causee in a causative construction amp  madhyastha karta mediator causerlsquo (mk1) The mediator-causer in the causative

construction

Causative Constructions (Contd hellip)

  Possibility-II

$  Do we mark the above dependency roles $  If we mark these relations then root will be khaa lsquoeatrsquo

101212

22

  Ex maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa lsquoMother fed the child with the spoon   Ex maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa Mother made the maid to feed the childlsquo   As there is morphological relatedness between the base verb khaa lsquoeatrsquo and

causative verb khilvaa lsquocause to eatrsquo we mark pk1 mk1 jk1 instead of k1 k3 k4 respectively

  For causatives our current decision Follow Possibility-II

Causative Constructions (Contd hellip)

(2) Relative Clauses (nmod__relc)

  Ex jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai

rsquowhorsquo rsquoboyrsquo rsquotherersquo rsquostandrsquo rsquoisrsquo rsquohersquo rsquomyrsquo rsquobrotherrsquo rsquoisrsquo lsquoThe boy who is standing there is my brotherrsquo

  Issue

$ Possibility-I

amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause

amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz

khadZaa hairsquo

101212

23

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

 Possibility-II

$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause

and vaha lsquohersquo in the main clause is captured by the feature coref

101212

24

Relative Clause Alternative-II

Relative Clauses (Contdhellip)

  For relative clauses our current decision Follow Possibility-II   In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the

verb khadzaa hai lsquois standingrsquo of the relclause   The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo

relation   The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by

the feature coref

101212

25

(3) anubhava karta ndash k4a

  Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo   Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta   Here dukh lsquounhappyrsquo is the karta   Here mujhko lsquoto mersquo is a subtype of sampradan   This sampradan is different from the sampradan (k4mdashbeneficiary)   We call it as anubhava karta represented by k4a

anubhava karta ndash k4a (Contd )

  Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo   Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon

was visible to mersquo Verb  

101212

26

anubhava karta ndash k4a (Contdhellip)

 Ex-2

  Ex-3

anubhava karta ndash k4a (Contdhellip)

101212

27

(4) Relation samanadhikaran- rs

  Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $  Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo   In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma   In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object

yaha lsquothisrsquo so it attahes to yaha as rs

Relation samanadhikaran- rs (Contdhellip)

  Ex-1

101212

28

Relation samanadhikaran- rs (Contd) ndash Ex-2

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would definitely have come to the party’
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
  paired-ccof → agar
    ccof → na hotii
  paired-ccof → to
    ccof → aatii

Possibility-II

[Tree figure not recoverable from the extracted text.]

Conditionals (Contd.)

• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
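
The objection to Possibility-I can be illustrated with a small sketch. It stores each analysis as (dependent, head, label) triples and lists any head that is not a word of the sentence. The triple format and the function name are assumptions made for illustration, and the label "dep" in Possibility-II is a placeholder, since the slides do not name the relation used there.

```python
# Sketch only: edges are (dependent, head, label) triples.
SENTENCE = ["agara", "vaha", "biimaara", "na", "hotii", "to",
            "paartii", "me", "jZarUra", "aatii"]

# Possibility-I: an abstract "agar-to" node heads the tree.
possibility_1 = [
    ("agara", "agar-to", "paired-ccof"),
    ("to", "agar-to", "paired-ccof"),
    ("hotii", "agara", "ccof"),
    ("aatii", "to", "ccof"),
]

# Possibility-II: the verb of the 'if' clause depends on the verb of the
# 'then' clause. "dep" is a placeholder label, not taken from the slides.
possibility_2 = [
    ("hotii", "aatii", "dep"),
]


def abstract_heads(edges, tokens):
    """Return the heads that are not words of the sentence."""
    return sorted({head for _, head, _ in edges if head not in tokens})


print(abstract_heads(possibility_1, SENTENCE))
print(abstract_heads(possibility_2, SENTENCE))
```

Only Possibility-I produces a non-lexical head, which is the reason given above for rejecting it.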

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written a letter every day, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’

Participles (vmod) (Contd.)

PaadZataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakara
    k1 → vaha (shared)
    k2 → patra (shared)
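
A minimal sketch of the shared-argument idea, assuming a simple (dependent, head, label) encoding that is not part of the treebank format: the syntactic edges attach vaha and patra only to the main verb, while a separate list records the same arguments as semantic-only arguments of the participle.

```python
# Syntactic edges for the example sentence: (dependent, head, label).
syntactic = [
    ("vaha", "PaadZataa hai", "k1"),
    ("rojZa", "PaadZataa hai", "k7t"),
    ("patra", "PaadZataa hai", "k2"),
    ("likhakara", "PaadZataa hai", "vmod"),
]

# Shared arguments of the participle: semantically related to it,
# but not attached to it in the syntactic tree.
shared = [
    ("vaha", "likhakara", "k1"),
    ("patra", "likhakara", "k2"),
]


def arguments(verb, include_shared=False):
    """Collect the karaka (k*) arguments of a verb as {label: dependent}."""
    edges = syntactic + (shared if include_shared else [])
    return {label: dep for dep, head, label in edges
            if head == verb and label.startswith("k")}


print(arguments("PaadZataa hai"))
print(arguments("likhakara"))
print(arguments("likhakara", include_shared=True))
```

With `include_shared=False` the participle has no arguments of its own, which mirrors the statement that the shared argument is realized semantically but not syntactically for the second verb.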

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which becomes the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd.)

maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
    nmod__relc → kahoge
      k1 → tum
      k2 → jo bhi

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don't listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate
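
The null-element insertion for the first ellipsis example can be sketched as follows. The helper names and the triple encoding are assumptions for illustration; the node name NULL__NP and the labels come from the tree on the slide.

```python
def insert_null_head(tokens, position, null_tag):
    """Return a new token list with a null element inserted at position."""
    return tokens[:position] + [null_tag] + tokens[position:]


tokens = ["tum", "jo", "bhi", "kahoge", "mai", "maan", "luungii"]
with_null = insert_null_head(tokens, 4, "NULL__NP")

# Dependencies once the null element stands in for the missing vo 'that':
# (dependent, head, label).
edges = [
    ("mai", "maan luungii", "k1"),
    ("NULL__NP", "maan luungii", "k2"),
    ("kahoge", "NULL__NP", "nmod__relc"),
    ("tum", "kahoge", "k1"),
    ("jo bhi", "kahoge", "k2"),
]


def head_of(dependent):
    """Look up the (head, label) pair of a dependent, or None."""
    for dep, head, label in edges:
        if dep == dependent:
            return head, label
    return None


print(with_null)
print(head_of("kahoge"))
```

The relative clause headed by kahoge now has a parent to attach to, even though that parent has no surface form.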

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjunctions as heads
• With coordinating conjunctions, the conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)

[Tree figure not recoverable from the extracted text.]

Subordinate Conjunction

[Tree figure not recoverable from the extracted text.]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
    nmod → ek
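
The chunking decision can be sketched in a few lines. The data layout (a list of labelled chunks plus two edge lists) and the function names are assumptions for illustration only; the chunk labels and relations are those shown on the slides.

```python
# Chunks after splitting the conjunct verb: (chunk label, words).
chunks = [
    ("NP", ["maine"]),
    ("NP", ["usase"]),
    ("NP", ["ek", "prashna"]),
    ("VG", ["kiyaa"]),
]

# Relations between chunk heads: (dependent, head, label).
inter_chunk = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),
]

# Relation inside the NP chunk [ek prashna].
intra_chunk = [("ek", "prashna", "nmod")]


def conjunct_verbs(edges):
    """Recover each noun-verb conjunct verb from its pof link."""
    return [dep + " " + head for dep, head, label in edges if label == "pof"]


def same_chunk(word_a, word_b):
    """True if both words belong to one chunk."""
    return any(word_a in words and word_b in words for _, words in chunks)


print(conjunct_verbs(inter_chunk))
print(same_chunk("ek", "prashna"))
```

The pof link keeps prashna kiyaa recoverable as one semantic unit, while ek can still attach to prashna alone.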


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
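
As a rough sketch of this filtering step, assume each candidate sentence has already been labelled with a predicate, an agent and a theme (the dictionary layout and the function name are hypothetical, not part of any SRL system described here). The answer to a "who" question is then the agent of the candidate whose theme matches the theme in the question.

```python
# Question: "Who created the first effective polio vaccine?"
question = {"predicate": "create",
            "theme": "the first effective polio vaccine"}

# Candidate sentences with their (hand-assigned) semantic roles.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(question, candidates):
    """Return the agents of candidates whose predicate and theme match."""
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"].lower() == question["theme"].lower()]


print(answer(question, candidates))
```

Word matching alone cannot separate the two candidates, because both contain the words of the question; the theme comparison can.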

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
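
A frame file can be pictured as a lookup table from roleset IDs to role descriptions. The sketch below uses a plain Python dictionary as a stand-in; actual PropBank frame files are stored in their own file format, and the dictionary layout and the `describe` helper are assumptions for illustration.

```python
# Hypothetical in-memory stand-in for the frame file of "ring".
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"ARG0": "Causer of ringing",
                      "ARG1": "Thing rung",
                      "ARG2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"ARG1": "Surrounding entity",
                      "ARG2": "Surrounded entity"},
        },
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to its description in the roleset."""
    lemma = roleset_id.split(".")[0]
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {text: roles[label] for text, label in labelled_args}


print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label, ARG1, means "Thing rung" in one roleset and "Surrounding entity" in the other, which is the sense in which the roles are verb-specific.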

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Framefiles for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset

Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
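
The labelling rule for intransitives can be written down as a small lookup. This is only a sketch: the verb classes listed are the three examples named on the slide, and in practice the class of a verb would come from the diagnostic tests and the frame files rather than from a hard-coded table.

```python
# Verb classes for the three example verbs named on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


for verb in ("Kula", "Puta", "haMsa"):
    print(verb, sole_argument_label(verb))
```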

Intransitive Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’

Intransitive Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

[Annotated figures for unacc-4 and dat-subj-1 are not recoverable from the extracted text.]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

Causatives

[Annotated figures for causative-1 and causative-2 are not recoverable from the extracted text.]

Causatives classes

[Slide content not recoverable from the extracted text.]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust(n) do(v)’: ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

Complex predicate

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    – Language-universal principles
    – Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा
[VP-Pred [NP Atif] [VP [NP kitab] [V paRhegaa]]]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[VP-Pred [NP-P Atif-ne] [VP [NP kitab] [V paRhii]]]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative there simply is no object

आतिफ़ सोएगा
[VP-Pred [NP Atif] [VP [V soyegaa]]]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[VP-Pred [NP1 darvaazaa] [VP [NP CASE1] [V khulegaa]]]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं

[Tree figure not recoverable from the extracted text. Surviving node labels and leaves: VP-Pred, VP, NP1 (cuuhe), NP (CASE1), V (hain), NP-P (us kamre mein).]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी
[VP-Pred [NP-P Ram-ne] [VP-Pred [NP-P Mohan-ko] [VP [NP kitaab] [V dii]]]]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree figure not recoverable from the extracted text. Surviving node labels and leaves: VP-Pred, VP, NP1 (caaMd), NP (CASE1), V (dikhaa), NP-P (baadaloM mein), NP (kal raat), NP (SCR2), NP2 (mujhko).]

• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

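Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree figure not recoverable from the extracted text. Surviving node labels and leaves: VP-Pred, VP, NP (Ram), NP (EXTR1), V (jaantaa), V-Aux (hai), CP1, C (ki), NP1 (Sita), NP (CASE1), NP-P (der se), V (aayegi).]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Tree figure not recoverable from the extracted text. Surviving node labels and leaves: VP-Pred, VP, NP-P (tumne), NP (SCR1), V (dii), NP (PRO), NP1 (jo kitaab), CP, C (COMP), V-Aux (thii), NP-P (maine), V (paRhii), NP (SCR3), NP (vah).]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Tree figure not recoverable from the extracted text. Surviving node labels and leaves: VP-Pred, VP, NP (Raam), NP-P1 (Ravi-ko), V (kar), V-Aux (rahaa), V-Aux (thaa), V’, NP (CASE1), N (yaad).]

Causative

[Slide content not recoverable from the extracted text.]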

Some Hindi Constructions

(1) Causative Constructions

• maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
• Issue
  – Possibility-I: Go by syntactic analysis
    – khilvaa ‘cause to eat’ is the verb root
    – maaz ne has karta vibhakti, so mark it as k1
    – aayaa se has karana vibhakti, so mark it as k3
    – bacce ko has sampradan vibhakti, so mark it as k4

Causative Constructions (Contd.)

• Possibility-II
  – The verb khilvaa ‘cause to eat’ is a causative verb and is morphologically related to the base verb khaa ‘eat’
  – The Paninian framework provides the relations:
    – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    – prayojya karta ‘causee’ (jk1): the causee in a causative construction
    – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in a causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd.)

• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilvaayaa
  ‘Mother fed the child with the spoon’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilvaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
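
The relabelling from Possibility-I to Possibility-II can be sketched as a simple mapping. The function name and the list-of-pairs layout are assumptions for illustration. Note that the mapping applies only when the se-phrase is a mediator such as aayaa se; an instrument such as cammaca se ‘with the spoon’ stays k3, so a real annotator or tool has to decide which reading holds before relabelling.

```python
# Labels under Possibility-II for the causer, mediator and causee.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Replace k1/k3/k4 by pk1/mk1/jk1; other labels are left unchanged.

    Assumes the k3 argument is a mediator-causer, not an instrument.
    """
    return [(phrase, CAUSATIVE_LABELS.get(label, label))
            for phrase, label in arguments]


# Possibility-I labels for: maaz ne aayaa se bacce ko khaanaa khilvaayaa
possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]

possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```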

(2) Relative Clauses (nmod__relc)

• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
• Issue
  – Possibility-I
    – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause Possibility-I

[Tree figure not recoverable from the extracted text.]

Relative Clauses (nmod__relc)

• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause Possibility-II

[Tree figure not recoverable from the extracted text.]

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
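
A minimal sketch of Possibility-II, assuming a (dependent, head, label) encoding and a separate feature table, neither of which is the treebank's own file format. The label k1 on jo ladZakaa is an assumption, since the slides state only that it attaches to the verb of the relative clause.

```python
# Dependency edges under Possibility-II: (dependent, head, label).
edges = [
    ("jo ladZakaa", "khadZaa hai", "k1"),   # label assumed, see lead-in
    ("khadZaa hai", "vaha", "nmod__relc"),
]

# The link between the relative pronoun phrase and vaha is a feature,
# not a dependency edge.
features = {"jo ladZakaa": {"coref": "vaha"}}


def relative_clause_of(noun):
    """Return the roots of relative clauses that modify the given noun."""
    return [dep for dep, head, label in edges
            if head == noun and label == "nmod__relc"]


def coreferent(phrase):
    """Return the value of the coref feature of a phrase, if any."""
    return features.get(phrase, {}).get("coref")


print(relative_clause_of("vaha"))
print(coreferent("jo ladZakaa"))
```

Keeping coref as a feature leaves the tree itself simple: the verb heads the relative clause, and the relation between jo ladZakaa and vaha is recorded separately.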

101212

25

(3) anubhava karta ndash k4a

  Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo   Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta   Here dukh lsquounhappyrsquo is the karta   Here mujhko lsquoto mersquo is a subtype of sampradan   This sampradan is different from the sampradan (k4mdashbeneficiary)   We call it as anubhava karta represented by k4a

anubhava karta ndash k4a (Contd )

  Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo   Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon

was visible to mersquo Verb  

101212

26

anubhava karta ndash k4a (Contdhellip)

 Ex-2

  Ex-3

anubhava karta ndash k4a (Contdhellip)

101212

27

(4) Relation samanadhikaran- rs

  Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $  Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo   In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma   In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object

yaha lsquothisrsquo so it attahes to yaha as rs

Relation samanadhikaran- rs (Contdhellip)

  Ex-1

101212

28

Relation samanadhikaran- rs (Contd) ndash Ex-2

(5) Conditionals   Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo

lsquoHad she been not sick she would have definitely come to the partyrsquo

  Issue

$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause

101212

29

Possibility - I

agar-to paired-ccof paired-ccof

agar to ccof ccof

naa hotii aatii

Possibility - II

101212

30

Conditionals (Contd)   Possibility-I is not possible because agar-to is the head of the tree

which is an abstract node ie it is not a lexical node   For conditionals our current decision Follow Possibility-II   In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo

clause   Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo

clause is the main clause

(6) Participles (vmod)

  In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)

  The arguments occurs only once in the sentence but is semantically related to both the verbs   The shared argument syntactically always attaches with the main verb   For the other verb this argument is semantically realized but not syntactically

101212

31

Participles (vmod) (Contd )

  Ex vaha rojZa patra likhakara PaadZataa hai

rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo

lsquoHaving letters written everyday he tearsrsquo

101212

32

Participles (vmod) (Contd )

  The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa

lsquotearsrsquo is shared with another participle verb likhakar lsquohaving

writtenrsquo

Participles (vmod) (contd)

Paadzataa hai k1 k7t k2 vmod

vaha rojZa pawra likhakar k1 k2

vaha pawra

101212

33

(7)Ellipsis

  How to show dependencies when the head is missing   Ex tum jo bhi kahoge (vo) mai maan luungii lsquoyou lsquowhateverrsquo lsquowill sayrsquo lsquothatrsquo lsquoIrsquo lsquowill believersquo lsquoI will believe whatever you sayrsquo   In the above example vo lsquothatrsquo is missing which becomes the parent node

for relative clause lsquotum jo bhi kahogersquo   We insert a null element ie NULL_NP for vo lsquothatrsquo to show the dependency

Ellipsis (Contd)

maan luungii k1 k2

mai NULL__NP (vo) nmod__relc

kahoge k1 k2

tum jo bhi

101212

34

Ellipsis (Contd)

  Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate

lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo   No explicit conjunct   Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate

Non-dependency Relations

 ccof ndash Conjunction  pof ndash Complex Predicates   fragof -- Fragment of

101212

35

(1) Conjunction (ccof)

  ccof relation doesnrsquot reflects a dependency relation   It is used for coordinating as well as subordinating conjunctions   The dependency trees will show the conjuncts as heads   In coordinating conjuncts the conjunct is the head and takes the coordinating

elements as its children   In subordinating conjunct it would take the clause to which it is syntactically

attached (the subordinate clause) as its child

Conjunction (ccof) (Contdhellip)

  Coordinate Conjunction

$ Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa lsquoramrsquo lsquoErgrsquo lsquofoodrsquo lsquoatersquo lsquoandrsquo lsquositarsquo Ergrsquo lsquoapplersquo lsquoatersquo

lsquoRam ate food and Sita ate an applersquo

  Subordinate Conjunction

$ Ex raam ne kahaa ki vo kal aayegaa lsquoramrsquo lsquoErgrsquo lsquosaidrsquo thatrsquo lsquohersquo lsquotomorrowrsquo lsquocome-Futrsquo lsquoRam said that he will come tomorrowrsquo

101212

36

Coordinate Conjunction (ccof)

Subordinate Conjunction

101212

37

(2) Conjunct Verbs

  Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo   The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa

lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence

  The annotation scheme should be able to account for this relation in the

dependency tree   If prashna kiyaa is grouped as a single verb chunk it will not be possible to

mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

  To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG

  The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation)   It means noun or an adjective in the conjunct verb sequence will have a POF relation with

the verb   This way the relation between ek and prashna becomes an intra-chunk relation as they will

now become part of a single NP chunk   Conjunct verbs are chunked separately but semantically they constitute a single unit   It captures the fact that the noun-verb sequence is a conjunct verb by linking them with

POF relation

101212

38

Conjunct Verbs (Contd)

kiyaa k1 k2 pof

maine usase prashna

nmod

ek

101212

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

Hindi PropBank

•  Annotating Hindi PropBank involves three steps
   –  Creation of frame files
   –  Empty argument insertion
   –  Semantic role labelling

Frame files for Hindi

•  Two types of frame files
   –  Frames for simple verbs [385 frames, 703 predicates]
   –  Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

•  PropBank inserts 4 types of empty arguments
   –  pro: dropped null arguments recoverable from discourse context
   –  PRO: empty subjects of non-finite complement and adjunct clauses
   –  RELPRO: gaps in participial relative clauses
   –  GAP: elided arguments in co-ordinated clauses
•  PRO and RELPRO are inserted automatically
•  GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

Numbered arguments
   ARGA       Causer
   ARG0       Agent, experiencer
   ARG1       Theme, patient
   ARG2       Recipient
   ARG3       Instrument

Numbered arguments with function tags
   ARGA-MNS   Indirect causer
   ARG0-MNS   Induced causer
   ARG0-GOL   Causee with a ‘recipient’ role
   ARG2-ATR   Attribute
   ARG2-GOL   Goal
   ARG2-SOU   Source
   ARG2-LOC   Location
   ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

   ARGM-TMP   Temporal
   ARGM-MNR   Manner
   ARGM-LOC   Location
   ARGM-PRP   Purpose
   ARGM-CAU   Cause
   ARGM-DIS   Discourse
   ARGM-ADV   Adverb
   ARGM-MNS   Means
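
The tagset combines a base label with an optional function tag. As a rough sketch, assuming labels are written as BASE-TAG with a hyphen (the separator is an assumption, and `split_label` is a hypothetical helper, not part of any PropBank tool), a label can be decomposed like this:

```python
# Base labels for numbered arguments, as listed in the tagset.
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}


def split_label(label):
    """Split a label into (kind, base, function_tag).

    Assumes a hyphen between base label and function tag.
    """
    base, _, tag = label.partition("-")
    if base == "ARGM":
        if not tag:
            raise ValueError("modifier arguments need a function tag")
        return ("modifier", base, tag)
    if base in NUMBERED:
        # Numbered arguments may appear with or without a function tag.
        return ("numbered", base, tag or None)
    raise ValueError(f"unknown label: {label}")
```

For example, `split_label("ARG2-LOC")` separates the numbered argument ARG2 from its LOC function tag, while `split_label("ARGM-LOC")` is recognised as a modifier.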

Linguistic phenomena

•  Simple transitive
•  Unaccusative and unergative
•  Existential
•  Dative subject
•  Ditransitive
•  Causatives
•  Complex predicates

Simple Transitive

trans-1
   आतिफ़ किताब पढ़ेगा
   Atif kitab paRhegaa
   Atif book.f read.m.sg.fut
   ‘Atif will read the book’

trans-2
   आतिफ़ ने किताब पढ़ी
   Atif ne kitaab paRhii
   Atif erg book.f read.f.sg.pst
   ‘Atif read the book’

trans-3
   आतिफ़ को किताब पढ़नी पड़ी
   Atif ko kitaab paRhnii paRii
   Atif dat book.f read.f.inf compel.f.pst
   ‘Atif had to read the book’

Unaccusative & Unergative

•  Distinction between intransitive verbs
   –  Unaccusatives, e.g. Kula (open), Puta (explode)
   –  Unergatives, e.g. haMsa (laugh)
•  The single argument of unaccusatives takes Arg1; unergatives take Arg0
•  Diagnostic tests are used to distinguish unaccusatives from unergatives
   –  E.g. animacy test, cognate object test, among others
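
The labelling rule for intransitives can be written down in a few lines. This is a minimal sketch, assuming the verb class has already been decided by the diagnostic tests; the lexicon `VERB_CLASS` is hypothetical and contains only the three verbs named above.

```python
# Hypothetical verb-class lexicon, filled in by hand from the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive."""
    verb_class = VERB_CLASS[verb]
    # Unaccusatives take Arg1, unergatives take Arg0.
    return "ARG1" if verb_class == "unaccusative" else "ARG0"
```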

Intransitive: Unaccusative

unacc-1
   दरवाज़ा खुलेगा
   darvaazaa khulegaa
   door.m.sg.d open.m.sg.fut
   ‘The door will open’

unacc-2
   दरवाज़े ने खुला
   darvaaze ne khulaa
   door.m.sg.obl erg open.pst
   ‘The door opened’

unacc-3
   दरवाज़े को खुलना पड़ेगा
   darvaaze ko khulnaa paRegaa
   door.m.sg.obl dat open.inf compel.fut
   ‘The door will have to open’

Intransitive: Unergative

unerg-1
   आतिफ़ सोएगा
   Atif soyegaa
   Atif sleep.m.sg.fut
   ‘Atif will sleep’

unerg-2
   आतिफ़ ने सोया
   Atif ne soyaa
   Atif erg sleep.m.sg.pst
   ‘Atif slept’

unerg-3
   आतिफ़ को सोना पड़ेगा
   Atif ko sonaa paRegaa
   Atif dat sleep.inf compel.fut
   ‘Atif will have to sleep’

Existential

exist-1
   उस कमरे में चूहे हैं
   us kamre meM cuuhe haiM
   that room in rats be.pres.pl
   ‘There are rats in that room’

•  We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
   कल रात बादलों में चाँद दिखा
   kal raat baadaloM meiM caaMd dikhaa
   yesterday night clouds in moon see(unacc).pst
   ‘Yesterday night, the moon was seen behind the clouds’

dat-subj-1
   कल रात बादलों में मुझको चाँद दिखा
   kal raat baadaloM meiM mujhko caaMd dikhaa
   yesterday night clouds in me.dat moon see(unacc).pst
   ‘Yesterday night, I saw the moon behind the clouds’

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
   राम मोहन को किताब देगा
   raam mohan ko kitaab degaa
   Ram Mohan dat book.f give.m.sg.fut
   ‘Ram will give a book to Mohan’

ditrans-2
   राम ने मोहन को किताब दी
   raam ne mohan ko kitaab dii
   Ram erg Mohan dat book.f give.f.sg.pst
   ‘Ram gave a book to Mohan’

Causatives

•  Hindi has two ways of forming the causative
•  Add –aa
   –  (so → sulaa) ‘sleep’ → ‘make someone sleep’
•  Add –vaa
   –  (sulaa → sulvaa) ‘make someone sleep’ → ‘cause someone to fall asleep’
•  We introduce the label ARGA to analyze causers
•  Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
•  ARGA-MNS for intermediate causers

Causatives

unerg-1
   आतिफ़ सोएगा
   Atif soyegaa
   Atif sleep.m.sg.fut
   ‘Atif will sleep’

causative-1
   आया ने आतिफ़ को सुलाया
   aayaa ne Atif ko sulaayaa
   maid erg Atif acc sleep.caus.pst
   ‘The maid caused the child to sleep’

causative-2
   माँ ने आया से आतिफ़ को सुलवाया
   maaN ne aayaa se Atif ko sulvaayaa
   mother erg maid by Atif acc sleep.caus.pst
   ‘The mother made the maid cause the child to sleep’

[Figures: PropBank annotation of causative-1 and causative-2]

Causatives: classes

Complex predicates

•  These are cases such as bharosaa karnaa ‘trust (n) + do (v)’, i.e. ‘trust’
•  Such cases are handled using a noun frame for bharosaa:
   [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
   राम रवि की प्रतीक्षा कर रहा था
   raam ravi kii pratikshaa kar rahaa thaa
   Ram Ravi gen wait do prog.m.sg be.m.sg.pst
   ‘Ram was waiting for Ravi’

compl-pred-2
   राम रवि से लड़ बैठा
   raam ravi se ladZa baithaa
   Ram Ravi inst fight sit.perf
   ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
   राम जानता है कि सीता देर से आएगी
   raam jaantaa hai ki siitaa der se aayegii
   Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
   ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

•  Devised by Rajesh Bhatt, University of Massachusetts, Amherst
   –  Assisted by Annahita Farudi and Owen Rambow
•  Developed in conjunction with DS and PB
•  Inspired by the Chomskyan tradition

Background for PS

•  Chomskyan program
   –  Motivated by claims about language acquisition in children
   –  Develop a theory of syntax such that the syntax of a language can be explained by
      •  Language-universal principles
      •  Language-specific parameters
•  PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

•  PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
•  These two levels are related by derivations
   –  Words and constituents move and leave traces
•  Transformational grammar
•  Monostratal representation
•  Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

•  Phrase structure
•  Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
   –  Nouns with postpositions
   –  Verbs with auxiliaries and complementizers (ki)
•  Binary branching
   –  Theoretical reasons
   –  To be different from DS

Basic Transitive Clause (1)

•  There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

•  The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

•  PS makes a distinction between unergative and unaccusative
•  In the unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

•  The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, V, with the trace CASE1]

Existentials

•  Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, NP-P, V, with the trace CASE1]

Ditransitive

•  The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP1, NP2, NP-P, NP, V, with the traces CASE1 and SCR2]

•  Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
•  Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
•  The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP, NP1, NP-P, V, V-Aux, CP1, C, with the traces EXTR1 and CASE1]

Relative Clause

[Tree diagram for the sentence below; node labels: VP-Pred, VP, NP, NP1, NP-P, V, V-Aux, CP, C, COMP, PRO, with the traces SCR1 and SCR3]

   जो किताब तुमने दी थी वह मैंने पढ़ ली
   jo kitaab tumne dii thii vah maine paRh lii
   which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
   ‘I have read the book which you gave me’

Complex Predicate

[Tree diagram for the sentence below; node labels: VP-Pred, VP, VP1, NP, NP-P, V, V-Aux, V’, N, with the trace CASE1]

   राम रवि को याद कर रहा था
   raam ravi ko yaad kar rahaa thaa
   Ram Ravi acc remember do prog.m.sg be.m.sg.pst
   ‘Ram was remembering Ravi’

Causative

Causative Constructions (Contd…)

•  Possibility-II
   –  The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
   –  The Paninian framework provides the relations:
      •  prayojaka karta ‘causer’ (pk1): the causer in a causative construction
      •  prayojya karta ‘causee’ (jk1): the causee in a causative construction
      •  madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd…)

•  Possibility-II
   –  Do we mark the above dependency roles?
   –  If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd…)

•  Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
   ‘Mother fed the child with the spoon’
•  Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
   ‘Mother made the maid feed the child’
•  As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4, respectively
•  For causatives, our current decision: follow Possibility-II
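
The relabelling decision for causatives can be sketched as a simple mapping. This is only an illustration: it assumes the annotator has already identified the causer, mediator and causee and given them the provisional labels k1, k3 and k4, and the names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical. An instrument such as cammaca se ‘with the spoon’ keeps k3 and should not be passed through this mapping.

```python
# Provisional label -> causative label, as stated on the slide
# (pk1, mk1, jk1 instead of k1, k3, k4 respectively).
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Relabel the causation-chain arguments of a causative verb.

    `arguments` is a list of (phrase, provisional_label) pairs. Labels
    outside the mapping (for example k2) are left unchanged.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]
```

Applied to the second example, `relabel_causative([("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")])` yields pk1, mk1 and jk1 for the three participants and leaves khaanaa as k2.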

(2) Relative Clauses (nmod__relc)

•  Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
   ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
   ‘The boy who is standing there is my brother’
•  Issue
   –  Possibility-I
      •  Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
      •  The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
      •  jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’

Relative Clause: Possibility-I

[Dependency tree figure]

Relative Clauses (nmod__relc)

•  Possibility-II
   –  The verb khadZaa hai ‘is standing’ is the root of the relative clause
   –  The modifier of vaha ‘he’ in the main clause is the entire relative clause
   –  Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

Relative Clause: Possibility-II

[Dependency tree figure]

Relative Clauses (Contd…)

•  For relative clauses, our current decision: follow Possibility-II
•  In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
•  The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
•  The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
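
Possibility-II can be sketched as a handful of nodes, each with a head, a relation and an optional coref feature. The encoding below is an assumption made for illustration (the `Node` class and helper functions are hypothetical, not the treebank's file format), and only the attachments stated above are included; the k1 label on jo ladZakaa is likewise assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    form: str
    head: Optional[str]
    relation: Optional[str]
    coref: Optional[str] = None


NODES = [
    # Attachment of vaha inside the main clause is omitted here.
    Node("vaha", head=None, relation=None),
    # The verb is the root of the relative clause and modifies vaha.
    Node("khadZaa hai", head="vaha", relation="nmod__relc"),
    # jo ladZakaa attaches to the relative-clause verb (k1 is assumed)
    # and is linked to vaha only through the coref feature.
    Node("jo ladZakaa", head="khadZaa hai", relation="k1", coref="vaha"),
]


def relative_clause_root(nodes, modified):
    """Return the form of the node that modifies `modified` via nmod__relc."""
    for node in nodes:
        if node.head == modified and node.relation == "nmod__relc":
            return node.form
    return None


def coreferent_of(nodes, form):
    """Return the coref target recorded on the node with the given form."""
    for node in nodes:
        if node.form == form:
            return node.coref
    return None
```

Keeping coreference as a feature rather than an arc means the relative pronoun has exactly one head, so the structure stays a tree.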

(3) anubhava karta – k4a

•  Ex-1: mujhko dukh hai
   ‘I.Dat’ ‘unhappy’ ‘is’
   ‘I am unhappy’
•  Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
•  Here dukh ‘unhappy’ is the karta
•  Here mujhko ‘to me’ is a subtype of sampradan
•  This sampradan is different from the sampradan (k4: beneficiary)
•  We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd…)

•  Ex-2 (base verb): raam ne (agent) caaMd dekhaa
   ‘ram’ ‘Erg’ ‘moon’ ‘saw’
   ‘Ram saw the moon’
•  Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
   ‘ram.Dat’ ‘moon’ ‘appeared’
   ‘The moon was visible to Ram’

anubhava karta – k4a (Contd…)

[Dependency tree figures for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

•  Ex-1: raam ne kahaa ki vo kal aayegaa
   ‘Ram said that he will come tomorrow’
•  Ex-2: raam ne yaha kahaa ki vo kal aayegaa
   ‘Ram said that he will come tomorrow’
•  In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
•  In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)

[Dependency tree figures for Ex-1 and Ex-2]

(5) Conditionals

•  Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
   ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
   ‘Had she not been sick, she would definitely have come to the party’
•  Issue
   –  Possibility-I: abstract node
   –  Possibility-II: one clause depends on the other clause

Possibility-I

[Dependency tree figure; recoverable labels: an abstract agar-to node at the top, agar and to attached to it as paired-ccof, and the verbs naa hotii and aatii attached as ccof]

Possibility-II

[Dependency tree figure]

Conditionals (Contd)

•  Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
•  For conditionals, our current decision: follow Possibility-II
•  In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
•  Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

•  In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
•  The argument occurs only once in the sentence but is semantically related to both verbs
•  The shared argument always attaches syntactically to the main verb
•  For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd)

•  Ex: vaha rojZa patra likhakara PaadZataa hai
   ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
   ‘Every day he writes a letter and then tears it up’
•  The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd)

[Dependency tree figure: PaadZataa hai heads vaha (k1), rojZa (k7t), patra (k2) and likhakar (vmod); the shared arguments vaha (k1) and patra (k2) are shown again under likhakar]
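
The sharing of arguments between the main verb and the participle can be made explicit with a small lookup over the dependency arcs. The following is a sketch under stated assumptions: arcs are (head, relation, dependent) triples, only k1 and k2 are treated as shareable (as in the example), and `semantic_args` is a hypothetical helper.

```python
# Relations assumed to be shareable, based on the example above.
SHARED_RELATIONS = ("k1", "k2")

# Syntactic arcs for: vaha rojZa patra likhakara PaadZataa hai
EDGES = [
    ("PaadZataa hai", "k1", "vaha"),
    ("PaadZataa hai", "k7t", "rojZa"),
    ("PaadZataa hai", "k2", "patra"),
    ("PaadZataa hai", "vmod", "likhakar"),
]


def semantic_args(verb, edges):
    """Collect a verb's arguments, including those shared via vmod."""
    # Arguments attached syntactically to this verb.
    args = {rel: dep for head, rel, dep in edges
            if head == verb and rel != "vmod"}
    # If this verb is a participle, borrow k1/k2 from the main verb.
    for head, rel, dep in edges:
        if rel == "vmod" and dep == verb:
            for shared_rel in SHARED_RELATIONS:
                for h, r, d in edges:
                    if h == head and r == shared_rel:
                        args.setdefault(shared_rel, d)
    return args
```

The arcs themselves stay a tree: vaha and patra attach only to the main verb, and the participle's arguments are derived rather than stored.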

(7) Ellipsis

•  How do we show dependencies when the head is missing?
•  Ex: tum jo bhi kahoge (vo) mai maan luungii
   ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
   ‘I will believe whatever you say’
•  In the above example, vo ‘that’ is missing; it would be the parent node for the relative clause ‘tum jo bhi kahoge’
•  We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

[Dependency tree figure: maan luungii heads mai (k1) and NULL__NP (vo) (k2); kahoge attaches to NULL__NP by nmod__relc and heads tum (k1) and jo bhi (k2)]

Ellipsis (Contd)

•  Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
   ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
   “The children have grown up; they don't listen to anyone”
•  No explicit conjunct
•  Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree figure: NULL_CCP (aur) heads badZe_ho_gaye and nahii_sunate, each by ccof]
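
The effect of inserting a null element can be checked mechanically: before insertion the arcs form two disconnected pieces, and afterwards there is a single root. This is a minimal sketch, assuming arcs are (head, relation, dependent) triples; `roots` and `attach_via_null` are hypothetical helpers, and the node name `NULL__NP` follows the tree figure for the first example.

```python
def roots(edges):
    """Return the nodes that act as a head but have no head themselves."""
    heads = {head for head, _, _ in edges}
    dependents = {dep for _, _, dep in edges}
    return heads - dependents


def attach_via_null(edges, parent, parent_rel, null_name, orphan, orphan_rel):
    """Insert a null node under `parent` and hang `orphan` from it."""
    return edges + [
        (parent, parent_rel, null_name),
        (null_name, orphan_rel, orphan),
    ]


# Arcs for: tum jo bhi kahoge (vo) mai maan luungii, without vo.
EDGES = [
    ("maan luungii", "k1", "mai"),
    ("kahoge", "k1", "tum"),
    ("kahoge", "k2", "jo bhi"),
]

# Insert the null element standing in for the missing vo.
REPAIRED = attach_via_null(
    EDGES, "maan luungii", "k2", "NULL__NP", "kahoge", "nmod__relc"
)
```

Without the null node, the relative clause headed by kahoge has nothing to attach to; with it, the whole sentence hangs from maan luungii.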

Non-dependency Relations

•  ccof – Conjunction
•  pof – Complex predicates
•  fragof – Fragment of

(1) Conjunction (ccof)

•  The ccof relation does not reflect a dependency relation
•  It is used for coordinating as well as subordinating conjunctions
•  The dependency trees show the conjunctions as heads
•  A coordinating conjunction is the head and takes the coordinated elements as its children
•  A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

•  Coordinate conjunction
   –  Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
      ‘Ram ate food and Sita ate an apple’
•  Subordinate conjunction
   –  Ex: raam ne kahaa ki vo kal aayegaa
      ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
      ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)

[Dependency tree figure]

Subordinate Conjunction

[Dependency tree figure]

(2) Conjunct Verbs

•  Ex: maine usase ek prashna kiyaa
   ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
   ‘I asked him a question’
•  The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
•  The annotation scheme should be able to account for this relation in the dependency tree
•  If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

•  To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
•  The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
•  That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
•  This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
•  Conjunct verbs are chunked separately, but semantically they constitute a single unit
•  Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

[Dependency tree figure: kiyaa heads maine (k1), usase (k2) and prashna (pof); ek attaches to prashna as nmod]
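
The chunking decision can be illustrated with the arcs from the tree above. This is a sketch, assuming arcs are (head, relation, dependent) triples; `conjunct_verb_units` and `modifiers_of` are hypothetical helpers used only to show that the pof link keeps the noun-verb unit recoverable while ek still modifies the noun alone.

```python
# Arcs for: maine usase ek prashna kiyaa
EDGES = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),   # noun is 'part of' the conjunct verb
    ("prashna", "nmod", "ek"),     # intra-chunk relation inside [ek prashna]NP
]


def conjunct_verb_units(edges):
    """Return noun-verb sequences linked by the pof relation."""
    return [f"{dep} {head}" for head, rel, dep in edges if rel == "pof"]


def modifiers_of(edges, word):
    """Return the nmod dependents of a word."""
    return [dep for head, rel, dep in edges if head == word and rel == "nmod"]
```

Had prashna kiyaa been a single verb chunk, ek could only have attached to the whole unit; with separate chunks it attaches to prashna, and the pof arc still identifies prashna kiyaa as one semantic unit.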



Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut

Thedoorwillopen

unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened

unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen

101212

14

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

101212

15

Da3veSubject

unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst

YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst

YesterdaynightIsawthemoonbehindtheclouds

unacc4

datsubj1

TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on

101212

16

Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan

ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan

Causa3ves

bull  Hindihastwowaysofformingthecausa3vebull  Addndashaa

ndash  (sosulaa)sleepmakesomeonesleepbull  Addndashvaa

ndash  (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep

bull  WeintroducethelabelARGAtoanalyzecausersbull  SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees

bull  ARGALMNSforintermediatecausers

101212

17

Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

causa3veL1 आया आतफ़ को स6लाया

aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo

causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa

motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo

Causa3ves

Causa3veL1

Causa3veL2

101212

18

Causa3vesclasses

Complexpredicates

bull  Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust

bull  Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa

101212

19

ComplexPredicate

complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo

Complexpredicate

complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo

101212

20

ComplementClause

complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut

RamknowsthatSitawillarrivelate

101212

1

PhraseStructureRepresenta3on

OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu

PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks

bull  DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow

bull  Developedinconjunc3onwithDSandPBbull  InspiredbyChomskyantradi3on

101212

2

BackgroundforPS

bull  Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren

ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull  LanguageVuniversalprinciplesbull  LanguageVspecificparameters

bull  PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach

BasicPrinciplesofPS

bull  PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)

bull  Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces

bull  Transforma3onalgrammar

bull  Monostratalrepresenta3onbull  NotunlikeEnglishPennTreebank

101212

3

SpecificAssump3onsaboutRepresenta3onMadebyPS

bull  Phrasestructurebull  No3onoflexicalheadswithprojec3ons(XVbartheorysortof)andassociatedfunc3onalprojec3onsndash Nounswithpostposi3onsndash  Verbswithauxiliariesandcomplemen3zers(ki)

bull  Binarybranchingndash  Theore3calreasonsndash  TobedifferentfromDS

BasicTransi3veClause(1)

bull  Therearetwoprivilegedposi3onsintheverbalprojec3oncorrespondingusuallytoDSrsquosk1andk2

VPVPred

VP

NP

NPA3f

kitab

V

paRhegaaआततफिकताबपढ़गा

101212

4

BasicTransi3veClause(2)

bull  Therepresenta3onismaintainedwhenwehaveanerga3veconstruc3on

VPVPred

VP

NPVP

NPA3fVne

kitab

V

paRhii

आततफिकताबपढ़ी

Intrasi3veClauseUnerga3ve

bull  PSmakesadis3nc3onbetweenunerga3veandunaccusa3ve

bull  Inunerga3vetheresimplyisnoobject

VPVPred

VP

NP

A3fV

soyegaa

आततफसोएगा

101212

5

Intrasi3veClauseUnaccusa3ve

bull  Argumentstartsinlowerposi3on(becauseoflexicalseman3cs)andmovestohigherposi3on(becausehigherposi3onhasnooccupant)

VPVPred

VP

NP1

NPdarvaazaa

CASE1

V

khulegaa

दरवाज़ाख89गा

Existen3als

bull  Existen3alho`bersquoisunaccusa3ve(becauseagentVfree)andloca3onisanadjunct

उस कमlt=चA

VPVPred

VP

NP1

NPcuuhe

CASE1

V

hain

VP

NPVP

uskamremein

101212

6

Ditransi3ve

bull  TherecipientisintroducedasadjoinedtotheVPVPredafixedbutnotstructuralposi3on

VPVPred

NPVP

NP

RamVne

kitaab

V

dii

VPVPred

NPVP

MohanVko

VP

राममोहन कोिकताबदी

PugngitAllTogetherDa3veSubjects

कलरा बादलE=म8झकोचाGददखा

VPVPred

NP1

NP

caaMd

CASE1

V

dikhaa

VP

VP

NPVP

baadaloMmein

VP

NP

kalraat

VPVPred

NP

SCR2

VPNP2

mujhko

bull  Dikhaaisinterpretedseman3callyasaditrasi3vesomeonemakessomethingappeartosomeone

bull  Sincetheagentisabsentthelowerargumentraisestothehigherposi3on(likeunaccusa3ve)

bull  Theda3vebeneficiaryisbasegeneratedinthefixedda3veposi3on(adjoinedtoVPVPred)andthenscrambleselsewhere

101212

7

ComplementClauseswithki

रामजानताAि सीताIरJआतएगी

VPVPred

VP

NP

NPRam

EXTR1

V VPVPred

NP

VP

NP1

Sita

CASE1

V

aayegi

VPVPred

NPVP

CP1

C

ki

dersejaantaa

VVAux

VP

VP

hai

Rela3veClause

VPVPredNPVP

NP

tumne

SCR1

V

dii

VPVPred

NP

PRO

VP

VP

NP1

jokitaab

CP

C

COMPVVAux

thii

VP VPVPred

VP

NPVP

maineV

paRhii

NPSCR3

VP

NP

NP

vah

जो कताब त8म दी थी वह L पढ़ ली jokitaabtumnediithiivahmainepaRhliiwhichbookfyouerggivefsgpstbefsgpstthatIergreadreflfsgpstIhavereadthebookwhichyougavemersquo

101212

8

ComplexPredicate

VPVPred

VP

NP

NPVP1Raam

RaviVko V

kar

VVAux

VP

VP

rahaa

VVAux

thaa

Vrsquo

NP

NP

CASE1

N

yaad

राम रव को याद कर रहा था raamravikoyaadkarrahaathaaRamRaviaccrememberdoprogmsgbemsgpstRamwasrememberingRavi

Causa3ve

101212

22

  Ex maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa lsquoMother fed the child with the spoon   Ex maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa Mother made the maid to feed the childlsquo   As there is morphological relatedness between the base verb khaa lsquoeatrsquo and

causative verb khilvaa lsquocause to eatrsquo we mark pk1 mk1 jk1 instead of k1 k3 k4 respectively

  For causatives our current decision Follow Possibility-II

Causative Constructions (Contd hellip)

(2) Relative Clauses (nmod__relc)

  Ex jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai

rsquowhorsquo rsquoboyrsquo rsquotherersquo rsquostandrsquo rsquoisrsquo rsquohersquo rsquomyrsquo rsquobrotherrsquo rsquoisrsquo lsquoThe boy who is standing there is my brotherrsquo

  Issue

$ Possibility-I

amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause

amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz

khadZaa hairsquo

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause.
  - The modifier of vaha 'he' in the main clause is the entire relative clause.
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref.

Relative Clause Alternative-II

Relative Clauses (Contd...)

• For relative clauses, our current decision: follow Possibility-II.
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause.
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation.
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref.
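The Possibility-II attachments can be written down as a small table of arcs. The sketch below is illustrative only: the `Arc` record, the `head_chain` helper and the `k1` labels are assumptions made for the example, not part of any treebank tool. Only the nmod__relc arc and the coref feature are taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Arc:
    head: Optional[str]                      # parent chunk, None for the root
    rel: Optional[str]                       # dependency label
    feats: Dict[str, str] = field(default_factory=dict)


# Possibility-II for: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
tree: Dict[str, Arc] = {
    # main-clause verb is the root (its other dependents are omitted)
    "hai": Arc(head=None, rel=None),
    # k1 label assumed for illustration
    "vaha": Arc(head="hai", rel="k1"),
    # the relative clause, headed by its verb, modifies vaha
    "khadZaa hai": Arc(head="vaha", rel="nmod__relc"),
    # jo ladZakaa stays inside the relative clause; coref links it to vaha
    # (k1 label assumed for illustration)
    "jo ladZakaa": Arc(head="khadZaa hai", rel="k1", feats={"coref": "vaha"}),
}


def head_chain(node: str) -> List[str]:
    """Return the heads above `node`, nearest first, up to the root."""
    chain: List[str] = []
    current = tree[node].head
    while current is not None:
        chain.append(current)
        current = tree[current].head
    return chain


print(head_chain("jo ladZakaa"))
```

Keeping coref as a feature rather than an arc means each chunk still has exactly one head, which is what lets the relative clause verb act as the clause root.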

(3) anubhava karta - k4a

• Ex-1: mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy.'
• Here the ko vibhakti in mujhko 'to me' shows that it is not a karta.
• Here dukh 'unhappy' is the karta.
• Here mujhko 'to me' is a subtype of sampradan.
• This sampradan is different from the sampradan k4 (beneficiary).
• We call it anubhava karta, represented by k4a.

anubhava karta - k4a (Contd...)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon.'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram.'

anubhava karta - k4a (Contd...)

[Figures: dependency trees for Ex-2 and Ex-3; not recoverable.]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow.'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow.'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma.
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs.

Relation samanadhikaran - rs (Contd...)

[Figures: dependency trees for Ex-1 and Ex-2; not recoverable.]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party.'
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II

[Figure: dependency tree for Possibility-II; not recoverable.]

Conditionals (Contd)

• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause.
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause.
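The objection to Possibility-I, that its head is not a word of the sentence, can be checked mechanically. The following is a minimal sketch under stated assumptions: the `abstract_nodes` helper and the dictionary encoding (dependent mapped to head) are hypothetical, and only the clause heads are shown for Possibility-II because the slides do not give its full tree.

```python
from typing import Dict, Optional, Set

# Words of: agara vaha biimaara na hotii to paartii me jZarUra aatii
TOKENS: Set[str] = set(
    "agara vaha biimaara na hotii to paartii me jZarUra aatii".split()
)


def abstract_nodes(heads: Dict[str, Optional[str]]) -> Set[str]:
    """Return every node of the tree that is not a word of the sentence."""
    nodes = set(heads) | {h for h in heads.values() if h is not None}
    return nodes - TOKENS


# Possibility-I: both conjunctions hang from an abstract agar-to node.
possibility_1: Dict[str, Optional[str]] = {
    "agar-to": None,
    "agara": "agar-to",
    "to": "agar-to",
    "hotii": "agara",
    "aatii": "to",
}

# Possibility-II: the agar clause (headed by hotii) depends on the
# to clause (headed by aatii). Only the two clause heads are shown.
possibility_2: Dict[str, Optional[str]] = {
    "aatii": None,
    "hotii": "aatii",
}

print(abstract_nodes(possibility_1))
print(abstract_nodes(possibility_2))
```

A tree passes the check when the returned set is empty, which holds for Possibility-II but not for Possibility-I.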

(6) Participles (vmod)

• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.

Participles (vmod) (Contd...)

• Ex: vaha rojZa patra likhakar PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written a letter every day, he tears it up.'

Participles (vmod) (Contd...)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.

Participles (vmod) (Contd...)

PaadZataa hai
  k1:   vaha
  k7t:  rojZa
  k2:   patra
  vmod: likhakar
    k1: vaha   (shared)
    k2: patra  (shared)
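One way to keep the two kinds of attachment apart is to store the syntactic arcs and the semantic-only arcs in separate lists. This is a sketch under that assumption; the `Arc` tuple layout and the `shared_arguments` helper are hypothetical names, and the arcs themselves follow the tree for the example sentence.

```python
from typing import List, Tuple

Arc = Tuple[str, str, str]  # (head, relation, dependent)

# Syntactic arcs: every argument attaches to the main verb only.
syntactic: List[Arc] = [
    ("PaadZataa hai", "k1", "vaha"),
    ("PaadZataa hai", "k7t", "rojZa"),
    ("PaadZataa hai", "k2", "patra"),      # patra 'letter'
    ("PaadZataa hai", "vmod", "likhakar"),
]

# Semantic-only arcs: the participle reuses vaha and patra.
semantic_only: List[Arc] = [
    ("likhakar", "k1", "vaha"),
    ("likhakar", "k2", "patra"),
]


def shared_arguments(syn: List[Arc], sem: List[Arc]) -> List[str]:
    """Dependents with a syntactic head that also fill a semantic-only role."""
    attached = {dep for _, _, dep in syn}
    return sorted({dep for _, _, dep in sem if dep in attached})


print(shared_arguments(syntactic, semantic_only))
```

Because the semantic-only arcs live in their own list, the syntactic structure remains a tree with one head per word.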

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say.'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing.
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency.

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone.'
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential):

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
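The null-element insertion described above amounts to adding two arcs: one from the existing parent to the new null node, and one from the null node to the clause that lost its head. A minimal sketch follows; `insert_null_head` and the tuple encoding are assumptions made for the example, not a treebank API.

```python
from typing import List, Tuple

Arc = Tuple[str, str, str]  # (head, relation, dependent)


def insert_null_head(arcs: List[Arc], null_node: str, parent: str,
                     parent_rel: str, orphan: str,
                     orphan_rel: str) -> List[Arc]:
    """Hang `null_node` under `parent`, then `orphan` under `null_node`."""
    return arcs + [
        (parent, parent_rel, null_node),
        (null_node, orphan_rel, orphan),
    ]


# tum jo bhi kahoge (vo) mai maan luungii, before the missing vo is restored
arcs: List[Arc] = [
    ("maan luungii", "k1", "mai"),
    ("kahoge", "k1", "tum"),
    ("kahoge", "k2", "jo bhi"),
]

arcs = insert_null_head(arcs, "NULL__NP", "maan luungii", "k2",
                        "kahoge", "nmod__relc")

for head, rel, dep in arcs:
    print(f"{head} --{rel}--> {dep}")
```

The same helper would cover the missing conjunction case by calling it once per conjunct with a NULL_CCP node and the ccof relation, assuming the null node has already been given a place in the tree.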

Non-dependency Relations

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• In coordination, the conjunction is the head and takes the coordinated elements as its children.
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.

Conjunction (ccof) (Contd...)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple.'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow.'

Coordinate Conjunction (ccof)

Subordinate Conjunction

[Figures: dependency trees for the coordinate and subordinate conjunction examples; not recoverable.]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question.'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation).
• That is, a noun or an adjective in the conjunct verb sequence has a POF relation with the verb.
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.

Conjunct Verbs (Contd)

kiyaa
  k1:  maine
  k2:  usase
  pof: prashna
    nmod: ek
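The chunking decision for conjunct verbs can be sketched as data: the noun keeps its own NP chunk, so ek stays an intra-chunk modifier, and a pof arc ties the noun to the verb. The names below (`chunks`, `inter_chunk`, `conjunct_verbs`) are hypothetical, and the NP tags on maine and usase are assumed for illustration.

```python
from typing import List, Tuple

Chunk = Tuple[str, List[str]]   # (chunk tag, words; the last word is the head)
Arc = Tuple[str, str, str]      # (head, relation, dependent)

# maine usase ek prashna kiyaa
chunks: List[Chunk] = [
    ("NP", ["maine"]),            # tag assumed
    ("NP", ["usase"]),            # tag assumed
    ("NP", ["ek", "prashna"]),    # ek modifies prashna inside the chunk
    ("VG", ["kiyaa"]),
]

# Arcs between chunk heads, as in the tree for this sentence.
inter_chunk: List[Arc] = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
]


def conjunct_verbs(arcs: List[Arc]) -> List[str]:
    """Rebuild each noun-verb unit from its pof arc."""
    return [f"{dep} {head}" for head, rel, dep in arcs if rel == "pof"]


print(conjunct_verbs(inter_chunk))
```

Splitting the chunks costs nothing semantically, since the single unit can always be recovered from the pof arc.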


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson]agent created the [first disposable syringe]theme for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine]theme was created in 1952 by [Jonas Salk]agent at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
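The filtering step can be sketched in a few lines. This is a toy illustration, not a real question answering system: the role-labelled candidates are written by hand, and the `answer` function and dictionary keys are assumptions made for the example.

```python
from typing import Dict, List, Optional

# The question asks for the agent of "create" given its theme.
question: Dict[str, str] = {
    "predicate": "create",
    "theme": "the first effective polio vaccine",
}

# Hand-written role labels for the two candidate sentences.
candidates: List[Dict[str, str]] = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(q: Dict[str, str], cands: List[Dict[str, str]]) -> Optional[str]:
    """Return the agent of the first candidate whose predicate and theme match."""
    for cand in cands:
        same_predicate = cand["predicate"] == q["predicate"]
        same_theme = cand["theme"].lower() == q["theme"].lower()
        if same_predicate and same_theme:
            return cand["agent"]
    return None


print(answer(question, candidates))
```

Both candidates contain every word of the question, so word matching alone cannot separate them; comparing themes can.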

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate has a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John]ARG0 rings [the bell]ARG1    (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2    (ring.02)

ring.01: Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02: To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
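A frame file with two rolesets can be pictured as a nested mapping from roleset ID to role descriptions. The sketch below is an in-memory stand-in written for illustration; it does not reproduce the actual PropBank frame file format, and `FRAMES` and `describe` are hypothetical names.

```python
from typing import Dict

# Two rolesets of the polysemous predicate "ring".
FRAMES: Dict[str, Dict] = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "ARG0": "Causer of ringing",
            "ARG1": "Thing rung",
            "ARG2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "ARG1": "Surrounding entity",
            "ARG2": "Surrounded entity",
        },
    },
}


def describe(roleset: str, labelled: Dict[str, str]) -> Dict[str, str]:
    """Map each labelled phrase to the role description in its roleset."""
    roles = FRAMES[roleset]["roles"]
    return {phrase: roles[label] for phrase, label in labelled.items()}


print(describe("ring.01", {"John": "ARG0", "the bell": "ARG1"}))
print(describe("ring.02", {"Tall aspen trees": "ARG1", "the lake": "ARG2"}))
```

The same label, ARG1, means "Thing rung" in one roleset and "Surrounding entity" in the other, which is why roles are defined per roleset rather than globally.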

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi

• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments
  ARGA   Causer
  ARG0   Agent, experiencer
  ARG1   Theme, patient
  ARG2   Recipient
  ARG3   Instrument

Numbered Arguments with function tags
  ARGA-MNS   Indirect causer
  ARG0-MNS   Induced causer
  ARG0-GOL   Causee with a 'recipient' role
  ARG2-ATR   Attribute
  ARG2-GOL   Goal
  ARG2-SOU   Source
  ARG2-LOC   Location
  ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments
  ARGM-TMP   Temporal
  ARGM-MNR   Manner
  ARGM-LOC   Location
  ARGM-PRP   Purpose
  ARGM-CAU   Cause
  ARGM-DIS   Discourse
  ARGM-ADV   Adverb
  ARGM-MNS   Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
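The labelling rule for the single argument of an intransitive verb reduces to a lookup on the verb's class. A minimal sketch, assuming the class has already been decided by the diagnostic tests; `VERB_CLASS` and `single_argument_label` are hypothetical names, and the three entries are the verbs listed above.

```python
from typing import Dict

# Verb classes as given in the examples above.
VERB_CLASS: Dict[str, str] = {
    "Kula": "unaccusative",    # open
    "Puta": "unaccusative",    # explode
    "haMsa": "unergative",     # laugh
}


def single_argument_label(verb: str) -> str:
    """Return ARG1 for an unaccusative verb and ARG0 for an unergative one."""
    verb_class = VERB_CLASS[verb]
    if verb_class == "unaccusative":
        return "ARG1"
    if verb_class == "unergative":
        return "ARG0"
    raise ValueError(f"unknown verb class: {verb_class}")


for verb in VERB_CLASS:
    print(verb, single_argument_label(verb))
```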

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night, the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night, I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1; not recoverable.]

The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so -> sulaa) 'sleep' -> 'make someone sleep'
• Add -vaa
  - (sulaa -> sulvaa) 'make someone sleep' -> 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Causatives

[Figures: annotated trees for causative-1 and causative-2; not recoverable.]

Causatives classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

Complex predicate

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
    • Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: phrase-structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with nodes VP-Pred, VP, NP and V.]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: phrase-structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii), with nodes VP-Pred, VP, NP and V.]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: phrase-structure tree for आतिफ़ सोएगा (Atif soyegaa), with nodes VP-Pred, VP, NP and V.]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: phrase-structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); the labels NP1 and CASE1 appear in the tree.]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

[Figure: phrase-structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); the labels NP1 and CASE1 appear in the tree.]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position

[Figure: phrase-structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii).]

Putting it All Together: Dative Subjects

[Figure: phrase-structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); the labels NP1, CASE1, NP2 and SCR2 appear in the tree.]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: phrase-structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); the labels CP1, EXTR1, NP1 and CASE1 appear in the tree, with ki under C.]

Relative Clause

[Figure: phrase-structure tree for the example below; the labels CP, C, COMP, PRO, NP1, SCR1 and SCR3 appear in the tree.]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'


101212

23

Relative Clause Possibility-I

Relative Clauses (nmod__relc)

 Possibility-II

$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause

and vaha lsquohersquo in the main clause is captured by the feature coref

101212

24

Relative Clause Alternative-II

Relative Clauses (Contdhellip)

  For relative clauses our current decision Follow Possibility-II   In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the

verb khadzaa hai lsquois standingrsquo of the relclause   The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo

relation   The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by

the feature coref

101212

25

(3) anubhava karta ndash k4a

  Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo   Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta   Here dukh lsquounhappyrsquo is the karta   Here mujhko lsquoto mersquo is a subtype of sampradan   This sampradan is different from the sampradan (k4mdashbeneficiary)   We call it as anubhava karta represented by k4a

anubhava karta ndash k4a (Contd )

  Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo   Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon

was visible to mersquo Verb  

101212

26

anubhava karta ndash k4a (Contdhellip)

 Ex-2

  Ex-3

anubhava karta ndash k4a (Contdhellip)

101212

27

(4) Relation samanadhikaran- rs

  Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $  Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo   In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma   In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object

yaha lsquothisrsquo so it attahes to yaha as rs

Relation samanadhikaran- rs (Contdhellip)

  Ex-1

101212

28

Relation samanadhikaran- rs (Contd) ndash Ex-2

(5) Conditionals   Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo

lsquoHad she been not sick she would have definitely come to the partyrsquo

  Issue

$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause

101212

29

Possibility - I

agar-to paired-ccof paired-ccof

agar to ccof ccof

naa hotii aatii

Possibility - II

101212

30

Conditionals (Contd)   Possibility-I is not possible because agar-to is the head of the tree

which is an abstract node ie it is not a lexical node   For conditionals our current decision Follow Possibility-II   In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo

clause   Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo

clause is the main clause

(6) Participles (vmod)

  In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)

  The arguments occurs only once in the sentence but is semantically related to both the verbs   The shared argument syntactically always attaches with the main verb   For the other verb this argument is semantically realized but not syntactically

101212

31

Participles (vmod) (Contd )

  Ex vaha rojZa patra likhakara PaadZataa hai

rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo

lsquoHaving letters written everyday he tearsrsquo

101212

32

Participles (vmod) (Contd )

  The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa

lsquotearsrsquo is shared with another participle verb likhakar lsquohaving

writtenrsquo

Participles (vmod) (contd)

Paadzataa hai k1 k7t k2 vmod

vaha rojZa pawra likhakar k1 k2

vaha pawra

101212

33

(7)Ellipsis

  How to show dependencies when the head is missing   Ex tum jo bhi kahoge (vo) mai maan luungii lsquoyou lsquowhateverrsquo lsquowill sayrsquo lsquothatrsquo lsquoIrsquo lsquowill believersquo lsquoI will believe whatever you sayrsquo   In the above example vo lsquothatrsquo is missing which becomes the parent node

for relative clause lsquotum jo bhi kahogersquo   We insert a null element ie NULL_NP for vo lsquothatrsquo to show the dependency

Ellipsis (Contd)

maan luungii k1 k2

mai NULL__NP (vo) nmod__relc

kahoge k1 k2

tum jo bhi

101212

34

Ellipsis (Contd)

  Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate

lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo   No explicit conjunct   Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate

Non-dependency Relations

 ccof ndash Conjunction  pof ndash Complex Predicates   fragof -- Fragment of

101212

35

(1) Conjunction (ccof)

  ccof relation doesnrsquot reflects a dependency relation   It is used for coordinating as well as subordinating conjunctions   The dependency trees will show the conjuncts as heads   In coordinating conjuncts the conjunct is the head and takes the coordinating

elements as its children   In subordinating conjunct it would take the clause to which it is syntactically

attached (the subordinate clause) as its child

Conjunction (ccof) (Contdhellip)

  Coordinate Conjunction

$ Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa lsquoramrsquo lsquoErgrsquo lsquofoodrsquo lsquoatersquo lsquoandrsquo lsquositarsquo Ergrsquo lsquoapplersquo lsquoatersquo

lsquoRam ate food and Sita ate an applersquo

  Subordinate Conjunction

$ Ex raam ne kahaa ki vo kal aayegaa lsquoramrsquo lsquoErgrsquo lsquosaidrsquo thatrsquo lsquohersquo lsquotomorrowrsquo lsquocome-Futrsquo lsquoRam said that he will come tomorrowrsquo

101212

36

Coordinate Conjunction (ccof)

Subordinate Conjunction

101212

37

(2) Conjunct Verbs

  Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo   The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa

lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence

  The annotation scheme should be able to account for this relation in the

dependency tree   If prashna kiyaa is grouped as a single verb chunk it will not be possible to

mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

  To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG

  The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation)   It means noun or an adjective in the conjunct verb sequence will have a POF relation with

the verb   This way the relation between ek and prashna becomes an intra-chunk relation as they will

now become part of a single NP chunk   Conjunct verbs are chunked separately but semantically they constitute a single unit   It captures the fact that the noun-verb sequence is a conjunct verb by linking them with

POF relation

101212

38

Conjunct Verbs (Contd)

kiyaa k1 k2 pof

maine usase prashna

nmod

ek

101212

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
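As a small illustration, the inventory above can be written down as a lookup table. The dictionary and helper names are hypothetical; only the four categories, their descriptions and the automatic/manual split come from the slide.

```python
# Empty-category inventory: description and how each category is inserted.
EMPTY_CATEGORIES = {
    "pro":    ("dropped null argument recoverable from discourse context", "manual"),
    "PRO":    ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP":    ("elided argument in a co-ordinated clause", "manual"),
}


def needs_annotator(category):
    """True if the category has to be inserted by a human annotator."""
    return EMPTY_CATEGORIES[category][1] == "manual"


manual = sorted(c for c in EMPTY_CATEGORIES if needs_annotator(c))
print(manual)
```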

PropBank tagset: numbered arguments

Numbered arguments            Numbered arguments with function tags
ARGA   Causer                 ARGA-MNS   Indirect causer
ARG0   Agent, experiencer     ARG0-MNS   Induced causer
ARG1   Theme, patient         ARG0-GOL   Causee with a ‘recipient’ role
ARG2   Recipient              ARG2-ATR   Attribute
ARG3   Instrument             ARG2-GOL   Goal
                              ARG2-SOU   Source
                              ARG2-LOC   Location
                              ARG2-DIR   Direction

PropBank tagset: modifier arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means
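A label from this tagset splits into a base and an optional tag. The sketch below shows one way to parse and sanity-check such labels; `parse_label` is a hypothetical helper, and it only checks that the tag occurs somewhere in the tables above, not that a particular base-plus-tag combination is actually used by the annotators.

```python
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
FUNCTION_TAGS = {"MNS", "GOL", "ATR", "SOU", "LOC", "DIR"}
MODIFIER_TAGS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def parse_label(label):
    """Split a label such as 'ARG2-LOC' into (kind, base, tag)."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        if tag not in MODIFIER_TAGS:
            raise ValueError(f"unknown modifier tag in {label!r}")
        return ("modifier", base, tag)
    if base in NUMBERED and (tag == "" or tag in FUNCTION_TAGS):
        return ("numbered", base, tag or None)
    raise ValueError(f"unknown label {label!r}")


print(parse_label("ARG2-LOC"))
print(parse_label("ARGM-TMP"))
print(parse_label("ARG0"))
```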

Linguistic phenomena

• Simple transitive
• Unaccusative and unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex predicates

Simple transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Unaccusative & unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
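The labelling rule for intransitives can be stated as a tiny function. The verb list is limited to the three verbs named on the slide; the `VERB_CLASS` table and the function name are hypothetical, and in practice the class would come from the frame file after applying the diagnostic tests.

```python
# Verb classes as given on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",    # open
    "Puta": "unaccusative",    # explode
    "haMsa": "unergative",     # laugh
}


def single_argument_label(verb):
    """Label of the only argument of an intransitive verb."""
    return "Arg1" if VERB_CLASS[verb] == "unaccusative" else "Arg0"


for verb in VERB_CLASS:
    print(verb, single_argument_label(verb))
```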

Intransitive: unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’

Intransitive: unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night, the moon was seen behind the clouds’

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night, I saw the moon behind the clouds’

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

[Slide content not recovered]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
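A minimal sketch of the lookup this implies: when the nominal of a noun + light-verb sequence has its own frame, the roles come from the noun frame rather than from the verb. The `NOUN_FRAMES` table and the `frame_source` helper are hypothetical illustrations; only the bharosaa example and its two argument labels are taken from the slide.

```python
# Nominals that have their own frame (here only the example from the slide).
NOUN_FRAMES = {"bharosaa": "trust"}


def frame_source(noun, verb):
    """Decide which frame supplies the roles for a noun + verb sequence."""
    if noun in NOUN_FRAMES:
        return ("noun frame", noun)
    return ("verb frame", verb)


annotation = {
    "predicate": "bharosaa kiyaa",
    "frame": frame_source("bharosaa", "kar"),
    "args": [("abhay ne", "ARG0"), ("sitaa par", "ARG1")],
}
print(annotation["frame"])
print(annotation["args"])
```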

Complex predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow
CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach

Basic principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific assumptions about representation made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic transitive clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

आतिफ़ किताब पढ़ेगा

[Tree diagram for “Atif kitab paRhegaa”; node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
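The binary-branching assumption is easy to check mechanically. The sketch below encodes one plausible bracketing of the transitive clause as nested tuples; the exact attachment is an assumption reconstructed from the node labels of the slide (the tree figure itself did not survive extraction), and the helper names are hypothetical.

```python
# Assumed bracketing: (VP-Pred (NP Atif) (VP (NP kitab) (V paRhegaa)))
tree = ("VP-Pred",
        ("NP", "Atif"),
        ("VP",
         ("NP", "kitab"),
         ("V", "paRhegaa")))


def is_preterminal(node):
    """A pre-terminal is a label directly dominating one word."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Return the words of the tree in surface order."""
    if is_preterminal(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]


def is_binary(node):
    """True if every phrasal node has exactly two children."""
    if is_preterminal(node):
        return True
    children = node[1:]
    return len(children) == 2 and all(is_binary(c) for c in children)


print(leaves(tree))
print(is_binary(tree))
```

A flat analysis with three daughters under a single VP would fail `is_binary`, which is the kind of structure the PS representation avoids.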

Basic transitive clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी

[Tree diagram for “Atif ne kitab paRhii”; node labels: VP-Pred, VP, NP-P (Atif -ne), NP (kitab), V (paRhii)]

Intransitive clause: unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ़ सोएगा

[Tree diagram for “Atif soyegaa”; node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]

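Intransitive clause: unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा

[Tree diagram for “darvaazaa khulegaa”; node labels: VP-Pred, VP, NP (darvaazaa, index 1), CASE trace (index 1), V (khulegaa)]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram for “us kamre mein cuuhe hain”; node labels: VP-Pred, VP, NP-P (us kamre mein), NP (cuuhe, index 1), CASE trace (index 1), V (hain)]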

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position

राम ने मोहन को किताब दी

[Tree diagram for “Ram ne Mohan ko kitaab dii”; node labels: VP-Pred, VP, NP-P (Ram -ne), NP-P (Mohan -ko), NP (kitaab), V (dii)]

Putting it all together: dative subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram for “kal raat baadaloM mein mujhko caaMd dikhaa”; node labels: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP (mujhko, index 2), SCR trace (index 2), NP (caaMd, index 1), CASE trace (index 1), V (dikhaa)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram for “Ram jaantaa hai ki Sita der se aayegi”; node labels: VP-Pred, VP, NP (Ram), NP (Sita, index 1), CASE trace (index 1), NP-P (der se), V (jaantaa, aayegi), V-Aux (hai), CP (index 1), C (ki), EXTR trace (index 1)]

Relative clause

[Tree diagram for the sentence below; node labels: VP-Pred, VP, NP, NP-P (tumne, maine), NP (jo kitaab, index 1), NP (vah), V (dii, paRhii), V-Aux (thii), CP, C, COMP, PRO, SCR traces (indices 1 and 3)]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

Complex predicate

[Tree diagram for the sentence below; node labels: VP-Pred, VP, NP (Raam), NP-P (Ravi -ko), V′, NP, N (yaad), CASE trace (index 1), V (kar), V-Aux (rahaa), V-Aux (thaa)]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

Causative

Relative Clause Alternative-II

Relative Clauses (Contd…)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadzaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref

(3) anubhava karta – k4a

• Ex-1: mujhko dukh hai
        ‘I.Dat’ ‘unhappy’ ‘is’
        ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd…)

• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
        ‘ram’ ‘Erg’ ‘moon’ ‘saw’
        ‘Ram saw the moon’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
        ‘ram.Dat’ ‘moon’ ‘appeared’
        ‘The moon was visible to Ram’

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

• Ex-1: raam ne kahaa ki vo kal aayegaa
        ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
        ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
      ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
      ‘Had she not been sick, she would definitely have come to the party’
• Issue
  – Possibility-I: abstract node
  – Possibility-II: one clause depends on the other clause

Possibility-I

agar-to
  paired-ccof → agar
    ccof → naa hotii
  paired-ccof → to
    ccof → aatii

Possibility-II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd…)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is semantically realized but not syntactically

Participles (vmod) (Contd…)

• Ex: vaha rojZa patra likhakara PaadZataa hai
      ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
      ‘Having written letters every day, he tears them up’
• The arguments vaha ‘he’ and pawra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’

Participles (vmod) (Contd…)

PaadZataa hai
  k1   → vaha
  k7t  → rojZa
  k2   → pawra
  vmod → likhakar
    k1 → vaha   (shared argument)
    k2 → pawra  (shared argument)
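One way to keep the two kinds of link apart is to store syntactic attachments and shared (semantic-only) arguments separately. This is a sketch under that assumption; the variable and function names are hypothetical and do not reflect the treebank's actual storage format.

```python
# Syntactic arcs of the example: (head, relation, dependent).
syntactic = [
    ("PaadZataa hai", "k1", "vaha"),
    ("PaadZataa hai", "k7t", "rojZa"),
    ("PaadZataa hai", "k2", "pawra"),
    ("PaadZataa hai", "vmod", "likhakar"),
]

# Arguments the participle shares with the main verb (semantic only).
shared = {"likhakar": [("k1", "vaha"), ("k2", "pawra")]}


def arguments(verb, include_shared=True):
    """Arguments of a verb, optionally including shared ones."""
    args = [(rel, dep) for head, rel, dep in syntactic if head == verb]
    if include_shared:
        args += shared.get(verb, [])
    return args


print(arguments("likhakar", include_shared=False))   # no syntactic arguments
print(arguments("likhakar"))                          # shared k1 and k2
```

Every word still has exactly one syntactic head, so the structure remains a tree even though vaha and pawra are arguments of both verbs.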

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
      ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
      ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd…)

maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
    nmod__relc → kahoge
      k1 → tum
      k2 → jo bhi
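The effect of the null element can be shown on the arcs of this example: without it the relative clause has no parent and the sentence falls into two fragments. The helper names below are hypothetical; the relation labels are the ones in the tree above.

```python
def roots(arcs):
    """Nodes that act as a head but are nobody's dependent."""
    heads = {head for head, _, _ in arcs}
    dependents = {dep for _, _, dep in arcs}
    return heads - dependents


def attach_headless_relative(arcs, main_verb, relc_verb,
                             role="k2", null_label="NULL__NP"):
    """Add a null NP under main_verb and hang the relative clause from it."""
    return arcs + [(main_verb, role, null_label),
                   (null_label, "nmod__relc", relc_verb)]


# Arcs available without the elided vo 'that': (head, relation, dependent).
arcs = [
    ("maan luungii", "k1", "mai"),
    ("kahoge", "k1", "tum"),
    ("kahoge", "k2", "jo bhi"),
]

print(sorted(roots(arcs)))       # two disconnected roots
full = attach_headless_relative(arcs, "maan luungii", "kahoge")
print(sorted(roots(full)))       # a single root after inserting the null NP
```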

Ellipsis (Contd…)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
      ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
      “The children have grown up; they don’t listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
        ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
        ‘Ram ate food and Sita ate an apple’
• Subordinate conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
        ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
        ‘Ram said that he will come tomorrow’

[Figures: dependency trees for the coordinate conjunction (ccof) and the subordinate conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
      ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
      ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; the adjective does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd…)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd…)

kiyaa
  k1  → maine
  k2  → usase
  pof → prashna
    nmod → ek
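The chunking decision can be illustrated with a small sketch. The chunk dictionaries and helper functions are hypothetical; the chunk boundaries and relation labels follow the example above.

```python
# Chunks of "maine usase ek prashna kiyaa", with prashna and kiyaa kept apart.
chunks = [
    {"tag": "NP", "words": ["maine"], "head": "maine"},
    {"tag": "NP", "words": ["usase"], "head": "usase"},
    {"tag": "NP", "words": ["ek", "prashna"], "head": "prashna"},
    {"tag": "VG", "words": ["kiyaa"], "head": "kiyaa"},
]

# Inter-chunk arcs between chunk heads: (head, relation, dependent).
inter_chunk = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
]


def intra_chunk(chunk, rel="nmod"):
    """Attach every non-head word of a chunk to the chunk head."""
    return [(chunk["head"], rel, w) for w in chunk["words"] if w != chunk["head"]]


def conjunct_verbs(arcs):
    """Recover noun + verb units from the pof links."""
    return [f"{dep} {head}" for head, rel, dep in arcs if rel == "pof"]


all_arcs = inter_chunk + [arc for c in chunks for arc in intra_chunk(c)]
print(all_arcs)
print(conjunct_verbs(all_arcs))
```

Because prashna heads its own NP chunk, ek can attach to it directly, while the pof link still lets a consumer of the treebank reassemble prashna kiyaa as one semantic unit.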


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya
University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic role labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
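The filtering step can be sketched as follows, assuming the two candidate sentences have already been labelled with semantic roles as shown on the slides. The dictionaries and the `answer` function are hypothetical illustrations, not part of any question answering system described here.

```python
# Role-labelled candidate sentences (agent and theme of the predicate "create").
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(predicate, theme, candidates):
    """Return the agents of candidates whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate
            and c["theme"].lower() == theme.lower()]


# "Who created the first effective polio vaccine?"
print(answer("create", "the first effective polio vaccine", candidates))
```

Word overlap alone cannot separate the two candidates, since both contain the phrase "the first effective polio vaccine"; the theme role does.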

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation


25

(3) anubhava karta ndash k4a

  Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo   Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta   Here dukh lsquounhappyrsquo is the karta   Here mujhko lsquoto mersquo is a subtype of sampradan   This sampradan is different from the sampradan (k4mdashbeneficiary)   We call it as anubhava karta represented by k4a

anubhava karta ndash k4a (Contd )

  Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo   Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon

was visible to mersquo Verb  

101212

26

anubhava karta ndash k4a (Contdhellip)

 Ex-2

  Ex-3

anubhava karta ndash k4a (Contdhellip)

101212

27

(4) Relation samanadhikaran- rs

  Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $  Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo   In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma   In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object

yaha lsquothisrsquo so it attahes to yaha as rs

Relation samanadhikaran- rs (Contdhellip)

  Ex-1

101212

28

Relation samanadhikaran- rs (Contd) ndash Ex-2

(5) Conditionals   Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo

lsquoHad she been not sick she would have definitely come to the partyrsquo

  Issue

$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause

101212

29

Possibility - I

agar-to paired-ccof paired-ccof

agar to ccof ccof

naa hotii aatii

Possibility - II

101212

30

Conditionals (Contd)   Possibility-I is not possible because agar-to is the head of the tree

which is an abstract node ie it is not a lexical node   For conditionals our current decision Follow Possibility-II   In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo

clause   Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo

clause is the main clause

(6) Participles (vmod)

  In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)

  The arguments occurs only once in the sentence but is semantically related to both the verbs   The shared argument syntactically always attaches with the main verb   For the other verb this argument is semantically realized but not syntactically

101212

31

Participles (vmod) (Contd )

  Ex vaha rojZa patra likhakara PaadZataa hai

rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo

lsquoHaving letters written everyday he tearsrsquo

101212

32

Participles (vmod) (Contd )

  The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa

lsquotearsrsquo is shared with another participle verb likhakar lsquohaving

writtenrsquo

Participles (vmod) (contd)

Paadzataa hai k1 k7t k2 vmod

vaha rojZa pawra likhakar k1 k2

vaha pawra

101212

33

(7)Ellipsis

  How to show dependencies when the head is missing   Ex tum jo bhi kahoge (vo) mai maan luungii lsquoyou lsquowhateverrsquo lsquowill sayrsquo lsquothatrsquo lsquoIrsquo lsquowill believersquo lsquoI will believe whatever you sayrsquo   In the above example vo lsquothatrsquo is missing which becomes the parent node

for relative clause lsquotum jo bhi kahogersquo   We insert a null element ie NULL_NP for vo lsquothatrsquo to show the dependency

Ellipsis (Contd)

maan luungii k1 k2

mai NULL__NP (vo) nmod__relc

kahoge k1 k2

tum jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  ‘The children have grown up; they don't listen to anyone’
• No explicit conjunction
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate

Non-dependency Relations

• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

[Figures: Coordinate Conjunction (ccof); Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as both are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
  k1  → maine
  k2  → usase
  pof → prashna
          nmod → ek
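
The chunking decision for conjunct verbs can be sketched in a few lines of Python. The function name `analyse_conjunct_verb` and the tuple layout are hypothetical; the sketch only restates the analysis given on the slides (separate NP and VG chunks, pof between the chunk heads, nmod inside the NP chunk).

```python
def analyse_conjunct_verb(modifier, noun, verb):
    """Split a modifier-noun-verb sequence into an NP chunk and a VG chunk,
    and link the noun to the verb with the pof relation."""
    chunks = [("NP", [modifier, noun]), ("VG", [verb])]
    inter_chunk = [(verb, "pof", noun)]       # relation between chunk heads
    intra_chunk = [(noun, "nmod", modifier)]  # relation inside the NP chunk
    return chunks, inter_chunk, intra_chunk

chunks, inter, intra = analyse_conjunct_verb("ek", "prashna", "kiyaa")
print(chunks)
print(inter, intra)
```

Had prashna kiyaa been kept as one verb chunk, there would be no node for ek to attach to, which is the problem the split avoids.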


Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
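
The filtering step can be illustrated with a short Python sketch. The role labels below are written by hand from the two example sentences; they are not the output of a real semantic role labeller, and the function name `answer_who` is an assumption for this illustration.

```python
# Hand-written role labels for the two candidate sentences (illustrative only).
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Return the agents of `predicate` whose theme matches the question's theme."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]

print(answer_who("create", "the first effective polio vaccine", candidates))
```

Both sentences contain every word of the question, so word matching cannot separate them; comparing the theme of create is what removes the Becton Dickinson sentence.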

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity

An example

• [John] rings [the bell]  (ring.01)
• [Tall aspen trees] ring [the lake]  (ring.02)

An example

• [John: ARG0] rings [the bell: ARG1]  (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2]  (ring.02)
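
To make the relation between rolesets and labelled arguments concrete, here is a minimal Python sketch. The role descriptions are copied from the ring example; the dictionary layout and the helper `describe` are assumptions for illustration and do not reproduce the actual PropBank frame file format.

```python
# Rolesets from the ring example; the layout is an illustrative assumption.
FRAMES = {
    "ring.01": {"gloss": "Make sound of bell",
                "roles": {"ARG0": "Causer of ringing",
                          "ARG1": "Thing rung",
                          "ARG2": "Ring for"}},
    "ring.02": {"gloss": "To surround",
                "roles": {"ARG1": "Surrounding entity",
                          "ARG2": "Surrounded entity"}},
}

def describe(roleset_id, labelled_args):
    """Pair each labelled argument with its role description in the roleset."""
    roles = FRAMES[roleset_id]["roles"]
    return [(text, label, roles[label]) for text, label in labelled_args]

for row in describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]):
    print(row)
for row in describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]):
    print(row)
```

The same label, ARG1, means ‘thing rung’ under ring.01 and ‘surrounding entity’ under ring.02, which is why the roleset has to be chosen before the arguments are labelled.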

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Framefiles for Hindi

• Two types of framefiles
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments, recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
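
The four empty-argument types and how each is inserted can be summarised as a small lookup table. This Python sketch only restates the slide; the table layout and the helper `needs_annotator` are hypothetical.

```python
# Empty-argument types from the slide, with the way each one is inserted.
EMPTY_ARGUMENTS = {
    "pro": ("dropped argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}

def needs_annotator(kind):
    """True if this empty-argument type is inserted by hand."""
    return EMPTY_ARGUMENTS[kind][1] == "manual"

print(sorted(k for k in EMPTY_ARGUMENTS if needs_annotator(k)))
```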

PropBank Tagset: Numbered Arguments

Numbered arguments           Numbered arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent, experiencer     ARG0-MNS  Induced causer
ARG1  Theme, patient         ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif.erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif.dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
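
The labelling rule for intransitives is simple enough to state as code. In this Python sketch the verb classes are hard-coded from the examples on the slide; in practice the class would come from the diagnostic tests, and the names `VERB_CLASS` and `sole_argument_label` are assumptions for illustration.

```python
# Verb classes taken from the slide's examples (WX-style spellings as given).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """PropBank label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"

print(sole_argument_label("Kula"), sole_argument_label("haMsa"))
```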

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl.erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl.dat open.inf compel.fut
         ‘The door will have to open’

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif.erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif.dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan.dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram.erg Mohan.dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid.erg Atif.acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother.erg maid.by Atif.acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

[Figures: Causative-1; Causative-2; Causatives classes]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust(n) do(v)’, ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi.inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

आतिफ़ किताब पढ़ेगा
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP, V over Atif kitab paRhegaa]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP-P, NP, V over Atif ne kitab paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative there simply is no object

आतिफ़ सोएगा
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP, V over Atif soyegaa]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP1, NP, CASE1, V over darvaazaa khulegaa]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP1, NP, CASE1, V, NP-P over us kamre mein cuuhe hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP-P, NP, V over Ram ne Mohan ko kitaab dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP1, NP2, NP, NP-P, CASE1, SCR2, V over kal raat baadaloM mein mujhko caaMd dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP, NP1, NP-P, V, V-Aux, CP1, C, EXTR1, CASE1 over Ram jaantaa hai ki Sita der se aayegi]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Figure: phrase-structure tree with nodes VP-Pred, VP, NP, NP1, NP-P, V, V-Aux, CP, C, COMP, PRO, SCR1, SCR3]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Figure: phrase-structure tree with nodes VP-Pred, VP, VP1, NP, NP-P, V, V-Aux, V’, N, CASE1]

Causative

anubhava karta - k4a (Contd…)

[Figures: Ex-2; Ex-3]

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd…)

[Figures: Ex-1; Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’
• Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause

Possibility - I

agar-to
  paired-ccof → agar
                  ccof → naa hotii
  paired-ccof → to
                  ccof → aatii

Possibility - II

[Figure: Possibility-II tree]

Conditionals (Contd)

• Possibility-I is not adopted because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause

(6) Participles (vmod)

  In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)

  The arguments occurs only once in the sentence but is semantically related to both the verbs   The shared argument syntactically always attaches with the main verb   For the other verb this argument is semantically realized but not syntactically

101212

31

Participles (vmod) (Contd )

  Ex vaha rojZa patra likhakara PaadZataa hai

rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo

lsquoHaving letters written everyday he tearsrsquo

101212

32

Participles (vmod) (Contd )

  The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa

lsquotearsrsquo is shared with another participle verb likhakar lsquohaving

writtenrsquo

Participles (vmod) (contd)

Paadzataa hai k1 k7t k2 vmod

vaha rojZa pawra likhakar k1 k2

vaha pawra

101212

33

(7)Ellipsis

  How to show dependencies when the head is missing   Ex tum jo bhi kahoge (vo) mai maan luungii lsquoyou lsquowhateverrsquo lsquowill sayrsquo lsquothatrsquo lsquoIrsquo lsquowill believersquo lsquoI will believe whatever you sayrsquo   In the above example vo lsquothatrsquo is missing which becomes the parent node

for relative clause lsquotum jo bhi kahogersquo   We insert a null element ie NULL_NP for vo lsquothatrsquo to show the dependency

Ellipsis (Contd)

maan luungii k1 k2

mai NULL__NP (vo) nmod__relc

kahoge k1 k2

tum jo bhi

101212

34

Ellipsis (Contd)

  Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate

lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo   No explicit conjunct   Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate

Non-dependency Relations

 ccof ndash Conjunction  pof ndash Complex Predicates   fragof -- Fragment of

101212

35

(1) Conjunction (ccof)

  ccof relation doesnrsquot reflects a dependency relation   It is used for coordinating as well as subordinating conjunctions   The dependency trees will show the conjuncts as heads   In coordinating conjuncts the conjunct is the head and takes the coordinating

elements as its children   In subordinating conjunct it would take the clause to which it is syntactically

attached (the subordinate clause) as its child

Conjunction (ccof) (Contdhellip)

  Coordinate Conjunction

$ Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa lsquoramrsquo lsquoErgrsquo lsquofoodrsquo lsquoatersquo lsquoandrsquo lsquositarsquo Ergrsquo lsquoapplersquo lsquoatersquo

lsquoRam ate food and Sita ate an applersquo

  Subordinate Conjunction

$ Ex raam ne kahaa ki vo kal aayegaa lsquoramrsquo lsquoErgrsquo lsquosaidrsquo thatrsquo lsquohersquo lsquotomorrowrsquo lsquocome-Futrsquo lsquoRam said that he will come tomorrowrsquo

101212

36

Coordinate Conjunction (ccof)

Subordinate Conjunction

101212

37

(2) Conjunct Verbs

  Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo   The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa

lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence

  The annotation scheme should be able to account for this relation in the

dependency tree   If prashna kiyaa is grouped as a single verb chunk it will not be possible to

mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

  To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG

  The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation)   It means noun or an adjective in the conjunct verb sequence will have a POF relation with

the verb   This way the relation between ek and prashna becomes an intra-chunk relation as they will

now become part of a single NP chunk   Conjunct verbs are chunked separately but semantically they constitute a single unit   It captures the fact that the noun-verb sequence is a conjunct verb by linking them with

POF relation

101212

38

Conjunct Verbs (Contd)

kiyaa k1 k2 pof

maine usase prashna

nmod

ek

101212

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut

Thedoorwillopen

unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened

unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen

101212

14

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

101212

15

Da3veSubject

unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst

YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst

YesterdaynightIsawthemoonbehindtheclouds

unacc4

datsubj1

TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on

101212

16

Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan

ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan

Causa3ves

bull  Hindihastwowaysofformingthecausa3vebull  Addndashaa

ndash  (sosulaa)sleepmakesomeonesleepbull  Addndashvaa

ndash  (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep

bull  WeintroducethelabelARGAtoanalyzecausersbull  SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees

bull  ARGALMNSforintermediatecausers

101212

17

Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

causa3veL1 आया आतफ़ को स6लाया

aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo

causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa

motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo

Causa3ves

Causa3veL1

Causa3veL2

101212

18

Causa3vesclasses

Complexpredicates

bull  Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust

bull  Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa

101212

19

ComplexPredicate

complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo

Complexpredicate

complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo

101212

20

ComplementClause

complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut

RamknowsthatSitawillarrivelate

101212

1

PhraseStructureRepresenta3on

OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu

PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks

bull  DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow

bull  Developedinconjunc3onwithDSandPBbull  InspiredbyChomskyantradi3on

101212

2

BackgroundforPS

bull  Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren

ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull  LanguageVuniversalprinciplesbull  LanguageVspecificparameters

bull  PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach

BasicPrinciplesofPS

bull  PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)

bull  Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces

bull  Transforma3onalgrammar

bull  Monostratalrepresenta3onbull  NotunlikeEnglishPennTreebank

101212

3

SpecificAssump3onsaboutRepresenta3onMadebyPS

bull  Phrasestructurebull  No3onoflexicalheadswithprojec3ons(XVbartheorysortof)andassociatedfunc3onalprojec3onsndash Nounswithpostposi3onsndash  Verbswithauxiliariesandcomplemen3zers(ki)

bull  Binarybranchingndash  Theore3calreasonsndash  TobedifferentfromDS

BasicTransi3veClause(1)

bull  Therearetwoprivilegedposi3onsintheverbalprojec3oncorrespondingusuallytoDSrsquosk1andk2

VPVPred

VP

NP

NPA3f

kitab

V

paRhegaaआततफिकताबपढ़गा

101212

4

BasicTransi3veClause(2)

bull  Therepresenta3onismaintainedwhenwehaveanerga3veconstruc3on

VPVPred

VP

NPVP

NPA3fVne

kitab

V

paRhii

आततफिकताबपढ़ी

Intrasi3veClauseUnerga3ve

bull  PSmakesadis3nc3onbetweenunerga3veandunaccusa3ve

bull  Inunerga3vetheresimplyisnoobject

VPVPred

VP

NP

A3fV

soyegaa

आततफसोएगा

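Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
darvaazaa khulegaa

[Tree diagram. Labels: VP-Pred, VP, NP, V; leaves: darvaazaa, khulegaa; trace: CASE, coindexed (index 1) with the NP darvaazaa]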

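Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं
us kamre mein cuuhe hain

[Tree diagram. Labels: VP-Pred, VP, NP, V; leaves: us kamre mein, cuuhe, hain; trace: CASE, coindexed (index 1) with the NP cuuhe]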

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position

राम ने मोहन को किताब दी
Ram-ne Mohan-ko kitaab dii

[Tree diagram. Labels: VP-Pred, VP, NP, V; leaves: Ram-ne, Mohan-ko, kitaab, dii]

Putting It All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM mein mujhko caaMd dikhaa

[Tree diagram. Labels: VP-Pred, VP, NP, V; leaves: kal raat, baadaloM mein, mujhko, caaMd, dikhaa; traces: CASE, coindexed (index 1) with the NP caaMd, and SCR, coindexed (index 2) with the NP mujhko]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

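Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
Ram jaantaa hai ki Sita der se aayegi

[Tree diagram. Labels: VP-Pred, VP, NP, CP, C, V, V-Aux; leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi; traces: EXTR (index 1, matching the label CP1) and CASE (index 1, matching the NP Sita)]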

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.F you.Erg give.F.Sg.Pst be.F.Sg.Pst that I.Erg read Refl.F.Sg.Pst
'I have read the book which you gave me'

[Tree diagram. Labels: VP-Pred, VP, NP, CP, C (COMP), V, V-Aux, PRO; leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii; traces: SCR (indices 1 and 3)]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.Acc remember do Prog.M.Sg be.M.Sg.Pst
'Ram was remembering Ravi'

[Tree diagram. Labels: VP-Pred, VP, V', NP, N, V, V-Aux; leaves: Raam, Ravi-ko, yaad, kar, rahaa, thaa; trace: CASE (index 1)]

Causative

(4) Relation samanadhikaran - rs

• Ex-1: raam ne kahaa ki vo kal aayegaa 'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa 'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran - rs (Contd.) – Ex-1

[Dependency tree figure for Ex-1]

Relation samanadhikaran - rs (Contd.) – Ex-2

[Dependency tree figure for Ex-2]

(5) Conditionals

• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would have definitely come to the party'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility - I

[Dependency tree: the abstract node agar-to is the root; agar and to attach to it as paired-ccof; naa hotii attaches to agar and aatii attaches to to, each as ccof]

Possibility - II

[Dependency tree figure]

Conditionals (Contd.)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)

• Ex: vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written a letter, he tears it up every day'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd.)

[Dependency tree: PaadZataa hai is the root, with vaha (k1), rojZa (k7t), patra (k2) and likhakara (vmod) as children; the shared arguments vaha (k1) and patra (k2) are also shown under likhakara]

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

[Dependency tree: maan luungii is the root, with mai (k1) and NULL__NP (vo) (k2) as children; kahoge attaches to NULL__NP as nmod__relc, with tum (k1) and jo bhi (k2) as its children]

Ellipsis (Contd.)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up, they don't listen to anyone'
• No explicit conjunction
• Insert a NULL element to show the dependencies (if it is essential)

[Dependency tree: NULL_CCP (aur) is the root, with badZe_ho_gaye and nahii_sunate attached as ccof]
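The NULL_NP insertion for the relative-clause example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `roots` helper and the `attach_via_null` function are hypothetical names invented here, not part of any treebank tooling, and nodes are keyed by their word form for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    form: str
    head: Optional[str] = None  # form of the parent node; None means unattached
    rel: Optional[str] = None   # dependency label on the arc to the parent


def roots(tree: Dict[str, Node]) -> List[str]:
    """Return the forms of all nodes that have no parent."""
    return [node.form for node in tree.values() if node.head is None]


def attach_via_null(tree, orphan, null_form, null_head, null_rel, orphan_rel):
    """Insert a null node under null_head and hang the orphan node from it."""
    tree[null_form] = Node(null_form, null_head, null_rel)
    tree[orphan].head = null_form
    tree[orphan].rel = orphan_rel


# 'tum jo bhi kahoge (vo) mai maan luungii' with the head vo missing:
# the relative clause verb kahoge has nothing to attach to.
tree = {
    "maan luungii": Node("maan luungii"),
    "mai": Node("mai", "maan luungii", "k1"),
    "kahoge": Node("kahoge"),
    "tum": Node("tum", "kahoge", "k1"),
    "jo bhi": Node("jo bhi", "kahoge", "k2"),
}
roots_before = roots(tree)

# Insert NULL_NP as the k2 of the main verb and attach the relative clause to it.
attach_via_null(tree, "kahoge", "NULL_NP", "maan luungii", "k2", "nmod__relc")
roots_after = roots(tree)
```

Before the insertion the structure has two unattached nodes; afterwards only the main verb remains as root, which is the point of adding the null element.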

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

[Dependency tree figure]

Subordinate Conjunction

[Dependency tree figure]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two

Conjunct Verbs (Contd.)

[Dependency tree: kiyaa is the root, with maine (k1), usase (k2) and prashna (pof) as children; ek attaches to prashna as nmod]
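As a rough illustration of why the noun is kept outside the verb chunk, the arcs of the example can be written down as plain tuples. The tuple layout and the helper names `dependents` and `conjunct_verbs` are assumptions made for this sketch only; they do not reflect the treebank's actual file format.

```python
from typing import List, Optional, Tuple

Arc = Tuple[str, str, str]  # (dependent, head, relation)

# 'maine usase ek prashna kiyaa', chunked as [ek prashna]NP [kiyaa]VG
ARCS: List[Arc] = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),
    ("ek", "prashna", "nmod"),
]


def dependents(arcs: List[Arc], head: str, rel: Optional[str] = None) -> List[str]:
    """Return the dependents of a head, optionally restricted to one relation."""
    return [d for d, h, r in arcs if h == head and (rel is None or r == rel)]


def conjunct_verbs(arcs: List[Arc]) -> List[str]:
    """Recover noun-verb units by following pof arcs."""
    return [f"{d} {h}" for d, h, r in arcs if r == "pof"]
```

With this layout, `conjunct_verbs(ARCS)` recovers the unit `prashna kiyaa`, while `dependents(ARCS, "prashna")` still finds the modifier `ek` on the noun alone, which a single verb chunk would not allow.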


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
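The filtering step can be sketched as follows. This is a toy illustration, assuming the two candidate sentences have already been labelled with agent and theme as on the slide; the dictionary layout and the `answer` function are hypothetical and stand in for a real SRL system.

```python
from typing import Dict, List

# 'Who created the first effective polio vaccine?' asks for the agent
# of 'create' whose theme is the vaccine.
QUESTION = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Hand-written role labels for the two candidate sentences.
CANDIDATES: List[Dict[str, str]] = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer(question: Dict[str, str], candidates: List[Dict[str, str]]) -> List[str]:
    """Keep agents whose predicate and theme both match the question."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == question["predicate"]
        and c["theme"].lower() == question["theme"].lower()
    ]
```

Both candidates contain the words of the question, so word matching alone cannot separate them; comparing the theme of `create` leaves only Jonas Salk.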

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame Files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
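One way to picture a frame file with several rolesets is as a small lookup structure. The sketch below is an assumption-laden illustration: the `Roleset` class, the `FRAME_FILES` dictionary and the `describe` function are invented names, and real PropBank frame files are stored in their own format rather than as Python objects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Roleset:
    id: str
    gloss: str
    roles: Dict[str, str]  # numbered argument -> verb-specific description


# The two rolesets of 'ring' as listed on the slide.
FRAME_FILES: Dict[str, List[Roleset]] = {
    "ring": [
        Roleset(
            "ring.01",
            "Make sound of bell",
            {"ARG0": "Causer of ringing", "ARG1": "Thing rung", "ARG2": "Ring for"},
        ),
        Roleset(
            "ring.02",
            "To surround",
            {"ARG1": "Surrounding entity", "ARG2": "Surrounded entity"},
        ),
    ]
}


def describe(lemma: str, roleset_id: str, labelled: Dict[str, str]) -> Dict[str, str]:
    """Map each labelled span to the role description in the chosen roleset."""
    roleset = next(r for r in FRAME_FILES[lemma] if r.id == roleset_id)
    return {span: roleset.roles[label] for label, span in labelled.items()}
```

For instance, `describe("ring", "ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"})` resolves the same numbered labels differently from `ring.01`, which is the sense in which roles are verb-specific.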

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments
  ARGA: Causer
  ARG0: Agent, experiencer
  ARG1: Theme, patient
  ARG2: Recipient
  ARG3: Instrument

Numbered Arguments with function tags
  ARGA-MNS: Indirect causer
  ARG0-MNS: Induced causer
  ARG0-GOL: Causee with a 'recipient' role
  ARG2-ATR: Attribute
  ARG2-GOL: Goal
  ARG2-SOU: Source
  ARG2-LOC: Location
  ARG2-DIR: Direction

PropBank Tagset: Modifier Arguments
  ARGM-TMP: Temporal
  ARGM-MNR: Manner
  ARGM-LOC: Location
  ARGM-PRP: Purpose
  ARGM-CAU: Cause
  ARGM-DIS: Discourse
  ARGM-ADV: Adverb
  ARGM-MNS: Means
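The labels in the tagset follow a base-plus-function-tag pattern, which a short sketch can make explicit. The `TAGSET` dictionary below simply copies the labels listed above; the helper names `split_label` and `is_modifier` are hypothetical and chosen for this illustration.

```python
from typing import Dict, Optional, Tuple

# Labels and descriptions as listed in the tagset above.
TAGSET: Dict[str, str] = {
    "ARGA": "Causer",
    "ARG0": "Agent, experiencer",
    "ARG1": "Theme, patient",
    "ARG2": "Recipient",
    "ARG3": "Instrument",
    "ARGA-MNS": "Indirect causer",
    "ARG0-MNS": "Induced causer",
    "ARG0-GOL": "Causee with a 'recipient' role",
    "ARG2-ATR": "Attribute",
    "ARG2-GOL": "Goal",
    "ARG2-SOU": "Source",
    "ARG2-LOC": "Location",
    "ARG2-DIR": "Direction",
    "ARGM-TMP": "Temporal",
    "ARGM-MNR": "Manner",
    "ARGM-LOC": "Location",
    "ARGM-PRP": "Purpose",
    "ARGM-CAU": "Cause",
    "ARGM-DIS": "Discourse",
    "ARGM-ADV": "Adverb",
    "ARGM-MNS": "Means",
}


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Split a label such as 'ARG2-GOL' into its base and function tag."""
    base, _, function = label.partition("-")
    return base, (function or None)


def is_modifier(label: str) -> bool:
    """Modifier arguments are the ones whose base is ARGM."""
    return split_label(label)[0] == "ARGM"
```

Note that the same function tag can mean different things on different bases (MNS on ARGA, ARG0 and ARGM), so the sketch looks up the full label rather than the tag alone.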

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.F read.M.Sg.Fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.Erg book.F read.F.Sg.Pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.Dat book.F read.F.Inf compel.F.Pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
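The labelling rule for the single argument of an intransitive verb is small enough to state as code. In this sketch the `VERB_CLASS` lexicon contains only the three verbs named above, and it is assumed that the class has already been decided by the diagnostic tests; the function name is hypothetical.

```python
from typing import Dict

# Verb classes for the example verbs; in practice these would come
# from diagnostic tests, not from a hand-written table.
VERB_CLASS: Dict[str, str] = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def single_argument_label(verb: str) -> str:
    """Unaccusatives give their sole argument ARG1, unergatives ARG0."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"
```

So `single_argument_label("Kula")` yields ARG1 while `single_argument_label("haMsa")` yields ARG0, mirroring the distinction drawn on the slide.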

Intransitive: Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.M.Sg.D open.M.Sg.Fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.M.Sg.Obl Erg open.Pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.M.Sg.Obl Dat open.Inf compel.Fut
'The door will have to open'

Intransitive: Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.M.Sg.Fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif.Erg sleep.M.Sg.Pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.Dat sleep.Inf compel.Fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.Pres.Pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).Pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.Dat moon see(unacc).Pst
'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.Dat book.F give.M.Sg.Fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.Erg Mohan.Dat book.F give.F.Sg.Pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so, sulaa): sleep, make someone sleep
• Add -vaa
  – (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.M.Sg.Fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.Erg Atif.Acc sleep.Caus.Pst
'The maid caused Atif to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.Erg maid.by Atif.Acc sleep.Caus.Pst
'The mother made the maid cause Atif to sleep'

Causatives

[Figures: PropBank annotation of causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)': trust
• Such cases are handled using a noun frame for bharosaa: [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi.Gen wait do Prog.M.Sg be.M.Sg.Pst
'Ram was waiting for Ravi'

Complex predicate

compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.Inst fight sit.Perf
'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.Hab.M.Sg be.Sg.Pres that Sita late Part come.F.Sg.Fut
'Ram knows that Sita will arrive late'



101212

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

[Figures: annotated analyses of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

[Figures: annotated analyses of causative-1 and causative-2]

Causatives: classes

[Table of causative classes not preserved in the extracted text]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
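The bracketed annotation in the last bullet can be read mechanically. The following Python sketch extracts the labelled spans and the remaining predicate; the regular expression and the name `parse_annotation` are assumptions made for this illustration, not a treebank file format.

```python
import re
from typing import List, Tuple

# A span looks like "[abhay ne ARG0]": text, whitespace, then a label such
# as ARG0, ARGA or ARGM-TMP immediately before the closing bracket.
SPAN = re.compile(r"\[([^\[\]]+?)\s+(ARG[A0-9M](?:-[A-Z]{3})?)\]")


def parse_annotation(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (predicate, [(label, span text), ...]) for a bracketed line."""
    args = [(m.group(2), m.group(1)) for m in SPAN.finditer(text)]
    predicate = SPAN.sub("", text).strip()
    return predicate, args


predicate, args = parse_annotation(
    "[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa"
)
```

For this input the predicate is the noun-verb sequence `bharosaa kiyaa`, and the two arguments are labelled against the noun frame for bharosaa.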

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा

[Tree diagram; labels as extracted: VP-Pred, VP, NP, NP Atif, kitab, V, paRhegaa]
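A binary-branching phrase structure tree of this kind can be held in a small recursive data structure. The Python sketch below is one possible encoding; the bracketing of the example clause (subject NP under VP-Pred, object NP and V under VP) is a reading of the flattened labels and should be treated as an assumption, as should the names `Node`, `bracketed` and `is_binary`.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A phrase structure node; children are nodes or word strings."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def bracketed(n: Union[Node, str]) -> str:
    """Render a tree in labelled-bracket notation."""
    if isinstance(n, str):
        return n
    return "[" + n.label + " " + " ".join(bracketed(c) for c in n.children) + "]"


def is_binary(n: Union[Node, str]) -> bool:
    """True if no node in the tree has more than two children."""
    if isinstance(n, str):
        return True
    return len(n.children) <= 2 and all(is_binary(c) for c in n.children)


# Assumed structure for 'Atif kitaab paRhegaa' (Atif will read the book).
clause = Node("VP-Pred", [
    Node("NP", ["Atif"]),
    Node("VP", [Node("NP", ["kitaab"]), Node("V", ["paRhegaa"])]),
])
```

Calling `bracketed(clause)` gives `[VP-Pred [NP Atif] [VP [NP kitaab] [V paRhegaa]]]`, and `is_binary(clause)` holds, in line with the binary-branching assumption of PS.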

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी

[Tree diagram; labels as extracted: VP-Pred, VP, NP-P, NP, Atif-ne, kitab, V, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative there simply is no object

आतिफ़ सोएगा

[Tree diagram; labels as extracted: VP-Pred, VP, NP, Atif, V, soyegaa]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा

[Tree diagram; labels as extracted: VP-Pred, VP, NP1, NP darvaazaa, CASE1, V, khulegaa]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram; labels as extracted: VP-Pred, VP, NP1, NP cuuhe, CASE1, V, hain, VP, NP-P, us kamre mein]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी

[Tree diagram; labels as extracted: VP-Pred, NP-P, NP, Ram-ne, kitaab, V, dii, VP-Pred, NP-P, Mohan-ko, VP]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram; labels as extracted: VP-Pred, NP1, NP, caaMd, CASE1, V, dikhaa, VP, VP, NP-P, baadaloM mein, VP, NP, kal raat, VP-Pred, NP, SCR2, VP, NP2, mujhko]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram; labels as extracted: VP-Pred, VP, NP, NP Ram, EXTR1, V, VP-Pred, NP, VP, NP1, Sita, CASE1, V, aayegi, VP-Pred, NP-P, CP1, C, ki, der se, jaantaa, V-Aux, VP, VP, hai]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Tree diagram; labels as extracted: VP-Pred, NP-P, NP, tumne, SCR1, V, dii, VP-Pred, NP, PRO, VP, VP, NP1, jo kitaab, CP, C, COMP, V-Aux, thii, VP, VP-Pred, VP, NP-P, maine, V, paRhii, NP, SCR3, VP, NP, NP, vah]

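Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Tree diagram; labels as extracted: VP-Pred, VP, NP, NP-P1, Raam, Ravi-ko, V, kar, V-Aux, VP, VP, rahaa, V-Aux, thaa, V', NP, NP, CASE1, N, yaad]

Causative

[Tree diagram not preserved in the extracted text]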

Possibility - I

[Tree diagram; labels as extracted: agar-to, paired-ccof, paired-ccof, agar, to, ccof, ccof, naa, hotii, aatii]

Possibility - II

[Tree diagram not preserved in the extracted text]

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals our current decision is to follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically

Participles (vmod) (Contd)

• Ex: vaha rojZa patra likhakar PaadZataa hai
      'he' 'daily' 'letter' 'having written' 'tear' 'is'
      'Having written letters every day, he tears them up'

Participles (vmod) (Contd)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd)

PaadZataa hai
  k1   → vaha
  k7t  → rojZa
  k2   → patra
  vmod → likhakar
           k1 → vaha   (shared)
           k2 → patra  (shared)
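The shared-argument idea can be made concrete with a small dependency structure in which the participle has no syntactic k1 or k2 of its own and recovers them from the main verb. This Python sketch is illustrative only: `DepNode`, `SHAREABLE` and `semantic_args` are hypothetical names, and restricting sharing to k1 and k2 is an assumption based on this one example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DepNode:
    """A word (or chunk) with its labelled dependents."""
    form: str
    deps: List[Tuple[str, "DepNode"]] = field(default_factory=list)

    def add(self, relation: str, child: "DepNode") -> "DepNode":
        self.deps.append((relation, child))
        return child


SHAREABLE = ("k1", "k2")  # assumption: only these relations are shared


def semantic_args(verb: DepNode, main: Optional[DepNode] = None) -> Dict[str, str]:
    """Arguments of `verb`, filling gaps from `main` when it is given."""
    args = {rel: child.form for rel, child in verb.deps if rel in SHAREABLE}
    if main is not None:
        for rel, child in main.deps:
            if rel in SHAREABLE:
                args.setdefault(rel, child.form)
    return args


# Syntactic tree: every argument attaches to the main verb only.
main = DepNode("PaadZataa hai")
main.add("k1", DepNode("vaha"))
main.add("k7t", DepNode("rojZa"))
main.add("k2", DepNode("patra"))
participle = main.add("vmod", DepNode("likhakar"))
```

Syntactically `participle.deps` is empty, yet `semantic_args(participle, main)` yields vaha as k1 and patra as k2, which is the "semantically realized but not syntactically" situation described above.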

(7) Ellipsis

• How to show dependencies when the head is missing
• Ex: tum jo bhi kahoge (vo) mai maan luungii
      'you' 'whatever' 'will say' 'that' 'I' 'will believe'
      'I will believe whatever you say'
• In the above example vo 'that', which is the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
         nmod__relc → kahoge
                        k1 → tum
                        k2 → jo bhi

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
      'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
      "The children have grown up; they don't listen to anyone"
• No explicit conjunction
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd)

• Coordinate Conjunction
  Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
      'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  Ex: raam ne kahaa ki vo kal aayegaa
      'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
      'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

[Tree diagram not preserved in the extracted text]

Subordinate Conjunction

[Tree diagram not preserved in the extracted text]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
      'I-erg' 'him-inst' 'one' 'question' 'did'
      'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two

Conjunct Verbs (Contd)

kiyaa
  k1  → maine
  k2  → usase
  pof → prashna
          nmod → ek
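The chunking decision for conjunct verbs can be sketched as a function that takes the separated NP chunk and the verb and emits the relations. This is a minimal Python illustration; `conjunct_verb_relations` is a hypothetical helper, and treating the last word of the NP chunk as its head noun is an assumption that fits this example.

```python
from typing import List, Tuple

Relation = Tuple[str, str, str]  # (dependent, relation, head)


def conjunct_verb_relations(np_chunk: List[str], verb: str) -> List[Relation]:
    """Link the noun of a conjunct verb to the verb with pof, and any
    modifiers inside the NP chunk to the noun with nmod."""
    if not np_chunk:
        raise ValueError("the NP chunk must contain at least the noun")
    noun = np_chunk[-1]  # assumption: the chunk is head-final
    relations = [(noun, "pof", verb)]
    relations += [(modifier, "nmod", noun) for modifier in np_chunk[:-1]]
    return relations


# [ek prashna]NP [kiyaa]VG
relations = conjunct_verb_relations(["ek", "prashna"], "kiyaa")
```

Because prashna sits in its own NP chunk, ek can attach to it directly, while the pof link still records that prashna and kiyaa form one semantic unit.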


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
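The filtering step can be illustrated with the two role-labelled sentences from the slides. The Python sketch below assumes the semantic role labels are already available as dictionaries; the structure of `ANALYSES` and the function `answer_who` are hypothetical and stand in for the output of a real SRL system.

```python
from typing import Dict, List

# Role-labelled analyses of the two candidate sentences (from the slides).
ANALYSES: List[Dict[str, str]] = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate: str, theme: str, analyses: List[Dict[str, str]]) -> List[str]:
    """Return the agents of sentences whose predicate and theme both match."""
    wanted = theme.lower()
    return [
        a["agent"]
        for a in analyses
        if a["predicate"] == predicate and a["theme"].lower() == wanted
    ]


answers = answer_who("create", "the first effective polio vaccine", ANALYSES)
```

Both sentences contain every word of the question, so word matching cannot separate them; matching on the theme role keeps only the second sentence, and `answers` contains just Jonas Salk.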

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1]              (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
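A frame file with several rolesets maps naturally onto a nested dictionary keyed by lemma and roleset ID. The Python sketch below encodes the two rolesets of ring from the example; the in-memory layout and the helpers `roleset` and `describe` are assumptions for illustration and do not reproduce the actual PropBank frame file format.

```python
from typing import Dict, List, Tuple

# One frame file per lemma; a polysemous lemma has several rolesets.
FRAMES: Dict[str, Dict[str, dict]] = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {
                "Arg0": "Causer of ringing",
                "Arg1": "Thing rung",
                "Arg2": "Ring for",
            },
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {
                "Arg1": "Surrounding entity",
                "Arg2": "Surrounded entity",
            },
        },
    }
}


def roleset(roleset_id: str) -> dict:
    """Fetch a roleset such as 'ring.02' from its lemma's frame file."""
    lemma = roleset_id.split(".", 1)[0]
    return FRAMES[lemma][roleset_id]


def describe(roleset_id: str, labelled: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Attach the roleset-specific role description to each labelled span."""
    roles = roleset(roleset_id)["roles"]
    return [(text, arg, roles[arg]) for text, arg in labelled]


lake = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
```

The same numbered label means different things in different rolesets: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why the lookup always goes through the roleset ID.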




Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
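As a rough sketch of how a frame file with several rolesets could be held in memory, the Python below stores the two rolesets for ring and looks up what each labelled argument means. The dictionary layout, the `ring.01` / `ring.02` identifier spelling, and the `explain` helper are assumptions made for illustration; actual PropBank frame files are distributed in their own file format, which is not shown here.

```python
# Hypothetical in-memory frame lexicon: lemma -> roleset id -> gloss and roles.
FRAME_FILES = {
    "ring": {
        "ring.01": {
            "gloss": "make sound of bell",
            "roles": {"ARG0": "causer of ringing",
                      "ARG1": "thing rung",
                      "ARG2": "ring for"},
        },
        "ring.02": {
            "gloss": "to surround",
            "roles": {"ARG1": "surrounding entity",
                      "ARG2": "surrounded entity"},
        },
    }
}


def explain(roleset_id, labelled_args):
    """Map each labelled span to the role description of the chosen roleset."""
    lemma = roleset_id.split(".")[0]
    roles = FRAME_FILES[lemma][roleset_id]["roles"]
    return {span: roles[label] for span, label in labelled_args.items()}


bell = explain("ring.01", {"John": "ARG0", "the bell": "ARG1"})
lake = explain("ring.02", {"Tall aspen trees": "ARG1", "the lake": "ARG2"})
print(bell)
print(lake)
```

The same label (ARG1) receives a different description under each roleset, which is the point of defining roles per verb sense.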

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling

Frame files for Hindi

• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

ARGA  Causer
ARG0  Agent, experiencer
ARG1  Theme, patient
ARG2  Recipient
ARG3  Instrument

PropBank Tagset: Numbered Arguments with function tags

ARGA-MNS  Indirect causer
ARG0-MNS  Induced causer
ARG0-GOL  Causee with a 'recipient' role
ARG2-ATR  Attribute
ARG2-GOL  Goal
ARG2-SOU  Source
ARG2-LOC  Location
ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
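The labels in this tagset combine a base argument with an optional function tag. A small Python sketch of splitting and validating such labels follows. The hyphenated spelling (e.g. `ARG2-GOL`), the `parse_label` helper, and the validation rules are assumptions for illustration; only the label inventory is taken from the slides.

```python
# Label inventory taken from the tagset slides; everything else is illustrative.
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
FUNCTION_TAGS = {"MNS", "GOL", "ATR", "SOU", "LOC", "DIR",
                 "TMP", "MNR", "PRP", "CAU", "DIS", "ADV"}


def parse_label(label):
    """Split a label such as 'ARG2-GOL' into (base, function_tag or None)."""
    base, _, tag = label.partition("-")
    if base != "ARGM" and base not in NUMBERED:
        raise ValueError(f"unknown base argument: {base!r}")
    if base == "ARGM" and not tag:
        raise ValueError("modifier arguments need a function tag")
    if tag and tag not in FUNCTION_TAGS:
        raise ValueError(f"unknown function tag: {tag!r}")
    return base, (tag or None)


print(parse_label("ARG2-GOL"))
print(parse_label("ARG1"))
print(parse_label("ARGM-TMP"))
```

This check is deliberately loose: it accepts any base with any known tag, whereas the slides list only particular combinations.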

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; the single argument of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
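The labelling rule for the single argument of an intransitive verb can be written as a small lookup. The Python below is a sketch under the assumption that each verb's class has already been decided by the diagnostic tests; the `VERB_CLASS` table and the `sole_argument_label` helper are hypothetical, and only the three verbs named on the slide are listed.

```python
# Verb classes as given on the slide; a real lexicon would come from the frame files.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```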

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
'The door will have to open'

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Last night the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Last night I saw the moon behind the clouds'

[Figures: PropBank annotations of unacc-4 and dat-subj-1, not recovered]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add –aa
– (so → sulaa) 'sleep' → 'make someone sleep'
• Add –vaa
– (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

[Figures: PropBank annotations of causative-1 and causative-2, not recovered]

Causatives classes

[Figure not recovered]

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2
राम रवी से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.inst fight sit.perf
'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
   • Language-universal principles
   • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा
[Tree diagram not recovered; node labels: VP-Pred, VP, NP, NP, V; words: Atif, kitab, paRhegaa]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[Tree diagram not recovered; node labels: VP-Pred, VP, NP-P, NP, V; words: Atif-ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ़ सोएगा
[Tree diagram not recovered; node labels: VP-Pred, VP, NP, V; words: Atif, soyegaa]

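Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[Tree diagram not recovered; node labels as extracted: VP-Pred, VP, NP1, NP, CASE1, V; words: darvaazaa, khulegaa]

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं
[Tree diagram not recovered; node labels as extracted: VP-Pred, VP, NP1, NP, CASE1, V, NP-P; words: us kamre mein, cuuhe, hain]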

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position

राम ने मोहन को किताब दी
[Tree diagram not recovered; node labels: VP-Pred, VP, NP-P, NP, V; words: Ram-ne, Mohan-ko, kitaab, dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
[Tree diagram not recovered; node labels as extracted: VP-Pred, VP, NP, NP-P, NP1, CASE1, NP2, SCR2, V; words: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
[Tree diagram not recovered; node labels as extracted: VP-Pred, VP, NP, NP-P, NP1, CASE1, CP1, EXTR1, C, V, V-Aux; words: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree diagram not recovered; node labels as extracted: VP-Pred, VP, NP, NP-P, NP1, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux; words: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree diagram not recovered; node labels as extracted: VP-Pred, VP, NP, NP-P, V', V, V-Aux, N, CASE1; words: Raam, Ravi-ko, yaad, kar, rahaa, thaa]

Causative

[Figure not recovered]

Participles (vmod) (Contd)

• Ex: vaha rojZa pawra likhakar PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Every day, having written letters, he tears them up'
• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd)

PaadZataa hai
  k1    vaha
  k7t   rojZa
  k2    pawra
  vmod  likhakar
          k1  vaha
          k2  pawra

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example, vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

maan luungii
  k1  mai
  k2  NULL__NP (vo)
        nmod__relc  kahoge
                      k1  tum
                      k2  jo bhi
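To make the role of the inserted null node concrete, the tree for this example can be written as a list of (dependent, relation, head) edges. The Python below is only a sketch: the edge-list encoding and the helpers `children_of` and `find_root` are assumptions for illustration and do not reflect the treebank's actual file format.

```python
from collections import defaultdict

# (dependent, relation, head) edges for: tum jo bhi kahoge (vo) mai maan luungii
EDGES = [
    ("mai", "k1", "maan luungii"),
    ("NULL__NP", "k2", "maan luungii"),    # inserted null element standing for vo
    ("kahoge", "nmod__relc", "NULL__NP"),  # the relative clause attaches to the null node
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]


def children_of(edges):
    """Group dependents under their heads, keeping the relation labels."""
    kids = defaultdict(list)
    for dependent, relation, head in edges:
        kids[head].append((relation, dependent))
    return kids


def find_root(edges):
    """Return the one node that is a head but never a dependent."""
    heads = {head for _, _, head in edges}
    dependents = {dependent for dependent, _, _ in edges}
    (root,) = heads - dependents
    return root


kids = children_of(EDGES)
print(find_root(EDGES))
print(kids["NULL__NP"])
```

Without the `NULL__NP` node, the relative clause headed by kahoge would have no nominal parent to attach to.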

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof  badZe_ho_gaye
  ccof  nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd)

• Coordinate Conjunction
– Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
– Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof)

[Dependency tree figure not recovered]

Subordinate Conjunction

[Dependency tree figure not recovered]

37

(2) Conjunct Verbs

  Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo   The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa

lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence

  The annotation scheme should be able to account for this relation in the

dependency tree   If prashna kiyaa is grouped as a single verb chunk it will not be possible to

mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

  To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG

  The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation)   It means noun or an adjective in the conjunct verb sequence will have a POF relation with

the verb   This way the relation between ek and prashna becomes an intra-chunk relation as they will

now become part of a single NP chunk   Conjunct verbs are chunked separately but semantically they constitute a single unit   It captures the fact that the noun-verb sequence is a conjunct verb by linking them with

POF relation

101212

38

Conjunct Verbs (Contd)

kiyaa k1 k2 pof

maine usase prashna

nmod

ek
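The effect of the POF relation can be illustrated with a few lines of Python over the tree for this example. The edge-list encoding and the helpers `conjunct_verb` and `modifiers_of` are hypothetical and chosen only for illustration; the relation labels are copied from the slide.

```python
# (dependent, relation, head) edges for: maine usase ek prashna kiyaa
EDGES = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),  # noun part of the conjunct verb
    ("ek", "nmod", "prashna"),    # the adjective modifies the noun only
]


def conjunct_verb(verb, edges):
    """Rebuild the semantic unit by joining the verb with its pof dependents."""
    parts = [dep for dep, rel, head in edges if head == verb and rel == "pof"]
    return " ".join(parts + [verb])


def modifiers_of(word, edges):
    """List the nmod dependents attached directly to a word."""
    return [dep for dep, rel, head in edges if head == word and rel == "nmod"]


print(conjunct_verb("kiyaa", EDGES))
print(modifiers_of("prashna", EDGES))
print(modifiers_of("kiyaa", EDGES))
```

Because prashna is a separate node linked by pof, the adjective ek can attach to the noun alone, while the conjunct verb can still be recovered as a unit.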



1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
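The filtering step described on this slide can be sketched in Python. This is only an illustration: the candidate dictionaries are hand-labelled here, and the helper name `answer_who_created` is hypothetical. A real system would obtain the roles from a semantic role labeller.

```python
# Hand-labelled candidates; in practice an SRL system would produce these roles.
CANDIDATES = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who_created(theme, candidates):
    """Return agents of 'create' events whose theme matches the question's theme."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == "create" and c["theme"] == theme
    ]


answers = answer_who_created("the first effective polio vaccine", CANDIDATES)
print(answers)
```

Both candidate sentences contain the question's words, so word matching cannot separate them; comparing the theme of create is what removes the syringe sentence.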

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
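A frame file with two rolesets can be pictured as a small lookup table. The sketch below is an assumption about one convenient in-memory layout, not the actual PropBank file format; the class `Roleset`, the dictionary `RING_FRAME`, and the helper `describe` are hypothetical names, and the dotted roleset IDs follow the usual PropBank spelling.

```python
from dataclasses import dataclass


@dataclass
class Roleset:
    roleset_id: str
    gloss: str
    roles: dict  # argument label -> role description


# One frame file per predicate; a polysemous predicate has several rolesets.
RING_FRAME = {
    "ring.01": Roleset("ring.01", "Make sound of bell", {
        "Arg0": "Causer of ringing",
        "Arg1": "Thing rung",
        "Arg2": "Ring for",
    }),
    "ring.02": Roleset("ring.02", "To surround", {
        "Arg1": "Surrounding entity",
        "Arg2": "Surrounded entity",
    }),
}


def describe(roleset_id, labelled_args):
    """Pair each labelled argument with its role description from the roleset."""
    roleset = RING_FRAME[roleset_id]
    return {text: roleset.roles[label] for text, label in labelled_args}


bell = describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
lake = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
print(bell)
print(lake)
```

The same label (Arg1) means "Thing rung" under one roleset and "Surrounding entity" under the other, which is why the roleset ID has to be chosen before the arguments are labelled.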

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset

Numbered Arguments
  ARGA: Causer
  ARG0: Agent/experiencer
  ARG1: Theme/patient
  ARG2: Recipient
  ARG3: Instrument

Numbered Arguments with function tags
  ARGA-MNS: Indirect causer
  ARG0-MNS: Induced causer
  ARG0-GOL: Causee with a ‘recipient’ role
  ARG2-ATR: Attribute
  ARG2-GOL: Goal
  ARG2-SOU: Source
  ARG2-LOC: Location
  ARG2-DIR: Direction

PropBank Tagset: Modifier Arguments
  ARGM-TMP: Temporal
  ARGM-MNR: Manner
  ARGM-LOC: Location
  ARGM-PRP: Purpose
  ARGM-CAU: Cause
  ARGM-DIS: Discourse
  ARGM-ADV: Adverb
  ARGM-MNS: Means

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  Atif will read the book

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  Atif read the book

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  Atif had to read the book

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  The door will open

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  The door opened

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  The door will have to open

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  Atif slept

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  Atif will have to sleep
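The labelling rule for intransitives (the single argument of an unaccusative is Arg1, that of an unergative is Arg0) can be sketched as a lookup. The two verb sets below are toy lists built from the verbs named on the slides; in practice the class would come from the frame files and the diagnostic tests, and the function name `label_single_argument` is hypothetical.

```python
# Toy verb classes using the verbs named on the slides (assumption: a real
# lexicon would be derived from frame files and diagnostic tests).
UNACCUSATIVE = {"Kula", "Puta"}   # open, explode
UNERGATIVE = {"haMsa", "so"}      # laugh, sleep


def label_single_argument(verb_root):
    """Return the PropBank label for the sole argument of an intransitive verb."""
    if verb_root in UNACCUSATIVE:
        return "Arg1"
    if verb_root in UNERGATIVE:
        return "Arg0"
    raise KeyError(f"verb class unknown for {verb_root!r}")


labels = {verb: label_single_argument(verb) for verb in ["Kula", "so"]}
print(labels)
```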

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  Yesterday night the moon was seen behind the clouds

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  Yesterday night I saw the moon behind the clouds

[Figures: annotations for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  Ram will give a book to Mohan

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  Ram gave a book to Mohan

Causatives

• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

[Figures: annotations for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

compl-pred-2
  राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  Ram knows that Sita will arrive late

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Figure: PS tree for आतिफ़ किताब पढ़ेगा. Node labels include VP-Pred, VP, NP, V. Words: Atif, kitab, paRhegaa.]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: PS tree for आतिफ़ ने किताब पढ़ी. Node labels include VP-Pred, VP, NP-P, NP, V. Words: Atif ne, kitab, paRhii.]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: PS tree for आतिफ़ सोएगा. Node labels include VP-Pred, VP, NP, V. Words: Atif, soyegaa.]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: PS tree for दरवाज़ा खुलेगा. Node labels include VP-Pred, VP, NP1, NP, CASE1, V. Words: darvaazaa, khulegaa.]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free), and the location is an adjunct

[Figure: PS tree for उस कमरे में चूहे हैं. Node labels include VP-Pred, VP, NP1, NP, CASE1, V, NP-P. Words: us kamre mein, cuuhe, hain.]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position

[Figure: PS tree for राम ने मोहन को किताब दी. Node labels include VP-Pred, VP, NP-P, NP, V. Words: Ram ne, Mohan ko, kitaab, dii.]

Putting it All Together: Dative Subjects

[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा. Node labels include VP-Pred, VP, NP1, NP2, NP, NP-P, CASE1, SCR2, V. Words: kal raat, baadaloM mein, mujhko, caaMd, dikhaa.]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Figure: PS tree for राम जानता है कि सीता देर से आएगी. Node labels include VP-Pred, VP, NP, NP1, NP-P, CP1, C, EXTR1, CASE1, V, V-Aux. Words: Ram, jaantaa, hai, ki, Sita, der se, aayegi.]

Relative Clause

[Figure: PS tree for the sentence below. Node labels include VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux. Words: jo kitaab, tumne, dii, thii, vah, maine, paRhii.]

  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

Complex Predicate

[Figure: PS tree for the sentence below. Node labels include VP-Pred, VP, NP, NP-P1, V, V′, V-Aux, N, CASE1. Words: Raam, Ravi ko, yaad, kar, rahaa, thaa.]

  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  Ram was remembering Ravi

Causative

(7) Ellipsis

• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example, vo ‘that’ is missing; it would be the parent node for the relative clause ‘tum jo bhi kahoge’
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
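The null-element analysis of this example can be sketched as a small tree-building script. The `Node` class and `render` helper are hypothetical illustration code, not part of any treebank tooling; the relation labels and the NULL__NP spelling are taken from the tree on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    form: str
    relation: Optional[str] = None  # relation to the parent; None for the root
    children: List["Node"] = field(default_factory=list)

    def add(self, child):
        """Attach a dependent and return it so that further nodes can hang off it."""
        self.children.append(child)
        return child


def render(node, depth=0):
    """Return the tree as indented 'relation: form' lines."""
    label = f"{node.relation}: " if node.relation else ""
    lines = ["  " * depth + label + node.form]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


root = Node("maan luungii")
root.add(Node("mai", "k1"))
null_np = root.add(Node("NULL__NP", "k2"))  # stands in for the missing vo
relc = null_np.add(Node("kahoge", "nmod__relc"))
relc.add(Node("tum", "k1"))
relc.add(Node("jo bhi", "k2"))

tree_lines = render(root)
print("\n".join(tree_lines))
```

Without the inserted node, the relative clause headed by kahoge would have no nominal parent to attach to with nmod__relc.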

Ellipsis (Contd)

• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don’t listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunction as the head
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

• Coordinate Conjunction
  Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
  ‘Ram ate food and Sita ate an apple’

• Subordinate Conjunction
  Ex: raam ne kahaa ki vo kal aayegaa
  ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
  ‘Ram said that he will come tomorrow’
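The ccof attachments for the two examples can be written out as edges. This is a minimal sketch: the helper `ccof_edges` and the choice of token indices to identify words are assumptions made for illustration, and only the ccof edges are shown (the conjunction's own attachment and the relations inside each clause are left out).

```python
def ccof_edges(tokens, conj_index, child_indices):
    """Return (head, relation, dependent) triples with the conjunction as head.

    Words are identified by (index, form) pairs because a form such as
    'khaayaa' can occur twice in one sentence.
    """
    head = (conj_index, tokens[conj_index])
    return [(head, "ccof", (i, tokens[i])) for i in child_indices]


# Coordination: 'aur' heads the verbs of both conjoined clauses.
coord_tokens = "raam ne khaanaa khaayaa aur siitaa ne seb khaayaa".split()
coord_edges = ccof_edges(coord_tokens, 4, [3, 8])

# Subordination: 'ki' takes the verb of the subordinate clause as its child.
sub_tokens = "raam ne kahaa ki vo kal aayegaa".split()
sub_edges = ccof_edges(sub_tokens, 3, [6])

print(coord_edges)
print(sub_edges)
```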

[Figures: dependency trees for the Coordinate Conjunction (ccof) and Subordinate Conjunction examples]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now be part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
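The split into an NP chunk and a VG chunk can be sketched as follows. The chunk dictionaries, the helper `intra_chunk_edges`, and the simplification that the last token of a chunk is its head and every other token attaches to it as nmod are assumptions for this example only; the inter-chunk relations are copied from the tree on the slide.

```python
# Chunks for: maine usase ek prashna kiyaa
chunks = [
    {"tag": "NP", "tokens": ["maine"]},
    {"tag": "NP", "tokens": ["usase"]},
    {"tag": "NP", "tokens": ["ek", "prashna"]},  # noun kept apart from the verb
    {"tag": "VG", "tokens": ["kiyaa"]},
]

# Inter-chunk relations as (dependent, relation, head), as shown on the slide.
inter_chunk = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),  # marks prashna + kiyaa as a conjunct verb
]


def intra_chunk_edges(chunk):
    """Attach every non-final token to the chunk's last token (assumed head)."""
    head = chunk["tokens"][-1]
    return [(tok, "nmod", head) for tok in chunk["tokens"][:-1]]


edges = list(inter_chunk)
for chunk in chunks:
    edges.extend(intra_chunk_edges(chunk))

print(edges)
```

Had prashna and kiyaa been placed in one verb chunk, ek would have no noun head inside its own chunk, and the nmod edge to prashna could not be expressed.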



1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01  Make sound of bell
    Arg0  Causer of ringing
    Arg1  Thing rung
    Arg2  Ring for

ring.02  To surround
    Arg1  Surrounding entity
    Arg2  Surrounded entity

An example (annotated)

• [John ARG0] rings [the bell ARG1]  (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]  (ring.02)
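A frame file can be pictured as a small lexicon keyed by roleset ID. The Python sketch below stores the two rolesets for "ring" from the slide and looks up what each labelled argument means. The dictionary layout and the `describe` function are assumptions made for illustration; actual PropBank frame files are distributed as XML and are not modelled here.

```python
# Toy frame-file lexicon: lemma -> roleset ID -> gloss and role descriptions.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing",
                      "Arg1": "Thing rung",
                      "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity",
                      "Arg2": "Surrounded entity"},
        },
    }
}


def describe(roleset_id, labelled_args):
    """Map each labelled span to the role description in its roleset."""
    lemma = roleset_id.split(".")[0]
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {span: roles[label] for label, span in labelled_args.items()}


bell = describe("ring.01", {"Arg0": "John", "Arg1": "the bell"})
lake = describe("ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"})
print(bell)
print(lake)
```

The same label (Arg1) means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset has to be chosen before the numbered arguments can be interpreted.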

Hindi PropBank

• Annotating Hindi PropBank involves three steps:
– Creation of frame files
– Empty argument insertion
– Semantic role labelling

Frame files for Hindi

• Two types of frame files:
– Frames for simple verbs [385 frames; 703 predicates]
– Frames for nominals in complex predicates [1875; 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments:
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
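The four empty-argument types and their insertion mode can be kept in a small lookup table, for instance to decide which cases need an annotator. The table contents come from the slide; the names `EMPTY_ARGUMENTS` and `needs_annotator` are hypothetical and do not belong to any treebank tool.

```python
# Empty-argument types from the slide: (description, insertion mode).
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def needs_annotator(kind):
    """True if this empty-argument type is inserted by hand."""
    return EMPTY_ARGUMENTS[kind][1] == "manual"


manual_types = {kind for kind in EMPTY_ARGUMENTS if needs_annotator(kind)}
print(manual_types)
```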

PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
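Labels in this tagset have the shape BASE or BASE-TAG, so a label can be split into its base argument and optional function tag. The sketch below assumes labels are written with a hyphen (for example `ARG2-GOL`); the helper names `split_label` and `is_modifier` are made up for this example.

```python
def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); a bare 'ARG1' gives ('ARG1', None)."""
    base, _, tag = label.partition("-")
    return base, (tag or None)


def is_modifier(label):
    """Modifier arguments all use the ARGM base."""
    return split_label(label)[0] == "ARGM"


for example in ("ARG1", "ARG2-GOL", "ARGA-MNS", "ARGM-TMP"):
    print(example, split_label(example), is_modifier(example))
```

Note that MNS and LOC occur both as function tags on numbered arguments and as modifier types, so the base has to be checked before the tag is interpreted.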

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitaab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Unaccusative & Unergative

• Distinction between intransitive verbs:
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
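The labelling rule for intransitives is a lookup on verb class. The sketch below assumes the class of each verb has already been decided by the diagnostic tests; the `VERB_CLASS` table is filled by hand from the slide's three examples, and `sole_argument_label` is a hypothetical helper.

```python
# Verb classes taken from the slide's examples (transliterated stems).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Arg1 for the single argument of an unaccusative, Arg0 for an unergative."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
print(labels)
```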

Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’

unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’

unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’

unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’

unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night, the moon was seen behind the clouds’

dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night, I saw the moon behind the clouds’

[Figures: annotation diagrams for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’

ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add –aa
– (so → sulaa) sleep → make someone sleep
• Add –vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

[Figures: annotation diagrams for causative-1 and causative-2]

Causatives classes

[Slide content not recoverable]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust(n) do(v)’, ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’

compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siitaa der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, CASE1, V]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, NP-P, CASE1, V]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, CASE1, SCR2, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP, NP1, NP-P, CASE1, EXTR1, CP1, C, V, V-Aux]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Tree diagram for the sentence above; node labels: VP-Pred, VP, NP, NP1, NP-P, PRO, SCR1, SCR3, CP, C, COMP, V, V-Aux]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Tree diagram for the sentence above; node labels: VP-Pred, VP, VP1, V’, NP, CASE1, N, V, V-Aux]

Causative

[Tree diagram not recoverable]

(1) Conjunction (ccof)

• The ccof relation doesn’t reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• In subordinating conjuncts, it would take the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

• Coordinate Conjunction
– Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
  ‘Ram ate food and Sita ate an apple’

• Subordinate Conjunction
– Ex: raam ne kahaa ki vo kal aayegaa
  ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
  ‘Ram said that he will come tomorrow’

[Dependency tree figures: Coordinate Conjunction (ccof); Subordinate Conjunction]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is being modified by the adjective ek ‘one’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two

Conjunct Verbs (Contd.)

kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
    nmod → ek
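The dependency tree for maine usase ek prashna kiyaa can be written down as a small data structure to make the head-dependent arcs explicit. This is a minimal sketch: the `Node` class and the `arcs` function are illustrative names, not the treebank's own storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A word in the dependency tree with the relation to its head."""
    word: str
    relation: str = "root"
    children: list = field(default_factory=list)

    def add(self, word, relation):
        child = Node(word, relation)
        self.children.append(child)
        return child


def arcs(node):
    """Yield (head, relation, dependent) triples in depth-first order."""
    for child in node.children:
        yield (node.word, child.relation, child.word)
        yield from arcs(child)


# kiyaa heads the clause; prashna attaches with pof and carries its own modifier.
root = Node("kiyaa")
root.add("maine", "k1")
root.add("usase", "k2")
prashna = root.add("prashna", "pof")
prashna.add("ek", "nmod")

triples = list(arcs(root))
for triple in triples:
    print(triple)
```

Because prashna is its own node rather than part of a single verb chunk, the modifier ek has a head to attach to, which is the point of splitting the conjunct verb.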

101212

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
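A label from the tagset can be split into its base and its function tag, and checked against the lists above. The following is a sketch under the assumption that labels are written with a hyphen (for example `ARG2-GOL`); `parse_label` and the three constants are hypothetical names introduced for illustration.

```python
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}

# Function tags that the tagset lists for each numbered argument.
FUNCTION_TAGS = {
    "ARGA": {"MNS"},
    "ARG0": {"MNS", "GOL"},
    "ARG2": {"ATR", "GOL", "SOU", "LOC", "DIR"},
}

MODIFIERS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def parse_label(label):
    """Split a label into (base, tag); raise ValueError if it is not in the tagset."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        if tag in MODIFIERS:
            return base, tag
    elif base in NUMBERED:
        if tag == "":
            return base, None
        if tag in FUNCTION_TAGS.get(base, set()):
            return base, tag
    raise ValueError(f"label not in the tagset: {label!r}")


print(parse_label("ARG2-GOL"))
print(parse_label("ARGM-TMP"))
```

A check of this kind would reject combinations the tagset does not list, such as `ARG1-GOL`.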

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'

trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'

trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
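The labelling rule for intransitives can be stated in a few lines. This is a minimal sketch that hard-codes the three example verbs from the slide; in practice the class of a verb would come from its frame file, and the function name `sole_argument_label` is hypothetical.

```python
# Example verbs copied from the slide (transliterated as given there).
UNACCUSATIVE = {"Kula", "Puta"}  # open, explode
UNERGATIVE = {"haMsa"}           # laugh


def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    if verb in UNACCUSATIVE:
        return "Arg1"
    if verb in UNERGATIVE:
        return "Arg0"
    raise KeyError(f"verb class unknown for {verb!r}")


print(sole_argument_label("Kula"))
print(sole_argument_label("haMsa"))
```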

Intransitive: Unaccusative

unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'

unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'

unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'

unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential

exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night, the moon was seen behind the clouds'

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night, I saw the moon behind the clouds'

[Figure: annotation diagrams for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Causatives

[Figure: annotation diagrams for causative-1 and causative-2]

Causatives: classes

[Figure: causative classes]

Complex predicates

• These are cases such as bharosaa karnaa `trust(n) do(v)' 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
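Bracketed annotations like the one above can be read off mechanically. The sketch below assumes the notation used on these slides, where each span is written as `[words LABEL]` with the label last; the pattern `SPAN` and the helper `read_arguments` are hypothetical names, not part of any PropBank tool.

```python
import re

# One bracketed span: the words of the argument, then its label.
SPAN = re.compile(r"\[([^\[\]]+?)\s+(ARG[A0-9M](?:-[A-Z]{3})?)\]")


def read_arguments(annotated):
    """Return {label: words} for every [words LABEL] span in the string."""
    return {label: words for words, label in SPAN.findall(annotated)}


print(read_arguments("[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa"))
print(read_arguments("[Tall aspen trees ARG1] ring [the lake ARG2]"))
```

Anything outside the brackets, here the noun and light verb bharosaa kiyaa, is left alone; it is the predicate whose noun frame supplies the meaning of ARG0 and ARG1.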

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

compl-pred-2
  राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS

• PS represents relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of), and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]
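The binary-branching assumption can be checked on a tree written as nested tuples. The bracketing below is an assumed reading of the flattened diagram for Atif kitab paRhegaa (one NP in each of the two privileged positions), not a tree taken from the treebank; the tuple encoding and the helpers `is_binary` and `words` are hypothetical.

```python
# A node is (label, child, ...); a leaf child is a plain string (the word).
# Assumed bracketing: [VP-Pred [NP Atif] [VP [NP kitab] [V paRhegaa]]]
TREE = ("VP-Pred",
        ("NP", "Atif"),
        ("VP",
         ("NP", "kitab"),
         ("V", "paRhegaa")))


def is_binary(node):
    """True if no node in the tree has more than two children."""
    if isinstance(node, str):
        return True
    children = node[1:]
    return len(children) <= 2 and all(is_binary(child) for child in children)


def words(node):
    """Surface word order: the leaves of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in words(child)]


print(is_binary(TREE))
print(words(TREE))
```

A flat structure with the verb and both NPs as sisters would fail the check, which is the kind of analysis the binary-branching assumption rules out.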

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP, V; the NP darvaazaa and a CASE trace share index 1]

Existentials

• Existential ho `be' is unaccusative (because agent-free) and location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP-P, NP, V; the NP cuuhe and a CASE trace share index 1]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP-P, NP, V; the NP caaMd shares index 1 with a CASE trace, and mujhko shares index 2 with an SCR trace]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP-P, NP, CP, C, V, V-Aux; traces: EXTR, CASE]

Relative Clause

[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, CP, C, V, V-Aux; empty elements: PRO, COMP, SCR traces]

  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'

Complex Predicate

[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, N, V, V', V-Aux; trace: CASE]

  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'

Causative

[Figure: tree diagram]

(2) Conjunct Verbs

• Ex: maine usase ek prashna kiyaa
      'I-erg' 'him-inst' 'one' 'question' 'did'
      'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• It means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• It captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd)

kiyaa
  k1:  maine
  k2:  usase
  pof: prashna
    nmod: ek
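The dependency tree for maine usase ek prashna kiyaa can be written as a list of arcs, which makes the role of the pof relation easy to see. This is a minimal sketch; the triple format and the helper names `conjunct_verbs` and `modifiers_of` are assumptions made for illustration, not the treebank's file format.

```python
# (dependent, head, relation) triples for "maine usase ek prashna kiyaa".
ARCS = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),
    ("ek", "prashna", "nmod"),
]


def conjunct_verbs(arcs):
    """(noun, verb) pairs that form a conjunct verb, i.e. are linked by pof."""
    return [(dep, head) for dep, head, rel in arcs if rel == "pof"]


def modifiers_of(word, arcs):
    """Words attached to `word` by an nmod arc."""
    return [dep for dep, head, rel in arcs if head == word and rel == "nmod"]


print(conjunct_verbs(ARCS))
print(modifiers_of("prashna", ARCS))
print(modifiers_of("kiyaa", ARCS))
```

Because prashna keeps its own node, ek attaches to the noun alone and not to the verb, while the pof arc still records that prashna and kiyaa form one semantic unit.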


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1  Motivation
2  Introducing PropBank
3  Frame file definition
4  Hindi PropBank
5  Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling

• Who created the first effective polio vaccine?
  - [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer!

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
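The filtering step can be sketched in a few lines once the roles are available. The candidate dictionaries below are hand-written from the two labelled sentences, not produced by a labeller, and the names `normalise` and `answer` are hypothetical.

```python
# Theme the question asks about: "Who created the first effective polio vaccine?"
QUESTION_THEME = "the first effective polio vaccine"

# Roles of "create" in the two candidate sentences, as labelled above.
CANDIDATES = [
    {"agent": "Becton Dickinson", "theme": "first disposable syringe"},
    {"agent": "Jonas Salk", "theme": "The first effective polio vaccine"},
]


def normalise(phrase):
    """Lower-case the phrase and drop a leading 'the' so that spans compare equal."""
    tokens = phrase.lower().split()
    if tokens and tokens[0] == "the":
        tokens = tokens[1:]
    return " ".join(tokens)


def answer(question_theme, candidates):
    """Agents of the candidates whose theme matches the theme of the question."""
    wanted = normalise(question_theme)
    return [c["agent"] for c in candidates if normalise(c["theme"]) == wanted]


print(answer(QUESTION_THEME, CANDIDATES))
```

Both sentences contain the words of the question, so word matching alone cannot separate them; comparing themes does.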

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file


38

Conjunct Verbs (Contd)

kiyaa k1 k2 pof

maine usase prashna

nmod

ek

101212

1

Overviewbull  Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull  Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu

(Sharma40minutes)bull  Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues

tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)

bull  Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)

bull  Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)

bull  Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)

bull  Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)

LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis

AshwiniVaidyaUniversityofColoradoBoulder

101212

2

Contents1  Mo3va3on

2  IntroducingPropBank3  Framefiledefini3on4  HindiPropBank5  Linguis3cPhenomena

Whyisseman3cinforma3onimportant

bull  Imagineanautoma3cques3onansweringsystembull  Whocreatedthefirsteffec3vepoliovaccinebull  Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut

Thedoorwillopen

unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened

unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen

101212

14

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

101212

15

Da3veSubject

unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst

YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst

YesterdaynightIsawthemoonbehindtheclouds

unacc4

datsubj1

TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on

101212

16

Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan

ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan

Causa3ves

bull  Hindihastwowaysofformingthecausa3vebull  Addndashaa

ndash  (sosulaa)sleepmakesomeonesleepbull  Addndashvaa

ndash  (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep

bull  WeintroducethelabelARGAtoanalyzecausersbull  SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees

bull  ARGALMNSforintermediatecausers

101212

17

Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

causa3veL1 आया आतफ़ को स6लाया

aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo

causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa

motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo

Causa3ves

Causa3veL1

Causa3veL2

101212

18

Causa3vesclasses

Complexpredicates

bull  Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust

bull  Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa

101212

19

ComplexPredicate

complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo

Complexpredicate

complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo

101212

20

ComplementClause

complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut

RamknowsthatSitawillarrivelate

101212

1

PhraseStructureRepresenta3on

OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu

PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks

bull  DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow

bull  Developedinconjunc3onwithDSandPBbull  InspiredbyChomskyantradi3on

101212

2

BackgroundforPS

bull  Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren

ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull  LanguageVuniversalprinciplesbull  LanguageVspecificparameters

bull  PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach

BasicPrinciplesofPS

bull  PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)

bull  Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces

bull  Transforma3onalgrammar

bull  Monostratalrepresenta3onbull  NotunlikeEnglishPennTreebank

101212

3

SpecificAssump3onsaboutRepresenta3onMadebyPS

bull  Phrasestructurebull  No3onoflexicalheadswithprojec3ons(XVbartheorysortof)andassociatedfunc3onalprojec3onsndash Nounswithpostposi3onsndash  Verbswithauxiliariesandcomplemen3zers(ki)

bull  Binarybranchingndash  Theore3calreasonsndash  TobedifferentfromDS

BasicTransi3veClause(1)

bull  Therearetwoprivilegedposi3onsintheverbalprojec3oncorrespondingusuallytoDSrsquosk1andk2

VPVPred

VP

NP

NPA3f

kitab

V

paRhegaaआततफिकताबपढ़गा

101212

4

BasicTransi3veClause(2)

bull  Therepresenta3onismaintainedwhenwehaveanerga3veconstruc3on

VPVPred

VP

NPVP

NPA3fVne

kitab

V

paRhii

आततफिकताबपढ़ी

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Tree diagram. Node labels: VP-Pred, VP, NP, V. Terminals: Atif, soyegaa]

आतिफ़ सोएगा

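Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Tree diagram. Node labels: VP-Pred, VP, NP1, NP, V. Terminals: darvaazaa, khulegaa. Trace: CASE1, co-indexed with NP1]

दरवाज़ा खुलेगा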
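The co-indexation between a moved constituent and its trace can be sketched as a simple consistency check. This is a minimal illustration, assuming a flattened list of tree items; the `Item` type and the function name are hypothetical, and the trace labels (CASE, SCR, EXTR) are only those that appear in the trees on these slides.

```python
from typing import List, NamedTuple, Optional

TRACE_LABELS = {"CASE", "SCR", "EXTR"}  # trace labels seen in the slides' trees


class Item(NamedTuple):
    label: str                  # "NP", "V", ... or a trace label
    index: Optional[int]        # co-indexation number, None if not indexed
    word: Optional[str] = None  # surface word, None for traces


def unbound_traces(items: List[Item]) -> List[Item]:
    """Return traces whose index matches no overt (non-trace) constituent."""
    antecedents = {
        it.index for it in items
        if it.label not in TRACE_LABELS and it.index is not None
    }
    return [
        it for it in items
        if it.label in TRACE_LABELS and it.index not in antecedents
    ]


# "darvaazaa khulegaa": NP1 has moved up, leaving CASE1 in the lower position.
unacc = [Item("NP", 1, "darvaazaa"), Item("CASE", 1), Item("V", None, "khulegaa")]

# A malformed analysis: a trace with no co-indexed antecedent.
broken = [Item("CASE", 2), Item("V", None, "khulegaa")]
```

For the unaccusative example the check returns an empty list, since CASE1 is bound by NP1; the malformed list reports its dangling trace.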

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram. Node labels: VP-Pred, VP, NP1, NP, V, VP, NP-P. Terminals: us kamre mein, cuuhe, hain. Trace: CASE1, co-indexed with NP1]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position

[Tree diagram. Node labels: VP-Pred, NP-P, NP, V, VP-Pred, NP-P, VP. Terminals: Ram-ne, Mohan-ko, kitaab, dii]

राम ने मोहन को किताब दी

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram. Node labels: VP-Pred, VP, NP1, NP, NP-P, V. Terminals: kal raat, baadaloM mein, mujhko, caaMd, dikhaa. Traces: CASE1 (co-indexed with NP1, caaMd), SCR2 (co-indexed with mujhko)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, NP1, CP1, C, V, V-Aux. Terminals: Ram, jaantaa, hai, ki, Sita, der se, aayegi. Traces: EXTR1 (co-indexed with CP1), CASE1 (co-indexed with NP1, Sita)]

Relative Clause

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, NP1, CP, C, V, V-Aux. Terminals: jo kitaab, tumne, dii, thii, vah, maine, paRhii. Empty elements: PRO, COMP, SCR1, SCR3]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P1, V', V, V-Aux, V-Aux, NP, N. Terminals: Raam, Ravi-ko, yaad, kar, rahaa, thaa. Trace: CASE1, co-indexed with NP-P1]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative


Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1  Motivation
2  Introducing PropBank
3  Frame file definition
4  Hindi PropBank
5  Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
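The filtering step described on this slide can be sketched in a few lines. This is a toy illustration, not a real question answering system: the role annotations are hand-copied from the slides, and the function names and the `normalize` helper are hypothetical.

```python
from typing import Dict, List

# Role annotations for the two candidate sentences, transcribed by hand.
candidates: List[Dict[str, str]] = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "The first effective polio vaccine"},
]


def normalize(phrase: str) -> str:
    """Lowercase the phrase and drop a leading 'the' so themes can be compared."""
    words = phrase.lower().split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def answer_who(predicate: str, theme: str,
               annotated: List[Dict[str, str]]) -> List[str]:
    """Return agents of `predicate` whose theme matches the questioned theme."""
    return [
        c["agent"] for c in annotated
        if c["predicate"] == predicate
        and normalize(c["theme"]) == normalize(theme)
    ]
```

Calling `answer_who("create", "the first effective polio vaccine", candidates)` keeps only the sentence whose theme is the vaccine, so the Becton Dickinson sentence is filtered out even though it shares most of the question's words.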

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01   Make sound of bell
  Arg0    Causer of ringing
  Arg1    Thing rung
  Arg2    Ring for

An example

• [John: ARG0] rings [the bell: ARG1]  (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2]  (ring.02)

ring.01   Make sound of bell
  Arg0    Causer of ringing
  Arg1    Thing rung
  Arg2    Ring for

ring.02   To surround
  Arg1    Surrounding entity
  Arg2    Surrounded entity
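A frame file with several rolesets can be pictured as a small lookup structure. The sketch below is an assumption-laden illustration: real PropBank frame files are stored in their own file format, and the `Roleset` class, the `RING_FRAME` dictionary and the `describe` function are hypothetical names. Only the roleset contents come from the slides (the roleset IDs are written here as ring.01 and ring.02).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Roleset:
    roleset_id: str
    gloss: str
    roles: Dict[str, str]   # argument label -> verb-specific description


# Contents transcribed from the slides.
RING_FRAME: Dict[str, Roleset] = {
    "ring.01": Roleset("ring.01", "Make sound of bell", {
        "Arg0": "Causer of ringing",
        "Arg1": "Thing rung",
        "Arg2": "Ring for",
    }),
    "ring.02": Roleset("ring.02", "To surround", {
        "Arg1": "Surrounding entity",
        "Arg2": "Surrounded entity",
    }),
}


def describe(roleset_id: str, labelled: Dict[str, str]) -> Dict[str, str]:
    """Map each labelled phrase to the verb-specific meaning of its label."""
    roleset = RING_FRAME[roleset_id]
    result: Dict[str, str] = {}
    for phrase, label in labelled.items():
        key = label.capitalize()          # "ARG0" -> "Arg0"
        if key not in roleset.roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        result[phrase] = roleset.roles[key]
    return result
```

The same label, ARG1, resolves to "Thing rung" under ring.01 and to "Surrounding entity" under ring.02, which is the point of defining roles per roleset rather than globally.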

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
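The four empty-argument types and their insertion modes can be summarised as a lookup table. This is only a restatement of the slide in code form; the dictionary layout and the `inserted` helper are hypothetical and do not reflect any actual annotation tool.

```python
from typing import Dict, List, Tuple

# name -> (description, insertion mode), transcribed from the slide.
# Note that "pro" and "PRO" are distinct, case-sensitive labels.
EMPTY_ARGUMENTS: Dict[str, Tuple[str, str]] = {
    "pro": ("dropped null argument, recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def inserted(mode: str) -> List[str]:
    """Return the empty-argument types inserted in the given mode, sorted."""
    return sorted(name for name, (_, m) in EMPTY_ARGUMENTS.items() if m == mode)
```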

PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
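A label from this tagset splits into a base argument and an optional function tag. The sketch below validates labels against the combinations listed on the two tagset slides only; the function name and the hyphenated spelling of the labels are assumptions, and the full Hindi PropBank guidelines may allow combinations not shown here.

```python
from typing import Optional, Tuple

NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}

# Function tags listed on the slide for each numbered argument.
FUNCTION_TAGS = {
    "ARGA": {"MNS"},
    "ARG0": {"MNS", "GOL"},
    "ARG2": {"ATR", "GOL", "SOU", "LOC", "DIR"},
}

MODIFIERS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def parse_label(label: str) -> Tuple[str, Optional[str]]:
    """Split a label such as 'ARG2-GOL' into (base, tag) and validate it."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        if tag in MODIFIERS:
            return ("ARGM", tag)
    elif base in NUMBERED:
        if not tag:
            return (base, None)
        if tag in FUNCTION_TAGS.get(base, set()):
            return (base, tag)
    raise ValueError(f"not in the tagset listed on the slides: {label!r}")
```

For example, `parse_label("ARG2-GOL")` returns `("ARG2", "GOL")`, while `parse_label("ARG1-GOL")` raises `ValueError` because the slides list no function tags for ARG1.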

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
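The labelling rule for intransitives can be written down directly. The sketch assumes the verb class is already known (in practice it comes from the diagnostic tests and the frame files, not from a lookup like this); the `VERB_CLASS` table holds only the three example verbs from the slide, and the function name is hypothetical.

```python
from typing import Dict

# Verb classes for the example verbs on the slide (transliterated as given).
VERB_CLASS: Dict[str, str] = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb: str) -> str:
    """Return the PropBank label of an intransitive verb's single argument."""
    verb_class = VERB_CLASS[verb]   # KeyError for verbs not in the toy table
    return "ARG1" if verb_class == "unaccusative" else "ARG0"
```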

Intransitive: Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Intransitive: Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night, the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night, I saw the moon behind the clouds'

[Figures: PropBank annotations for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
  • Add -aa
    – (so → sulaa) sleep → make someone sleep
  • Add -vaa
    – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Causatives

[Figures: PropBank annotations for causative-1 and causative-2]

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'





101212

3

WordMatches

bull  Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh

Parsing

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh

101212

4

Seman3cRolelabelling

bull  Whocreatedthefirsteffec3vepoliovaccinendash  [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine

ndash  [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh

SRLgivesustherightanswer

bull  Weneedseman3cinforma3ontoprefertherightanswer

bull  Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo

bull  Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo

bull  Wecanfilteroutthewronganswer

101212

5

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

Weneedseman3cinforma3on

bull  Tofindoutabouteventsandtheirpar3cipantsbull  Tocaptureseman3cinforma3onacrosssyntac3cvaria3on

101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut

Thedoorwillopen

unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened

unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen

101212

14

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

Dative Subject

unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’

dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

[Figures: unacc-4, dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’

ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
  • Add –aa
    – (so → sulaa) sleep → make someone sleep
  • Add –vaa
    – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’

causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’

causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’

Causatives

[Figures: Causative-1, Causative-2]

Causatives: classes

[Figure]

Complex predicates

• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
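The bracketed notation used for the noun-frame example is easy to read mechanically. The sketch below is an assumption-laden illustration, not the treebank's file format: it treats the slide's `[words LABEL]` notation as input and uses hypothetical helper names `labelled_spans` and `unlabelled_rest`.

```python
import re

# Matches "[some words ARGx]" as written on the slide. The pattern and
# the helper names are illustrative assumptions.
SPAN = re.compile(r"\[(.+?)\s+(ARG[A-Z0-9-]+)\]")


def labelled_spans(annotated):
    """Return (text, label) pairs for every bracketed argument."""
    return SPAN.findall(annotated)


def unlabelled_rest(annotated):
    """Return what is left once the bracketed arguments are removed,
    here the noun plus light verb forming the complex predicate."""
    return SPAN.sub("", annotated).strip()


example = "[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa"
print(labelled_spans(example))
print(unlabelled_rest(example))
```

In this example the arguments are licensed by the frame of the noun bharosaa, so a lookup keyed on the noun rather than on the light verb would be the natural next step.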

Complex Predicate

compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’

Complex predicate

compl-pred-2
  राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS

• PS represents relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा. Node labels: VP-Pred, VP, NP, NP, V. Words: Atif, kitab, paRhegaa.]
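The binary-branching assumption can be checked on a small tree encoding. The following Python sketch is illustrative only: the nested-tuple encoding and the function names are assumptions, and the bracketing of the transitive clause is one plausible reading of the node labels in the diagram (subject NP under VP-Pred, object NP and V under VP), not a structure confirmed by the source.

```python
# A node is (label, child, ...); a preterminal is (label, "word").
# The bracketing below is an assumed reading of the slide's diagram.
TRANSITIVE = (
    "VP-Pred",
    ("NP", "Atif"),
    ("VP",
        ("NP", "kitab"),
        ("V", "paRhegaa")),
)


def is_preterminal(node):
    """True for nodes of the form (label, 'word')."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Return the words of the tree in surface order."""
    if is_preterminal(node):
        return [node[1]]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def is_binary(node):
    """True if no node has more than two children."""
    if is_preterminal(node):
        return True
    children = node[1:]
    return len(children) <= 2 and all(is_binary(c) for c in children)


print(leaves(TRANSITIVE))
print(is_binary(TRANSITIVE))
```

A flat, dependency-style encoding with the verb and both arguments as sisters would fail `is_binary`, which is one way to see the contrast with DS that the slide mentions.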

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी. Node labels: VP-Pred, VP, NP-P, NP, V. Words: Atif-ne, kitab, paRhii.]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा. Node labels: VP-Pred, VP, NP, V. Words: Atif, soyegaa.]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा. Node labels: VP-Pred, VP, NP, V; indexed elements: NP1, CASE1. Words: darvaazaa, khulegaa.]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं. Node labels: VP-Pred, VP, NP, NP-P, V; indexed elements: NP1, CASE1. Words: us kamre mein, cuuhe, hain.]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी. Node labels: VP-Pred, VP, NP-P, NP, V. Words: Ram-ne, Mohan-ko, kitaab, dii.]

Putting it All Together: Dative Subjects

[Tree diagram for कल रात बादलों में मुझको चाँद दिखा. Node labels: VP-Pred, VP, NP, NP-P, V; indexed elements: NP1, CASE1, NP2, SCR2. Words: kal raat, baadaloM mein, mujhko, caaMd, dikhaa.]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

[Tree diagram for राम जानता है कि सीता देर से आएगी. Node labels: VP-Pred, VP, NP, NP-P, CP, C, V, V-Aux; indexed elements: NP1, CASE1, CP1, EXTR1. Words: Ram, jaantaa, hai, ki, Sita, der se, aayegi.]

Relative Clause

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, CP, C, V, V-Aux; other elements: PRO, COMP, NP1, SCR1, SCR3. Words: jo kitaab, tumne, dii, thii, vah, maine, paRhii.]

  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

Complex Predicate

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P1, V, V', V-Aux, N; indexed element: CASE1. Words: Raam, Ravi-ko, yaad, kar, rahaa, thaa.]

  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’

Causative

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
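The filtering step can be made concrete with a toy example. The Python sketch below is an illustration under stated assumptions: the role annotations are typed in by hand from the two sentences on the slide, and the dictionary layout and the function name `who` are hypothetical rather than the output format of any SRL system.

```python
# Hand-built role annotations for the two sentences on the slide.
# The dictionary layout and function name are illustrative assumptions.
ANNOTATIONS = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "first effective polio vaccine"},
]


def who(predicate, theme, annotations):
    """Return the agents of `predicate` whose theme matches the query."""
    wanted = theme.lower()
    return [a["agent"] for a in annotations
            if a["predicate"] == predicate and a["theme"].lower() == wanted]


answers = who("create", "first effective polio vaccine", ANNOTATIONS)
print(answers)
```

A keyword match on "created" and "polio vaccine" would accept both sentences; requiring the queried phrase to fill the theme role is what removes the Becton Dickinson sentence.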

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1]
• [Tall aspen trees ARG1] ring [the lake ARG2]

ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
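The two rolesets of "ring" can be written down as a small data structure. This Python sketch is a simplified stand-in and not the actual frame file format: the nested-dictionary layout, the `lemma.NN` roleset identifiers and the function name `explain` are assumptions, while the role descriptions are copied from the example.

```python
# A simplified stand-in for a frame file: one lemma, several rolesets.
# Layout, identifiers and function name are illustrative assumptions.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"ARG0": "Causer of ringing",
                      "ARG1": "Thing rung",
                      "ARG2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"ARG1": "Surrounding entity",
                      "ARG2": "Surrounded entity"},
        },
    },
}


def explain(roleset_id, labelled_args):
    """Map each labelled argument to its roleset-specific description."""
    lemma = roleset_id.split(".")[0]
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {text: roles[label] for text, label in labelled_args}


print(explain("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(explain("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label means different things in the two rolesets: ARG1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why the roleset has to be chosen before the arguments are interpreted.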

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames; 703 predicates]
  – Frames for nominals in complex predicates [1875; 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
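The split between automatic and manual insertion can be recorded next to the definitions of the four empty categories. The Python sketch below only restates the slide as data; the table layout and the function name `needs_annotator` are assumptions and do not describe the actual insertion tool.

```python
# Empty categories as listed on the slide, with how each is inserted.
# The table layout and function name are illustrative assumptions.
EMPTY_ARGUMENTS = {
    "pro": ("dropped null arguments recoverable from discourse context",
            "manual"),
    "PRO": ("empty subjects of non-finite complement and adjunct clauses",
            "automatic"),
    "RELPRO": ("gaps in participial relative clauses", "automatic"),
    "GAP": ("elided arguments in co-ordinated clauses", "manual"),
}


def needs_annotator(kind):
    """True if this empty category is inserted by hand."""
    return EMPTY_ARGUMENTS[kind][1] == "manual"


# Note that the keys are case-sensitive: "pro" and "PRO" differ.
manual = sorted(k for k in EMPTY_ARGUMENTS if needs_annotator(k))
print(manual)
```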



101212

6

Seman3cinforma3on

bull  Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles

bull  AgentExperiencerThemeResultetcbull  Howeverdifficulttohaveastandardsetofthema3croles

Proposi3onBank

bull  Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling

bull  APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on

bull  Asetofseman3crolesisdefinedforeachverbbull  Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on

101212

7

PropBankFramefiles

bull  PropBankdefinesseman3crolesonaverbLbyLverbbasis

bull  Thisisdefinedinaverblexiconconsis3ngofframefiles

bull  Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage

bull  Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile

Anexample

bull  Johnringsthebellring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

101212

8

Anexample

bull  Johnringsthebellbull  Tallaspentreesringthelakering01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringfor

ring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Anexample

bull  [John]rings[thebell]bull  [Tallaspentrees]ring[thelake]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

101212

9

Anexample

bull  [JohnARG0]rings[thebellARG1]bull  [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell

Arg0 Causerofringing

Arg1 Thingrung

Arg2 Ringforring02 Tosurround

Arg1 Surroundingen3ty

Arg2 Surroundeden3ty

Ring01

Ring02

HindiPropBank

bull  Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling

101212

10

FramefilesforHindi

bull  Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]

EmptyArguments

bull  PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext

ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash  GAPelidedargumentsincoLordinatedclauses

bull  PROandRELPROareinsertedautoma3callybull  GAPandproareinsertedmanually

101212

11

PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags

ARGACauser ARGALMNSIndirectcauser

ARG0Agentexperiencer ARG0LMNSInducedcauser

ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole

ARG2Recipient ARG2LATRAaribute

ARG3Instrument ARG2LGOLGoal

ARG2LSOUSource

ARG2LLOCLoca3on

ARG2LDIRDirec3on

PropBankTagsetModifierArguments

ARGMLTMPTemporalARGMLMNRManner

ARGMLLOCLoca3on

ARGMLPRPPurpose

ARGMLCAUCause

ARGMLDISDiscourse

ARGMLADVAdverb

ARGMLMNSMeans

101212

12

Linguis3cphenomena

bull  Simpletransi3vebull  Unaccusa3veandUnerga3vebull  Existen3albull  Da3vesubjectbull  Ditransi3vebull  Causa3vesbull  ComplexPredicates

SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook

transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook

transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook

101212

13

Unaccusa3veampUnerga3ve

bull  Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)

bull  Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0

bull  Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut

Thedoorwillopen

unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened

unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen

101212

14

Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep

unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept

unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep

Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo

bull  Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs

101212

15

Da3veSubject

unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst

YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst

YesterdaynightIsawthemoonbehindtheclouds

unacc4

datsubj1

TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on

101212

16

Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan

ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan

Causa3ves

bull  Hindihastwowaysofformingthecausa3vebull  Addndashaa

ndash  (sosulaa)sleepmakesomeonesleepbull  Addndashvaa

ndash  (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep

bull  WeintroducethelabelARGAtoanalyzecausersbull  SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees

bull  ARGALMNSforintermediatecausers

101212

17

Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'

causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'

causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

[Figures labelled causative-1 and causative-2]
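As a rough illustration of how the causative labels combine, the sketch below assigns labels to the participants of causative-1 and causative-2. The mapping is an assumption read off the bullets (causer: ARGA, intermediate causer: ARGA-MNS, causee: plain ARG0); the original figures are not recoverable here, and the function name `label_causative` is hypothetical.

```python
def label_causative(causer, causee, intermediate=None):
    """Assign labels to the participants of a causative clause.

    Assumed mapping (from the bullets, not from the original figures):
    causer -> ARGA, intermediate causer -> ARGA-MNS, causee -> ARG0.
    """
    labels = {causer: "ARGA"}
    if intermediate is not None:
        labels[intermediate] = "ARGA-MNS"
    labels[causee] = "ARG0"
    return labels


# causative-1: aayaa ne Atif ko sulaayaa
print(label_causative("aayaa", "Atif"))
# causative-2: maaN ne aayaa se Atif ko sulvaayaa
print(label_causative("maaN", "Atif", intermediate="aayaa"))
```

The ARG0 subtypes for causees (ARG0-GOL, ARG0-MNS) are left out of the sketch because the slides do not say which verbs select them.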

Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'

• Such cases are handled using a noun frame for bharosaa
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

compl-pred-2  राम रवी से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siitaa der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow

• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters

• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)

• These two levels are related by derivations
  – Words and constituents move and leave traces

• Transformational grammar

• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)

• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा

[Tree diagram. Node labels: VP-Pred, VP, NP, NP, V; terminals: Atif, kitab, paRhegaa]
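A minimal Python sketch of such a tree may help make the two privileged positions and the binary-branching assumption concrete. The bracketing is an assumption reconstructed from the node labels of the transitive clause (higher NP as sister of VP under VP-Pred, lower NP as sister of V); the tuple encoding and the helper names `is_binary` and `words` are hypothetical, not the treebank's own format.

```python
# A non-leaf node is (label, child, ...); a leaf is (label, word).
# The bracketing below is an assumption reconstructed from the node labels.
tree = ("VP-Pred",
        ("NP", "Atif"),              # higher privileged position (usually k1)
        ("VP",
         ("NP", "kitab"),            # lower privileged position (usually k2)
         ("V", "paRhegaa")))


def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)


def is_binary(node):
    """True if every non-leaf node has at most two children."""
    if is_leaf(node):
        return True
    children = node[1:]
    return len(children) <= 2 and all(is_binary(c) for c in children)


def words(node):
    """Return the surface word order (the yield of the tree)."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


print(is_binary(tree))        # True
print(" ".join(words(tree)))  # Atif kitab paRhegaa
```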

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी

[Tree diagram. Node labels: VP-Pred, VP, NP-P, NP, V; terminals: Atif-ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative

• In the unergative, there simply is no object

आतिफ़ सोएगा

[Tree diagram. Node labels: VP-Pred, VP, NP, V; terminals: Atif, soyegaa]

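Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा

[Tree diagram. Node labels: VP-Pred, VP, NP1, NP, V; terminals and empty elements: darvaazaa, CASE1, khulegaa]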
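The movement analysis of the unaccusative can be illustrated with a small Python sketch in which the moved argument and the empty element it leaves behind share an index. This is a simplified reading, assuming that CASE1 is the trace in the lower position and that NP1 is its overt antecedent in the higher position; the `Node` class and both helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    word: Optional[str] = None    # set on leaves only
    index: Optional[int] = None   # co-indexation, e.g. NP1 / CASE1
    is_trace: bool = False
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def surface_string(root):
    """Pronounced words only: traces are skipped."""
    return " ".join(leaf.word for leaf in leaves(root) if not leaf.is_trace)


def antecedent_of(root, trace):
    """Find the overt leaf that shares the trace's index."""
    for leaf in leaves(root):
        if not leaf.is_trace and leaf.index == trace.index:
            return leaf
    return None


# Assumed structure: the argument starts low (trace) and surfaces high.
trace = Node("NP", word="CASE", index=1, is_trace=True)
tree = Node("VP-Pred", children=[
    Node("NP", word="darvaazaa", index=1),
    Node("VP", children=[trace, Node("V", word="khulegaa")]),
])

print(surface_string(tree))             # darvaazaa khulegaa
print(antecedent_of(tree, trace).word)  # darvaazaa
```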

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram. Node labels: VP-Pred, VP, NP1, NP, NP-P, V; terminals and empty elements: us kamre mein, cuuhe, CASE1, hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी

[Tree diagram. Node labels: VP-Pred, VP, NP-P, NP, V; terminals: Ram-ne, Mohan-ko, kitaab, dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, NP1, NP2, V; terminals and empty elements: kal raat, baadaloM mein, mujhko, SCR2, caaMd, CASE1, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone

• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)

• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, NP1, CP1, C, V, V-Aux; terminals and empty elements: Ram, jaantaa, hai, EXTR1, ki, Sita, der se, CASE1, aayegi]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, NP1, CP, C, V, V-Aux; terminals and empty elements: jo kitaab, tumne, SCR1, dii, thii, PRO, COMP, vah, maine, SCR3, paRhii]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P1, V', V, V-Aux, N; terminals and empty elements: Raam, Ravi-ko, CASE1, yaad, kar, rahaa, thaa]

Causative

PropBank Framefiles

• PropBank defines semantic roles on a verb-by-verb basis

• This is defined in a verb lexicon consisting of framefiles

• Each predicate will have a set of roles associated with a distinct usage

• A polysemous predicate can have several rolesets within its framefile

An example

• [John ARG0] rings [the bell ARG1]                  (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]       (ring.02)

ring.01  Make sound of bell
  Arg0   Causer of ringing
  Arg1   Thing rung
  Arg2   Ring for

ring.02  To surround
  Arg1   Surrounding entity
  Arg2   Surrounded entity
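The relation between a framefile, its rolesets and an annotated sentence can be sketched as a small Python data structure. The role descriptions are copied from the ring example; the dictionary layout and the helper `undefined_labels` are assumptions made for illustration and do not reproduce PropBank's actual framefile format.

```python
# Rolesets copied from the slide; the dict layout itself is an assumption,
# not PropBank's on-disk framefile format.
FRAMEFILE_RING = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing",
                  "ARG1": "Thing rung",
                  "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity",
                  "ARG2": "Surrounded entity"},
    },
}


def undefined_labels(roleset_id, labelled_args):
    """Return the labels used in an annotation that the roleset does not define."""
    roles = FRAMEFILE_RING[roleset_id]["roles"]
    return sorted(label for label in labelled_args if label not in roles)


# [John ARG0] rings [the bell ARG1]
print(undefined_labels("ring.01", {"ARG0": "John", "ARG1": "the bell"}))
# A deliberately wrong annotation: ring.02 has no ARG0.
print(undefined_labels("ring.02", {"ARG0": "Tall aspen trees", "ARG2": "the lake"}))
```

Keeping one entry per roleset mirrors the point that a polysemous predicate has several rolesets inside a single framefile.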

Hindi PropBank

• Annotating Hindi PropBank involves three steps
  – Creation of framefiles
  – Empty argument insertion
  – Semantic role labelling

Framefiles for Hindi

• Two types of framefiles
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses

• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
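The four empty-argument types and their insertion mode can be summarised in a small lookup table. This is only a sketch: the table contents follow the bullets, while the names `EMPTY_ARGUMENTS` and `needs_annotator` are hypothetical.

```python
# (description, insertion mode) per empty-argument type, as listed on the slide.
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def needs_annotator(kind):
    """True if this type of empty argument has to be inserted by hand."""
    _description, insertion = EMPTY_ARGUMENTS[kind]
    return insertion == "manual"


manual = sorted(kind for kind in EMPTY_ARGUMENTS if needs_annotator(kind))
print(manual)  # ['GAP', 'pro']
```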

PropBank Tagset: Numbered Arguments

Numbered Arguments           Numbered Arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent, experiencer     ARG0-MNS  Induced causer
ARG1  Theme, patient         ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
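A label from the tagset can be split into its base and its tag with a few lines of Python. The sets below are copied from the two tagset slides; the function `parse_label` is a hypothetical helper, and it is deliberately permissive: it checks that base and tag exist, not that the particular combination appears on the slide.

```python
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
FUNCTION_TAGS = {"MNS", "GOL", "ATR", "SOU", "LOC", "DIR"}
MODIFIER_TAGS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def parse_label(label):
    """Split a label into (base, tag); tag is None for a bare numbered argument."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        if tag not in MODIFIER_TAGS:
            raise ValueError(f"unknown modifier tag in {label!r}")
        return base, tag
    if base not in NUMBERED:
        raise ValueError(f"unknown argument in {label!r}")
    if tag == "":
        return base, None
    if tag not in FUNCTION_TAGS:
        raise ValueError(f"unknown function tag in {label!r}")
    return base, tag


print(parse_label("ARG1"))      # ('ARG1', None)
print(parse_label("ARG2-LOC"))  # ('ARG2', 'LOC')
print(parse_label("ARGM-TMP"))  # ('ARGM', 'TMP')
```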

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'

trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'

trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'









Frame files for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
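The four empty-argument types and their insertion modes can be written down as a small lookup table. This is only an illustrative sketch: the names `Insertion`, `EMPTY_ARGUMENTS` and `labels_by_mode` are hypothetical and not part of any PropBank tooling.

```python
from enum import Enum


class Insertion(Enum):
    """How an empty argument gets into the annotation (per the slide)."""
    AUTOMATIC = "automatic"
    MANUAL = "manual"


# Hypothetical table: label -> (description from the slide, insertion mode)
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument, recoverable from discourse context", Insertion.MANUAL),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", Insertion.AUTOMATIC),
    "RELPRO": ("gap in a participial relative clause", Insertion.AUTOMATIC),
    "GAP": ("elided argument in a co-ordinated clause", Insertion.MANUAL),
}


def labels_by_mode(mode):
    """Return the empty-argument labels inserted in the given mode, sorted."""
    return sorted(label for label, (_, m) in EMPTY_ARGUMENTS.items() if m is mode)
```

For example, `labels_by_mode(Insertion.AUTOMATIC)` lists the labels that need no manual annotation step.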

PropBank Tagset: Numbered Arguments

Numbered arguments
ARGA        Causer
ARG0        Agent, experiencer
ARG1        Theme, patient
ARG2        Recipient
ARG3        Instrument

Numbered arguments with function tags
ARGA-MNS    Indirect causer
ARG0-MNS    Induced causer
ARG0-GOL    Causee with a 'recipient' role
ARG2-ATR    Attribute
ARG2-GOL    Goal
ARG2-SOU    Source
ARG2-LOC    Location
ARG2-DIR    Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP    Temporal
ARGM-MNR    Manner
ARGM-LOC    Location
ARGM-PRP    Purpose
ARGM-CAU    Cause
ARGM-DIS    Discourse
ARGM-ADV    Adverb
ARGM-MNS    Means
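As a minimal sketch, the tagset above can be stored as plain dictionaries, with a helper that splits a label into its base and function tag. The dictionary and function names are hypothetical. The hyphenated label spelling (e.g. `ARG2-LOC`) is an assumption, since the extracted slide text lost its punctuation.

```python
# Hypothetical lookup tables built from the labels listed on the slides.
NUMBERED = {
    "ARGA": "causer",
    "ARG0": "agent, experiencer",
    "ARG1": "theme, patient",
    "ARG2": "recipient",
    "ARG3": "instrument",
}

FUNCTION_TAGGED = {
    "ARGA-MNS": "indirect causer",
    "ARG0-MNS": "induced causer",
    "ARG0-GOL": "causee with a 'recipient' role",
    "ARG2-ATR": "attribute",
    "ARG2-GOL": "goal",
    "ARG2-SOU": "source",
    "ARG2-LOC": "location",
    "ARG2-DIR": "direction",
}

MODIFIERS = {
    "ARGM-TMP": "temporal",
    "ARGM-MNR": "manner",
    "ARGM-LOC": "location",
    "ARGM-PRP": "purpose",
    "ARGM-CAU": "cause",
    "ARGM-DIS": "discourse",
    "ARGM-ADV": "adverb",
    "ARGM-MNS": "means",
}


def describe(label):
    """Look a label up in the three tables; raise KeyError if unknown."""
    for table in (NUMBERED, FUNCTION_TAGGED, MODIFIERS):
        if label in table:
            return table[label]
    raise KeyError(label)


def split_label(label):
    """Split 'ARG2-LOC' into ('ARG2', 'LOC'); plain labels get None as tag."""
    base, _, tag = label.partition("-")
    return base, (tag or None)
```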

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
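The interlinear examples pair each transliterated word with a gloss made of a lemma plus grammatical features. A small sketch of that alignment is given below. It assumes one gloss token per word and a dot between features, which is a reconstruction of the slide layout rather than a documented format, and `align` is a hypothetical helper.

```python
def align(translit, gloss):
    """Pair each transliterated word with its gloss lemma and feature list."""
    words = translit.split()
    glosses = gloss.split()
    if len(words) != len(glosses):
        raise ValueError("transliteration and gloss have different token counts")
    aligned = []
    for word, g in zip(words, glosses):
        # 'read.m.sg.fut' -> lemma 'read', features ['m', 'sg', 'fut']
        lemma, *features = g.split(".")
        aligned.append((word, lemma, features))
    return aligned


trans_1 = align("Atif kitaab paRhegaa", "Atif book.f read.m.sg.fut")
```

Here `trans_1[2]` pairs `paRhegaa` with the lemma `read` and the features masculine, singular, future.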

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
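The labeling rule for intransitives can be stated in a few lines. The sketch below assumes a hand-built verb-class table containing only the three verbs named on the slide. The table and the function `sole_argument_label` are hypothetical, and the diagnostic tests themselves are not modeled.

```python
# Hypothetical verb-class table, using only the verbs given on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]  # KeyError for verbs not in the table
    if verb_class == "unaccusative":
        return "ARG1"
    if verb_class == "unergative":
        return "ARG0"
    raise ValueError(f"not an intransitive class: {verb_class}")
```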

Intransitive Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Intransitive Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

[Annotation diagrams: unacc-4, dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Causatives

[Annotation diagrams: Causative-1, Causative-2]

Causatives classes

[Figure not preserved]

Complex predicates

• These are cases such as bharosaa karnaa `trust (n) do (v)' 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
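The bracketed annotation for bharosaa can be read mechanically. The following sketch assumes the label is the last token inside each bracket and that labels are hyphenated (e.g. `ARG0-GOL`). Both are assumptions about the notation, and `parse_labeled` is a hypothetical helper, not an existing PropBank tool.

```python
import re

# One bracketed chunk: words, whitespace, then an ARG label before ']'.
CHUNK = re.compile(
    r"\[(?P<words>[^\[\]]+?)\s+(?P<label>ARG[A-Z0-9]+(?:-[A-Z]+)?)\]"
)


def parse_labeled(sentence):
    """Return ([(label, words), ...], predicate) for a bracketed annotation."""
    args = [(m.group("label"), m.group("words")) for m in CHUNK.finditer(sentence)]
    # Whatever is left after removing the bracketed chunks is the predicate.
    predicate = CHUNK.sub("", sentence).strip()
    return args, predicate


args, predicate = parse_labeled("[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa")
```

With this input, the two case-marked noun phrases come out as ARG0 and ARG1, and the remaining noun plus light verb is the complex predicate.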

Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

Complex predicate

compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS

• PS represents relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
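The binary-branching assumption can be checked mechanically on a bracketed tree. The sketch below encodes trees as nested tuples `(label, child, ...)` with words as plain strings. The example bracketing of "Atif kitaab paRhegaa" is an illustrative assumption, not a tree taken from the treebank, and both function names are hypothetical.

```python
def is_binary(tree):
    """True if no node in the tree has more than two children."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return True  # preterminal directly over a word
    if len(children) > 2:
        return False
    return all(is_binary(child) for child in children)


def terminals(tree):
    """Return the words of the tree in left-to-right order."""
    label, *children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(terminals(child))
    return words


# Assumed bracketing: subject NP attached above a VP holding object and verb.
binary_tree = ("VP-Pred", ("NP", "Atif"), ("VP", ("NP", "kitaab"), ("V", "paRhegaa")))
# A flat, ternary alternative for contrast.
flat_tree = ("S", ("NP", "Atif"), ("NP", "kitaab"), ("V", "paRhegaa"))
```

Both trees yield the same word order, but only the first satisfies the binary-branching constraint.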

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Tree diagram: node labels include VP-Pred, VP, NP, V; terminals Atif, kitab, paRhegaa]

आतिफ़ किताब पढ़ेगा

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram: node labels include VP-Pred, VP, NP, V; terminals Atif-ne, kitab, paRhii]

आतिफ़ ने किताब पढ़ी

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram: node labels include VP-Pred, VP, NP, V; terminals Atif, soyegaa]

आतिफ़ सोएगा

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of), and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ किताब पढ़ेगा

[Tree diagram; node labels: VP, VP-Pred, NP, V; words: Atif, kitab, paRhegaa]
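A minimal Python sketch of how the two privileged positions and the binary-branching assumption could be encoded and checked. The attachment used here (the higher NP as sister of VP-Pred, the lower NP as sister of V) is an assumption, since only the node labels and words of the slide's diagram are recoverable. The tuple encoding and the helper names `is_binary` and `words` are likewise hypothetical.

```python
# Trees as nested tuples: (label, child, ...); plain strings are words.
# ASSUMED attachment: only node labels and words survive from the slide.
transitive = (
    "VP",
    ("NP", "Atif"),              # higher privileged position (usually DS k1)
    ("VP-Pred",
     ("NP", "kitab"),            # lower privileged position (usually DS k2)
     ("V", "paRhegaa")),
)


def is_binary(tree):
    """True if no node in the tree has more than two children."""
    if isinstance(tree, str):
        return True
    children = tree[1:]
    return len(children) <= 2 and all(is_binary(child) for child in children)


def words(tree):
    """Leaves in left-to-right order, i.e. the surface word order."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]


print(is_binary(transitive))        # True
print(" ".join(words(transitive)))  # Atif kitab paRhegaa
```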

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ ने किताब पढ़ी

[Tree diagram; node labels: VP, VP-Pred, NP-P, NP, V; words: Atif-ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ सोएगा

[Tree diagram; node labels: VP, VP-Pred, NP, V; words: Atif, soyegaa]

Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

दरवाज़ा खुलेगा

[Tree diagram; node labels: VP, VP-Pred, NP (index 1), NP, V; trace: CASE (index 1); words: darvaazaa, khulegaa]
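The movement described above can be sketched as a co-indexed antecedent and trace. In the Python sketch below, the node encoding, the exact attachment, and the helper `link_traces` are assumptions for illustration. Only the labels, the index 1, and the trace name CASE come from the slide; SCR and EXTR are the other trace names that appear in later diagrams.

```python
# Nodes are (label, index, children); index is None when the node is not co-indexed.
# ASSUMED structure: the NP surfaces in the higher position and leaves a CASE trace below.
unaccusative = ("VP", None, [
    ("NP", 1, ["darvaazaa"]),
    ("VP-Pred", None, [
        ("NP", None, [("CASE", 1, [])]),  # trace in the lower position
        ("V", None, ["khulegaa"]),
    ]),
])

TRACE_LABELS = {"CASE", "SCR", "EXTR"}


def link_traces(tree):
    """Return {index: (antecedent_label, trace_label)} for co-indexed pairs."""
    antecedents, traces = {}, {}

    def visit(node):
        if isinstance(node, str):  # a word, nothing to record
            return
        label, index, children = node
        if index is not None:
            target = traces if label in TRACE_LABELS else antecedents
            target[index] = label
        for child in children:
            visit(child)

    visit(tree)
    return {i: (antecedents[i], traces[i]) for i in traces if i in antecedents}


print(link_traces(unaccusative))  # {1: ('NP', 'CASE')}
```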

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and location is an adjunct

उस कमरे में चूहे हैं

[Tree diagram; node labels: VP, VP-Pred, NP-P, NP (index 1), NP, V; trace: CASE (index 1); words: us kamre mein, cuuhe, hain]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी

[Tree diagram; node labels: VP, VP-Pred, NP-P, NP, V; words: Ram-ne, Mohan-ko, kitaab, dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram; node labels: VP, VP-Pred, NP-P, NP (indices 1 and 2), V; traces: CASE (index 1), SCR (index 2); words: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram; node labels: VP, VP-Pred, NP-P, NP (index 1), CP (index 1), C, V, V-Aux; traces: EXTR (index 1), CASE (index 1); words: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause

[Tree diagram; node labels: VP, VP-Pred, NP-P, NP (index 1), CP, C, V, V-Aux; empty elements: SCR (indices 1 and 3), PRO, COMP; words: jo kitaab, tumne, dii, thii, vah, maine, paRhii]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate

[Tree diagram; node labels: VP, VP-Pred, NP-P (index 1), NP, N, V, V', V-Aux; trace: CASE (index 1); words: Raam, Ravi-ko, yaad, kar, rahaa, thaa]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative

101212

1

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow

• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition


Background for PS

• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters

• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)

• These two levels are related by derivations
  - Words and constituents move and leave traces

• Transformational grammar

• Monostratal representation
• Not unlike the English Penn Treebank


Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)

• Binary branching
  - Theoretical reasons
  - To be different from DS
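The binary-branching assumption is easy to state as a check over a tree. The Python sketch below is a minimal illustration, assuming a hypothetical tuple encoding of trees and a hypothetical bracketing of a three-word transitive clause; it is not the treebank's own tooling or file format.

```python
# Trees are (label, children) pairs; leaves are plain strings (assumed encoding).

def is_binary_branching(tree):
    """Return True if no node in the tree has more than two children."""
    if isinstance(tree, str):
        return True
    _label, children = tree
    return len(children) <= 2 and all(is_binary_branching(c) for c in children)

# Hypothetical binary bracketing: the verb first combines with the lower NP.
binary = ("VP-Pred", [
    ("NP", ["Atif"]),
    ("VP", [("NP", ["kitab"]), ("V", ["paRhegaa"])]),
])

# A flat alternative in which the verb and both NPs are sisters.
flat = ("VP", [("NP", ["Atif"]), ("NP", ["kitab"]), ("V", ["paRhegaa"])])

print(is_binary_branching(binary))  # True
print(is_binary_branching(flat))    # False
```

The flat variant is closer to what a dependency representation would group under a single head, which is one way to read the slide's remark about being different from DS.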

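Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2.

[Tree diagram; node labels: VP-Pred, VP, NP, V; words: Atif, kitab, paRhegaa]

आतिफ किताब पढ़ेगा (Atif kitab paRhegaa, 'Atif will read the book')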


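Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction.

[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, V; words: Atif-ne, kitab, paRhii]

आतिफ ने किताब पढ़ी (Atif-ne kitab paRhii, 'Atif read the book')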
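To make the point of the two transitive slides concrete, the Python sketch below reads the two privileged positions off a tree and shows that the same code works for the plain and the ergative clause. The tuple encoding, the function name privileged_positions, and the bracketing [VP-Pred NP [VP NP V]] are assumptions based on one reading of the flattened diagrams; the mapping of the higher and lower positions to DS's k1 and k2 holds only "usually", as the slide says.

```python
# Trees are (label, children) pairs; leaves are plain strings (assumed encoding).

def privileged_positions(vp_pred):
    """Return the words in the higher and lower argument positions."""
    _, (higher, vp) = vp_pred      # VP-Pred -> higher argument + VP
    _, (lower, _verb) = vp         # VP      -> lower argument + V
    return {"higher": " ".join(higher[1]), "lower": " ".join(lower[1])}

plain = ("VP-Pred", [
    ("NP", ["Atif"]),
    ("VP", [("NP", ["kitab"]), ("V", ["paRhegaa"])]),
])

# Only the label of the higher argument changes: NP-P marks the postposition -ne.
ergative = ("VP-Pred", [
    ("NP-P", ["Atif-ne"]),
    ("VP", [("NP", ["kitab"]), ("V", ["paRhii"])]),
])

print(privileged_positions(plain))     # {'higher': 'Atif', 'lower': 'kitab'}
print(privileged_positions(ergative))  # {'higher': 'Atif-ne', 'lower': 'kitab'}
```

Because the ergative marking changes only the category label of the higher argument, the structural positions stay the same.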







